Regular Expression

POSIX BRE and ERE metacharacters

Character	BRE/ERE	Meaning in a pattern
\	Both	Turn off the special meaning of the following character. Occasionally, enable a special meaning for the following character in BRE, such as for \(…\) and \{…\}.
.	Both	Match any single character except NUL. Individual programs may also disallow matching newline.
*	Both	Match any number (or none) of the single character that immediately precedes it. For EREs, the preceding character can instead be a regular expression. For example, since . (dot) means any character, .* means “match any number of any character.” For BREs, * is not special if it’s the first character of a regular expression.
^	Both	Match the following regular expression at the beginning of the line or string. BRE: special only at the beginning of a regular expression. ERE: special everywhere.
$	Both	Match the preceding regular expression at the end of the line or string. BRE: special only at the end of a regular expression. ERE: special everywhere.
[…]	Both	Termed a bracket expression, this matches any one of the enclosed characters. A hyphen (-) indicates a range of consecutive characters. (Caution: ranges are locale-sensitive, and thus not portable.) A circumflex (^) as the first character in the brackets reverses the sense: it matches any one character not in the list. A hyphen or close bracket (]) as the first character is treated as a member of the list. All other metacharacters are treated as members of the list (i.e., literally). Bracket expressions may contain collating symbols, equivalence classes, and character classes (described shortly).
\{n,m\}	BRE	Termed an interval expression, this matches a range of occurrences of the single character that immediately precedes it. \{n\} matches exactly n occurrences, \{n,\} matches at least n occurrences, and \{n,m\} matches any number of occurrences between n and m. n and m must be between 0 and RE_DUP_MAX (minimum value: 255), inclusive.
\(…\)	BRE	Save the pattern enclosed between \( and \) in a special holding space. Up to nine subpatterns can be saved in a single pattern. The text matched by the subpatterns can be reused later in the same pattern, by the escape sequences \1 to \9. For example, \(ab\).*\1 matches two occurrences of ab, with any number of characters in between.
\n	BRE	Replay the nth subpattern enclosed in \( and \) into the pattern at this point. n is a number from 1 to 9, with 1 starting on the left.
{n,m}	ERE	Just like the BRE \{n,m\} earlier, but without the backslashes in front of the braces.
+	ERE	Match one or more instances of the preceding regular expression.
?	ERE	Match zero or one instances of the preceding regular expression.
|	ERE	Match either the regular expression specified before it or the one specified after it.
(…)	ERE	Apply a match to the enclosed group of regular expressions.
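
As a rough illustration of the ERE rows above, here is a minimal Python sketch. Python's re module is not a POSIX engine, but for these particular constructs (intervals, +, ?, alternation, grouping, and \1) its spelling matches ERE. The pattern names are made up for this example.

```python
import re

# Interval expression: between 2 and 3 copies of "a".
# ERE spelling is a{2,3}; BRE would write a\{2,3\}.
INTERVAL = re.compile(r"a{2,3}")

# Backreference: \1 replays whatever group 1 matched.
# BRE would write \(ab\).*\1; Python groups use bare parentheses.
BACKREF = re.compile(r"(ab).*\1")

# Grouping plus alternation: "gr", then "a" or "e", then "y".
GRAY = re.compile(r"gr(a|e)y")

# ? makes the preceding item optional; + requires at least one copy.
COLOR = re.compile(r"colou?r")
GOOGLE = re.compile(r"go+gle")

if __name__ == "__main__":
    for text in ("a", "aa", "aaa", "aaaa"):
        print(text, bool(INTERVAL.fullmatch(text)))
    print(bool(BACKREF.search("ab--ab")), bool(BACKREF.search("ab--ba")))
```

fullmatch is used for the interval test so that a longer run of the same character is not accepted through a partial match.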

Additional POSIX bracket expressions

Character classes
A POSIX character class consists of keywords bracketed by [: and :]. The keywords describe different classes of characters such as alphabetic characters, control characters, and so on. See the POSIX character classes table below.

Collating symbols
A collating symbol is a multicharacter sequence that should be treated as a unit. It consists of the characters bracketed by [. and .]. Collating symbols are specific to the locale in which they are used.

Equivalence classes
An equivalence class lists a set of characters that should be considered equivalent, such as e and è. It consists of a named element from the locale, bracketed by [= and =].

For example, [[:alpha:]!] matches any single alphabetic character or the exclamation mark, and [[.ch.]] matches the collating element ch, but does not match just the letter c or the letter h. In a French locale, [[=e=]] might match any of e, è, ë, ê, or é.
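
Python's re module has no equivalence classes, so the idea behind [[=e=]] can only be approximated there. The sketch below uses Unicode decomposition to strip accents. That is an assumption made for illustration: a real equivalence class is defined by the locale's collation data, not by Unicode. Both function names are hypothetical.

```python
import unicodedata

def base_letter(ch: str) -> str:
    """Strip combining marks from ch after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def in_equivalence_class(ch: str, base: str) -> bool:
    """Rough stand-in for [[=base=]]: True if ch reduces to base."""
    return base_letter(ch) == base

if __name__ == "__main__":
    # e, e-grave, e-diaeresis, e-circumflex, e-acute, and an unrelated letter
    for ch in "e\u00e8\u00eb\u00ea\u00e9f":
        print(repr(ch), in_equivalence_class(ch, "e"))
```

Whether uppercase E belongs to the same class depends on the locale. This sketch treats it as distinct.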

POSIX character classes

Class	Matching characters	Class	Matching characters
[:alnum:]	Alphanumeric characters	[:lower:]	Lowercase characters
[:alpha:]	Alphabetic characters	[:print:]	Printable characters
[:blank:]	Space and tab characters	[:punct:]	Punctuation characters
[:cntrl:]	Control characters	[:space:]	Whitespace characters
[:digit:]	Numeric characters	[:upper:]	Uppercase characters
[:graph:]	Nonspace characters	[:xdigit:]	Hexadecimal digits
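
Python's re module does not understand [:class:] names, so one way to experiment with the table above is to expand each class into an explicit character list first. This is a minimal sketch that assumes the C/POSIX locale and plain ASCII. In other locales the classes cover more characters. POSIX_CLASSES and expand_posix_classes are hypothetical names.

```python
import re
import string

# ASCII membership of each class in the C/POSIX locale (an assumption;
# real membership is locale-dependent).
POSIX_CLASSES = {
    "alnum": string.ascii_letters + string.digits,
    "alpha": string.ascii_letters,
    "blank": " \t",
    "cntrl": "".join(map(chr, range(32))) + "\x7f",
    "digit": string.digits,
    "graph": "".join(map(chr, range(33, 127))),
    "lower": string.ascii_lowercase,
    "print": "".join(map(chr, range(32, 127))),
    "punct": string.punctuation,
    "space": string.whitespace,
    "upper": string.ascii_uppercase,
    "xdigit": string.hexdigits,
}

def expand_posix_classes(pattern: str) -> str:
    """Replace every [:name:] in pattern with an escaped list of its members.

    Raises KeyError for an unknown class name.
    """
    def members(match):
        return "".join(re.escape(c) for c in POSIX_CLASSES[match.group(1)])
    return re.sub(r"\[:([a-z]+):\]", members, pattern)

if __name__ == "__main__":
    pattern = expand_posix_classes("[[:alpha:]!]")
    print(bool(re.fullmatch(pattern, "x")), bool(re.fullmatch(pattern, "7")))
```

The expansion is purely textual, so it is only meaningful when [:name:] appears inside a bracket expression, as POSIX requires.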

Operator precedence

BRE operator precedence from highest to lowest

Operator	Meaning
[..] [==] [::]	Bracket symbols for character collation
\metacharacter	Escaped metacharacters
[]	Bracket expressions
\( \) \digit	Subexpressions and backreferences
* \{\}	Repetition of the preceding single-character regular expression
no symbol	Concatenation
^ $	Anchors

ERE operator precedence from highest to lowest

Operator	Meaning
[..] [==] [::]	Bracket symbols for character collation
\metacharacter	Escaped metacharacters
[]	Bracket expressions
()	Grouping
* + ? {}	Repetition of the preceding regular expression
no symbol	Concatenation
^ $	Anchors
|	Alternation
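
The practical effect of this ordering is easiest to see with two patterns that differ only in grouping. The following sketch uses Python's re module, which is not a POSIX engine but applies the same relative precedence to repetition, concatenation, anchors, and alternation. The pattern names are invented for the example.

```python
import re

# Alternation binds loosest, so ^abc|def$ reads as (^abc)|(def$):
# "abc at the start" OR "def at the end".
LOOSE = re.compile(r"^abc|def$")

# Parentheses are needed to anchor both alternatives on both sides.
TIGHT = re.compile(r"^(abc|def)$")

# Repetition binds tighter than concatenation, so ab* reads as a(b*).
A_THEN_BS = re.compile(r"ab*")

# Grouping makes the repetition apply to the whole sequence "ab".
AB_REPEATED = re.compile(r"(ab)*")

if __name__ == "__main__":
    for text in ("abcXYZ", "XYZdef", "abc", "def"):
        print(text, bool(LOOSE.search(text)), bool(TIGHT.search(text)))
```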

Regular Expression Extensions

Additional GNU regular expression operators

Operator	Meaning
\w	Matches any word-constituent character. Equivalent to [[:alnum:]_].
\W	Matches any nonword-constituent character. Equivalent to [^[:alnum:]_].
\< \>	Matches the beginning and end of a word, as described previously.
\b	Matches the null string found at either the beginning or the end of a word. This is a generalization of the \< and \> operators. Note: Because awk uses \b to represent the backspace character, GNU awk (gawk) uses \y.
\B	Matches the null string between two word-constituent characters.
\` \'	Matches the beginning and end of an emacs buffer, respectively. GNU programs (besides emacs) generally treat these as being equivalent to ^ and $.
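
Several of these operators also exist in Python's re module, which makes them easy to try out. The sketch below assumes the re.ASCII flag so that \w stays close to [[:alnum:]_]. Python has no \< or \> operators, but \b placed next to a word character behaves the same way. Python's \B matches at any position that is not a word boundary, which is slightly broader than the description above. All pattern names are hypothetical.

```python
import re

# \w and \W: word-constituent and nonword-constituent characters.
WORD = re.compile(r"\w+", re.ASCII)
NONWORD = re.compile(r"\W+", re.ASCII)

# \b on both sides: "cat" only as a whole word.
WHOLE = re.compile(r"\bcat\b", re.ASCII)

# \B on both sides: "cat" only when embedded inside a longer word.
INSIDE = re.compile(r"\Bcat\B", re.ASCII)

# Stand-ins for GNU \<cat and cat\>: because "c" and "t" are word
# characters, \b here can only match at a word start or a word end.
STARTS_WORD = re.compile(r"\bcat", re.ASCII)
ENDS_WORD = re.compile(r"cat\b", re.ASCII)

if __name__ == "__main__":
    print(WORD.findall("foo_bar, baz42!"))
    for text in ("a cat sat", "concatenate", "catalog", "bobcat"):
        print(text, bool(WHOLE.search(text)), bool(INSIDE.search(text)))
```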
