Regular Expression

POSIX BRE and ERE metacharacters

Character  BRE/ERE  Meaning in a pattern
\          Both     Turn off the special meaning of the following character. Occasionally, enable a special meaning for the following character in BRE, such as for \(…\) and \{…\}.
.          Both     Match any single character except NUL. Individual programs may also disallow matching newline.
*          Both     Match any number (or none) of the single character that immediately precedes it. For EREs, the preceding character can instead be a regular expression. For example, since . (dot) means any character, .* means “match any number of any character.” For BREs, * is not special if it’s the first character of a regular expression.
^          Both     Match the following regular expression at the beginning of the line or string. BRE: special only at the beginning of a regular expression. ERE: special everywhere.
$          Both     Match the preceding regular expression at the end of the line or string. BRE: special only at the end of a regular expression. ERE: special everywhere.
[…]        Both     Termed a bracket expression, this matches any one of the enclosed characters. A hyphen (-) indicates a range of consecutive characters. (Caution: ranges are locale-sensitive, and thus not portable.) A circumflex (^) as the first character in the brackets reverses the sense: it matches any one character not in the list. A hyphen or close bracket (]) as the first character is treated as a member of the list. All other metacharacters are treated as members of the list (i.e., literally). Bracket expressions may contain collating symbols, equivalence classes, and character classes (described shortly).
\{n,m\}    BRE      Termed an interval expression, this matches a range of occurrences of the single character that immediately precedes it. \{n\} matches exactly n occurrences, \{n,\} matches at least n occurrences, and \{n,m\} matches any number of occurrences between n and m. n and m must be between 0 and RE_DUP_MAX (minimum value: 255), inclusive.
\(…\)      BRE      Save the pattern enclosed between \( and \) in a special holding space. Up to nine subpatterns can be saved in a single pattern. The text matched by the subpatterns can be reused later in the same pattern, by the escape sequences \1 to \9. For example, \(ab\).*\1 matches two occurrences of ab, with any number of characters in between.
\n         BRE      Replay the nth subpattern enclosed in \( and \) into the pattern at this point. n is a number from 1 to 9, with 1 starting on the left.
{n,m}      ERE      Just like the BRE \{n,m\} earlier, but without the backslashes in front of the braces.
+          ERE      Match one or more instances of the preceding regular expression.
?          ERE      Match zero or one instances of the preceding regular expression.
|          ERE      Match the regular expression specified before or after.
(…)        ERE      Apply a match to the enclosed group of regular expressions.
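
A few of these rows can be tried out with a small sketch. Python's re module is not a POSIX engine, so this is only an approximation: for intervals, grouping, +, ?, * and the \1 backreference its spelling happens to coincide with the ERE column, which means the BRE \(ab\).*\1 is written (ab).*\1 here. The sample strings are invented for illustration.

```python
import re

# BRE \(ab\).*\1 is spelled (ab).*\1 in Python: group 1 must occur again later.
backref = re.compile(r"(ab).*\1")
print(bool(backref.search("xxabyyabzz")))   # True: two occurrences of "ab"
print(bool(backref.search("xxabyy")))       # False: only one occurrence

# Interval expression: between 2 and 4 digits, anchored at both ends.
interval = re.compile(r"^[0-9]{2,4}$")
print([s for s in ["7", "42", "2024", "12345"] if interval.search(s)])

# + needs at least one match; ? and * also accept none.
print(bool(re.search(r"^ab+c$", "ac")))     # False
print(bool(re.search(r"^ab?c$", "ac")))     # True
print(bool(re.search(r"^ab*c$", "abbbc")))  # True
```

With a real POSIX tool the same patterns would be passed to grep (BRE) or grep -E (ERE), using the backslashed forms from the BRE rows where applicable.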

Additional POSIX bracket expressions

Character classes
A POSIX character class consists of keywords bracketed by [: and :]. The keywords describe different classes of characters such as alphabetic characters, control characters, and so on. See Table 3-3.

Collating symbols
A collating symbol is a multicharacter sequence that should be treated as a unit. It consists of the characters bracketed by [. and .]. Collating symbols are specific to the locale in which they are used.

Equivalence classes
An equivalence class lists a set of characters that should be considered equivalent, such as e and è. It consists of a named element from the locale, bracketed by [= and =].

For example, [[:alpha:]!] matches any single alphabetic character or the exclamation mark, and [[.ch.]] matches the collating element ch, but does not match just the letter c or the letter h. In a French locale, [[=e=]] might match any of e, è, ë, ê, or é.

POSIX character classes

Class      Matching characters            Class       Matching characters
[:alnum:]  Alphanumeric characters        [:lower:]   Lowercase characters
[:alpha:]  Alphabetic characters          [:print:]   Printable characters
[:blank:]  Space and tab characters       [:punct:]   Punctuation characters
[:cntrl:]  Control characters             [:space:]   Whitespace characters
[:digit:]  Numeric characters             [:upper:]   Uppercase characters
[:graph:]  Printable nonspace characters  [:xdigit:]  Hexadecimal digits
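
As a rough illustration of what the class names stand for, the sketch below expands each [:name:] into an explicit ASCII range and then matches with Python's re module. Python does not understand POSIX classes itself, so the ASCII_CLASSES table and the expand_posix_classes helper are assumptions made for this example; they mirror the C locale only, and collating symbols and equivalence classes are not handled.

```python
import re

# ASCII-only stand-ins for the POSIX classes, as they behave in the C locale.
# This mapping is an assumption of the sketch; other locales may differ.
ASCII_CLASSES = {
    "alnum": "a-zA-Z0-9",
    "alpha": "a-zA-Z",
    "blank": r" \t",
    "cntrl": r"\x00-\x1f\x7f",
    "digit": "0-9",
    "graph": r"\x21-\x7e",
    "lower": "a-z",
    "print": r"\x20-\x7e",
    "punct": r"\x21-\x2f\x3a-\x40\x5b-\x60\x7b-\x7e",
    "space": r" \t\n\r\f\v",
    "upper": "A-Z",
    "xdigit": "0-9A-Fa-f",
}

def expand_posix_classes(pattern):
    """Replace each [:name:] with its ASCII range (hypothetical helper)."""
    return re.sub(r"\[:([a-z]+):\]",
                  lambda m: ASCII_CLASSES[m.group(1)],
                  pattern)

# [[:alpha:]!] becomes an ordinary bracket expression.
alpha_or_bang = re.compile(expand_posix_classes("[[:alpha:]!]"))
print(expand_posix_classes("[[:alpha:]!]"))                 # [a-zA-Z!]
print([c for c in "a!7 Z" if alpha_or_bang.fullmatch(c)])   # ['a', '!', 'Z']
```

The class name only has meaning inside an enclosing bracket expression, which is why the pattern is written [[:alpha:]!] rather than [:alpha:]!.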

Operator precedence

BRE operator precedence from highest to lowest

Operator        Meaning
[..] [==] [::]  Bracket symbols for character collation
\metacharacter  Escaped metacharacters
[]              Bracket expressions
\(\) \digit     Subexpressions and backreferences
* \{\}          Repetition of the preceding single-character regular expression
no symbol       Concatenation
^ $             Anchors

ERE operator precedence from highest to lowest

Operator        Meaning
[..] [==] [::]  Bracket symbols for character collation
\metacharacter  Escaped metacharacters
[]              Bracket expressions
()              Grouping
* + ? {}        Repetition of the preceding regular expression
no symbol       Concatenation
^ $             Anchors
|               Alternation
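
The practical effect of the ERE ordering can be seen in a short sketch. It uses Python's re module, which is not a POSIX engine but, for these particular operators, groups a pattern the same way; the test strings are made up for the example.

```python
import re

# Repetition binds tighter than concatenation: ab* is a(b*), not (ab)*.
print(bool(re.fullmatch(r"ab*", "abbb")))     # True
print(bool(re.fullmatch(r"ab*", "abab")))     # False
print(bool(re.fullmatch(r"(ab)*", "abab")))   # True

# Alternation binds loosest: ^ab|cd$ is (^ab)|(cd$), not ^(ab|cd)$.
print(bool(re.search(r"^ab|cd$", "abxx")))    # True: the left branch matches
print(bool(re.search(r"^(ab|cd)$", "abxx")))  # False: the whole line must be ab or cd
```

Parentheses are the way to override the default grouping, as the last line of each pair shows.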

Regular Expression Extensions

Additional GNU regular expression operators

Operator  Meaning
\w        Matches any word-constituent character. Equivalent to [[:alnum:]_].
\W        Matches any nonword-constituent character. Equivalent to [^[:alnum:]_].
\< \>     Matches the beginning and end of a word, as described previously.
\b        Matches the null string found at either the beginning or the end of a word. This is a generalization of the \< and \> operators. Note: Because awk uses \b to represent the backspace character, GNU awk (gawk) uses \y.
\B        Matches the null string between two word-constituent characters.
\` \'     Matches the beginning and end of an Emacs buffer, respectively. GNU programs (besides Emacs) generally treat these as being equivalent to ^ and $.
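
Several of these operators have close counterparts in Python's re module, which makes for a convenient, if approximate, way to see them in action. Python has \w, \W, \b and \B, but not \< or \>; the WORD_START and WORD_END lookaround patterns below are assumed emulations written for this sketch, and \A and \Z are used as stand-ins for \` and \'. Note also that Python's \B matches at any position that is not a word edge, which is slightly broader than the description above.

```python
import re

text = "cat concat cats cat."

# \b is an empty match at a word edge; re.ASCII limits \w to [A-Za-z0-9_].
print(re.findall(r"\bcat\b", text, re.ASCII))          # ['cat', 'cat']

# \B requires that the position is NOT a word edge (here: inside "concat").
print([m.start() for m in re.finditer(r"\Bcat", text, re.ASCII)])  # [7]

# Assumed emulations of \< and \>, which Python lacks.
WORD_START = r"\b(?=\w)"    # word edge with a word character to the right
WORD_END = r"\b(?<=\w)"     # word edge with a word character to the left
print(re.findall(WORD_START + r"cat\w*", text, re.ASCII))  # words starting with cat
print(re.findall(r"\w*cat" + WORD_END, text, re.ASCII))    # words ending with cat

# \A and \Z anchor to the whole string, much as \` and \' anchor to a buffer.
print(bool(re.search(r"\Acat", text)), bool(re.search(r"cat\Z", text)))
```

In GNU grep or sed the same tests would use \<cat and cat\> directly.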