POSIX BRE and ERE metacharacters
| Character | BRE/ERE | Meaning in a pattern |
|---|---|---|
| \ | Both | Turn off the special meaning of the following character. Occasionally, enable a special meaning for the following character in BRE, such as for \(…\) and \{…\}. |
| . | Both | Match any single character except NUL. Individual programs may also disallow matching newline. |
| * | Both | Match any number (or none) of the single character that immediately precedes it. For EREs, the preceding character can instead be a regular expression. For example, since . (dot) means any character, .* means “match any number of any character.” For BREs, * is not special if it’s the first character of a regular expression. |
| ^ | Both | Match the following regular expression at the beginning of the line or string. BRE: special only at the beginning of a regular expression. ERE: special everywhere. |
| $ | Both | Match the preceding regular expression at the end of the line or string. BRE: special only at the end of a regular expression. ERE: special everywhere. |
| […] | Both | Termed a bracket expression, this matches any one of the enclosed characters. A hyphen (-) indicates a range of consecutive characters. (Caution: ranges are locale-sensitive, and thus not portable.) A circumflex (^) as the first character in the brackets reverses the sense: it matches any one character not in the list. A hyphen or close bracket (]) as the first character is treated as a member of the list. All other metacharacters are treated as members of the list (i.e., literally). Bracket expressions may contain collating symbols, equivalence classes, and character classes (described shortly). |
| \{n,m\} | BRE | Termed an interval expression, this matches a range of occurrences of the single character that immediately precedes it. \{n\} matches exactly n occurrences, \{n,\} matches at least n occurrences, and \{n,m\} matches any number of occurrences between n and m. n and m must be between 0 and RE_DUP_MAX (minimum value: 255), inclusive. |
| \(…\) | BRE | Save the pattern enclosed between \( and \) in a special holding space. Up to nine subpatterns can be saved in a single pattern. The text matched by the subpatterns can be reused later in the same pattern, by the escape sequences \1 to \9. For example, \(ab\).*\1 matches two occurrences of ab, with any number of characters in between. |
| \n | BRE | Replay the nth subpattern enclosed in \( and \) into the pattern at this point. n is a number from 1 to 9, with 1 starting on the left. |
| {n,m} | ERE | Just like the BRE \{n,m\} earlier, but without the backslashes in front of the braces. |
| + | ERE | Match one or more instances of the preceding regular expression. |
| ? | ERE | Match zero or one instances of the preceding regular expression. |
| \| | ERE | Match either the regular expression specified before the bar or the one specified after it. |
| (…) | ERE | Apply a match to the enclosed group of regular expressions. |
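A few rows of the table can be tried out with a short script. The sketch below uses Python's `re` module, which is not a POSIX engine; it is chosen only because its syntax for `{n,m}`, `+`, `?`, `(…)`, and `\1` coincides with the unescaped ERE-style forms in these simple cases. The sample strings and variable names are made up for illustration.

```python
import re

# Interval: BRE a\{2,3\} and ERE a{2,3} are both written a{2,3} here.
interval = re.compile(r"a{2,3}")
interval_hits = [s for s in ("a", "aa", "aaa", "aaaa") if interval.fullmatch(s)]

# + requires at least one b; ? allows zero or one b.
plus_hits = [s for s in ("ac", "abc", "abbc") if re.fullmatch(r"ab+c", s)]
opt_hits = [s for s in ("ac", "abc", "abbc") if re.fullmatch(r"ab?c", s)]

# Backreference: the BRE \(ab\).*\1 becomes (ab).*\1 in this syntax.
# It needs two occurrences of "ab" with anything in between.
backref = re.search(r"(ab).*\1", "xxabyyabzz")
backref_text = backref.group(0) if backref else None

print(interval_hits, plus_hits, opt_hits, backref_text)
```

Only the strings with two or three `a` characters satisfy the interval, and the backreference match spans from the first `ab` through the second one.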
Additional POSIX bracket expressions
Character classes
A POSIX character class consists of keywords bracketed by [: and :]. The keywords
describe different classes of characters such as alphabetic characters, control
characters, and so on. See Table 3-3.
Collating symbols
A collating symbol is a multicharacter sequence that should be treated as a unit.
It consists of the characters bracketed by [. and .]. Collating symbols are specific
to the locale in which they are used.
Equivalence classes
An equivalence class lists a set of characters that should be considered equivalent,
such as e and è. It consists of a named element from the locale, bracketed
by [= and =].
For example, [[:alpha:]!] matches any single alphabetic character or the exclamation mark, and [[.ch.]] matches the collating element ch, but does not match just the letter c or the letter h. In a French locale, [[=e=]] might match any of e, è, ë, ê, or é.
POSIX character classes
| Class | Matching characters | Class | Matching characters |
|---|---|---|---|
| [:alnum:] | Alphanumeric characters | [:lower:] | Lowercase characters |
| [:alpha:] | Alphabetic characters | [:print:] | Printable characters |
| [:blank:] | Space and tab characters | [:punct:] | Punctuation characters |
| [:cntrl:] | Control characters | [:space:] | Whitespace characters |
| [:digit:] | Numeric characters | [:upper:] | Uppercase characters |
| [:graph:] | Nonspace characters | [:xdigit:] | Hexadecimal digits |
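To make the class names concrete, the sketch below expands each `[:name:]` into the ASCII ranges it covers in the C (POSIX) locale, then hands the result to Python's `re` module, which has no native support for POSIX classes. The table `POSIX_CLASSES_C_LOCALE` and the helper `expand_posix_classes` are hypothetical names introduced here. In other locales the real classes can match more characters, and collating symbols and equivalence classes are not handled at all.

```python
import re

# Hypothetical lookup table: POSIX class name -> bracket-expression body
# covering the same characters in the C locale (ASCII only).
POSIX_CLASSES_C_LOCALE = {
    "alnum": "0-9A-Za-z",
    "alpha": "A-Za-z",
    "blank": r" \t",
    "cntrl": r"\x00-\x1f\x7f",
    "digit": "0-9",
    "graph": r"\x21-\x7e",
    "lower": "a-z",
    "print": r"\x20-\x7e",
    "punct": r"!-/:-@\[-`{-~",
    "space": r" \t\n\v\f\r",
    "upper": "A-Z",
    "xdigit": "0-9A-Fa-f",
}


def expand_posix_classes(pattern: str) -> str:
    """Replace every [:name:] in the pattern with explicit ASCII ranges."""
    return re.sub(
        r"\[:([a-z]+):\]",
        lambda m: POSIX_CLASSES_C_LOCALE[m.group(1)],
        pattern,
    )


# [[:alpha:]!] becomes [A-Za-z!]: one alphabetic character or "!".
alpha_or_bang = re.compile(expand_posix_classes("[[:alpha:]!]"))
matches = alpha_or_bang.findall("Hi! 42")
print(expand_posix_classes("[[:alpha:]!]"), matches)
```

The class keyword is only meaningful inside a bracket expression, which is why the helper leaves the outer `[` and `]` in place and substitutes just the inner `[:name:]`.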
Operator precedence
BRE operator precedence from highest to lowest
| Operator | Meaning |
|---|---|
| [..] [==] [::] | Bracket symbols for character collation |
| \metacharacter | Escaped metacharacters |
| [] | Bracket expressions |
| \(\) \digit | Subexpressions and backreferences |
| * \{\} | Repetition of the preceding single-character regular expression |
| no symbol | Concatenation |
| ^ $ | Anchors |
ERE operator precedence from highest to lowest
| Operator | Meaning |
|---|---|
| [..] [==] [::] | Bracket symbols for character collation |
| \metacharacter | Escaped metacharacters |
| [] | Bracket expressions |
| () | Grouping |
| * + ? {} | Repetition of the preceding regular expression |
| no symbol | Concatenation |
| ^ $ | Anchors |
| \| | Alternation |
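The practical consequence of the ERE ordering is easiest to see by comparing a pattern with its explicitly grouped variant. The sketch below uses Python's `re` module as a stand-in for an ERE engine, on the assumption that it parses these particular patterns the same way; the test strings are invented.

```python
import re

samples = ("abxx", "xxcd", "ab", "xabx")

# Alternation binds loosest, so each anchor belongs to one branch only:
# ^ab|cd$ means (^ab)|(cd$), not ^(ab|cd)$.
loose = re.compile(r"^ab|cd$")
grouped = re.compile(r"^(ab|cd)$")
loose_hits = [s for s in samples if loose.search(s)]
grouped_hits = [s for s in samples if grouped.search(s)]

# Repetition binds tighter than concatenation: ab+ is a(b+), not (ab)+.
tight_hits = [s for s in ("abbb", "abab") if re.fullmatch(r"ab+", s)]
group_rep_hits = [s for s in ("abbb", "abab") if re.fullmatch(r"(ab)+", s)]

print(loose_hits, grouped_hits, tight_hits, group_rep_hits)
```

Parentheses are the only way to override the default ordering, which is why grouping sits near the top of the table.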
Regular Expression Extensions
Additional GNU regular expression operators
| Operator | Meaning |
|---|---|
| \w | Matches any word-constituent character. Equivalent to [[:alnum:]_]. |
| \W | Matches any nonword-constituent character. Equivalent to [^[:alnum:]_]. |
| \< \> | Matches the beginning and end of a word, as described previously. |
| \b | Matches the null string found at either the beginning or the end of a word. This is a generalization of the \< and \> operators. Note: Because awk uses \b to represent the backspace character, GNU awk (gawk) uses \y. |
| \B | Matches the null string between two word-constituent characters. |
| \` \' | Matches the beginning and end of an emacs buffer, respectively. GNU programs (besides emacs) generally treat these as being equivalent to ^ and $. |
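Some of these operators also exist in Python's `re` module, which makes it possible to sketch their behavior without a GNU tool at hand. This is an approximation, not GNU behavior: `re` has no \< or \> and no \` or \' (its nearest equivalents are `\b` for word edges and `\A` and `\Z` for the start and end of the string), and the `re.ASCII` flag is used so that `\w` is limited to ASCII letters, digits, and underscore. The sample text is invented.

```python
import re

text = "cat concat cats"

# \b matches the empty string at either edge of a word, so \bcat\b
# accepts only the standalone word.
whole_word = re.findall(r"\bcat\b", text)

# With \b on the left only, any word that starts with "cat" matches.
word_start = re.findall(r"\bcat\w*", text, flags=re.ASCII)

# \B rejects word edges, so \Bcat finds "cat" only inside a longer word.
inside_offsets = [m.start() for m in re.finditer(r"\Bcat", text)]

# Under re.ASCII, \w agrees with [A-Za-z0-9_] for every ASCII character.
w_matches_class = all(
    bool(re.fullmatch(r"\w", chr(c), flags=re.ASCII))
    == bool(re.fullmatch(r"[A-Za-z0-9_]", chr(c)))
    for c in range(128)
)

print(whole_word, word_start, inside_offsets, w_matches_class)
```

The only `\Bcat` match falls inside `concat`, because the other two occurrences of `cat` begin at a word edge.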
