regular expression

In formal language theory, a regular expression is a sequence of characters that describe a regular language.

Regular expressions can be defined inductively. Given an alphabet Σ:

  1. Ø, the empty set or empty language is a regular expression.
  2. ε which represents the language containing only the empty string is a regular expression.
  3. For each a ∈ Σ, a is a regular expression and represents the language {a}, the singleton language consisting of a single string that is that is just the character a.
  4. If r and s represent the languages R and S respectively:
    1. r + s is a regular expression representing R ∪ S (alternation)
    2. rs is a regular expression representing RS (concatenation)
    3. r* is a regular expression representing R* (Kleene star)

Kleene star has the highest priority followed by concatenation and then alternation and both concatenation and alternation are associative. This allows parenthesis to be omitted when there is no ambiguity. For example:

(ab)c = a(bc) = abc

a|(b(c*)) = a|bc*

 

« Back to Glossary Index