Class RegExp

java.lang.Object
dk.brics.automaton.RegExp

public class RegExp extends Object
Regular Expression extension to Automaton.

Regular expressions are built from the following abstract syntax:

regexp        ::=  unionexp
               |
unionexp      ::=  interexp | unionexp                           (union)
               |   interexp
interexp      ::=  concatexp & interexp                          (intersection)  [OPTIONAL]
               |   concatexp
concatexp     ::=  repeatexp concatexp                           (concatenation)
               |   repeatexp
repeatexp     ::=  repeatexp ?                                   (zero or one occurrence)
               |   repeatexp *                                   (zero or more occurrences)
               |   repeatexp +                                   (one or more occurrences)
               |   repeatexp {n}                                 (n occurrences)
               |   repeatexp {n,}                                (n or more occurrences)
               |   repeatexp {n,m}                               (n to m occurrences, including both)
               |   complexp
complexp      ::=  ~ complexp                                    (complement)  [OPTIONAL]
               |   charclassexp
charclassexp  ::=  [ charclasses ]                               (character class)
               |   [^ charclasses ]                              (negated character class)
               |   simpleexp
charclasses   ::=  charclass charclasses
               |   charclass
charclass     ::=  charexp - charexp                             (character range, including end-points)
               |   charexp
simpleexp     ::=  charexp
               |   .                                             (any single character)
               |   #                                             (the empty language)  [OPTIONAL]
               |   @                                             (any string)  [OPTIONAL]
               |   " <Unicode string without double-quotes> "    (a string)
               |   ( )                                           (the empty string)
               |   ( unionexp )                                  (precedence override)
               |   < <identifier> >                              (named automaton)  [OPTIONAL]
               |   <n-m>                                         (numerical interval)  [OPTIONAL]
charexp       ::=  <Unicode character>                           (a single non-reserved character)
               |   \ <Unicode character>                         (a single character)

The productions marked [OPTIONAL] are allowed only if they are enabled by the syntax flags passed to the RegExp constructor. The reserved characters used in the enabled syntax must be escaped with a backslash (\) or enclosed in double quotes ("..."). In contrast to other regexp syntaxes, this is also required inside character classes. Be aware that the dash (-) has a special meaning in charclass expressions. An identifier is a string that contains neither a right angle bracket (>) nor a dash (-). Numerical intervals are specified by non-negative decimal integers and include both end points. If n and m have the same number of digits, the conforming strings must have that length, i.e., smaller values are prefixed with 0's.
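The fixed-width rule for numerical intervals can be illustrated with a small sketch in plain Java. It does not use the library: fixedWidthInterval is a hypothetical helper written for this illustration, it assumes n is not greater than m, and it covers only the case where both bounds have the same number of digits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class IntervalSketch {
    // Hypothetical helper, not part of dk.brics.automaton: lists the strings
    // described by <n-m> when n and m have the same number of digits.
    static List<String> fixedWidthInterval(String n, String m) {
        if (n.length() != m.length()) {
            throw new IllegalArgumentException("sketch covers equal-width bounds only");
        }
        int lo = Integer.parseInt(n);
        int hi = Integer.parseInt(m);
        int width = n.length();
        List<String> result = new ArrayList<>();
        for (int i = lo; i <= hi; i++) {
            // Zero-pad every value to the shared width of the two bounds.
            result.add(String.format(Locale.ROOT, "%0" + width + "d", i));
        }
        return result;
    }

    public static void main(String[] args) {
        // Strings described by <08-11> under the fixed-width rule.
        System.out.println(fixedWidthInterval("08", "11")); // prints [08, 09, 10, 11]
    }
}
```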

Author:
Anders Møller <amoeller@cs.au.dk>
  • Field Details

    • INTERSECTION

      public static final int INTERSECTION
      Syntax flag, enables intersection (&).
    • COMPLEMENT

      public static final int COMPLEMENT
      Syntax flag, enables complement (~).
    • EMPTY

      public static final int EMPTY
      Syntax flag, enables empty language (#).
    • ANYSTRING

      public static final int ANYSTRING
      Syntax flag, enables anystring (@).
    • AUTOMATON

      public static final int AUTOMATON
      Syntax flag, enables named automata (<identifier>).
    • INTERVAL

      public static final int INTERVAL
      Syntax flag, enables numerical intervals (<n-m>).
    • ALL

      public static final int ALL
      Syntax flag, enables all optional regexp syntax.
    • NONE

      public static final int NONE
      Syntax flag, enables no optional regexp syntax.
  • Constructor Details

    • RegExp

      public RegExp(String s) throws IllegalArgumentException
      Constructs new RegExp from a string. Same as RegExp(s, ALL).
      Parameters:
      s - regexp string
      Throws:
      IllegalArgumentException - if an error occurred while parsing the regular expression
    • RegExp

      public RegExp(String s, int syntax_flags) throws IllegalArgumentException
      Constructs new RegExp from a string.
      Parameters:
      s - regexp string
      syntax_flags - bitwise OR of the syntax flags for the optional constructs to be enabled
      Throws:
      IllegalArgumentException - if an error occurred while parsing the regular expression
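
As an illustration of the two-argument constructor, the following sketch enables only intersection and complement and builds an automaton for lowercase words other than "foo". It assumes the dk.brics.automaton library is on the classpath and uses Automaton.run(String) to test membership. The class name, method name, and pattern are made up for this example.

```java
import dk.brics.automaton.Automaton;
import dk.brics.automaton.RegExp;

public class SyntaxFlagsSketch {
    // Builds an automaton for non-empty lowercase words other than "foo",
    // enabling only the intersection (&) and complement (~) operators.
    static Automaton lowercaseExceptFoo() {
        int flags = RegExp.INTERSECTION | RegExp.COMPLEMENT;
        RegExp re = new RegExp("[a-z]+&~(foo)", flags);
        return re.toAutomaton();
    }

    public static void main(String[] args) {
        Automaton a = lowercaseExceptFoo();
        System.out.println(a.run("bar")); // in [a-z]+ and not "foo"
        System.out.println(a.run("foo")); // excluded by the complement
    }
}
```

Passing RegExp.ALL instead of a combination of flags corresponds to the one-argument constructor.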
  • Method Details

    • toAutomaton

      public Automaton toAutomaton()
      Constructs new Automaton from this RegExp. Same as toAutomaton(null) (empty automaton map).
    • toAutomaton

      public Automaton toAutomaton(boolean minimize)
      Constructs new Automaton from this RegExp. Same as toAutomaton(null,minimize) (empty automaton map).
    • toAutomaton

      public Automaton toAutomaton(AutomatonProvider automaton_provider) throws IllegalArgumentException
      Constructs new Automaton from this RegExp. The constructed automaton is minimal and deterministic and has no transitions to dead states.
      Parameters:
      automaton_provider - provider of automata for named identifiers
      Throws:
      IllegalArgumentException - if this regular expression uses a named identifier that is not available from the automaton provider
    • toAutomaton

      public Automaton toAutomaton(AutomatonProvider automaton_provider, boolean minimize) throws IllegalArgumentException
      Constructs new Automaton from this RegExp. The constructed automaton has no transitions to dead states.
      Parameters:
      automaton_provider - provider of automata for named identifiers
      minimize - if set, the automaton is minimized and determinized
      Throws:
      IllegalArgumentException - if this regular expression uses a named identifier that is not available from the automaton provider
    • toAutomaton

      public Automaton toAutomaton(Map<String,Automaton> automata) throws IllegalArgumentException
      Constructs new Automaton from this RegExp. The constructed automaton is minimal and deterministic and has no transitions to dead states.
      Parameters:
      automata - a map from automaton identifiers to automata (of type Automaton).
      Throws:
      IllegalArgumentException - if this regular expression uses a named identifier that does not occur in the automaton map
    • toAutomaton

      public Automaton toAutomaton(Map<String,Automaton> automata, boolean minimize) throws IllegalArgumentException
      Constructs new Automaton from this RegExp. The constructed automaton has no transitions to dead states.
      Parameters:
      automata - a map from automaton identifiers to automata (of type Automaton).
      minimize - if set, the automaton is minimized and determinized
      Throws:
      IllegalArgumentException - if this regular expression uses a named identifier that does not occur in the automaton map
    • setAllowMutate

      public boolean setAllowMutate(boolean flag)
      Sets or resets the allow-mutate flag. If this flag is set, automaton construction uses mutable automata, which is slightly faster but not thread safe. By default, the flag is not set.
      Parameters:
      flag - if true, the flag is set
      Returns:
      previous value of the flag
    • toString

      public String toString()
      Constructs string from parsed regular expression.
      Overrides:
      toString in class Object
    • getIdentifiers

      public Set<String> getIdentifiers()
      Returns set of automaton identifiers that occur in this regular expression.
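
The map-based toAutomaton overload and getIdentifiers can be combined as in the sketch below. It assumes the dk.brics.automaton library is on the classpath and uses Automaton.makeCharRange(char, char) and Automaton.run(String) from the same package. The identifier "digit", the pattern, and the class name are chosen only for this illustration.

```java
import dk.brics.automaton.Automaton;
import dk.brics.automaton.RegExp;
import java.util.HashMap;
import java.util.Map;

public class NamedAutomatonSketch {
    // Pattern: the letter v followed by one or more occurrences of the
    // named automaton <digit>. Only the AUTOMATON syntax flag is enabled.
    static RegExp pattern() {
        return new RegExp("v<digit>+", RegExp.AUTOMATON);
    }

    // Resolves the identifier "digit" through an automaton map.
    static Automaton build() {
        Map<String, Automaton> automata = new HashMap<>();
        automata.put("digit", Automaton.makeCharRange('0', '9'));
        return pattern().toAutomaton(automata);
    }

    public static void main(String[] args) {
        // The identifiers the map has to supply.
        System.out.println(pattern().getIdentifiers());

        Automaton a = build();
        System.out.println(a.run("v42")); // v followed by digits
        System.out.println(a.run("v"));   // no digit after v
    }
}
```

Checking getIdentifiers() before calling toAutomaton is one way to avoid the IllegalArgumentException raised for identifiers missing from the map.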