Class RECompiler

java.lang.Object
org.apache.regexp.RECompiler
Direct Known Subclasses:
REDebugCompiler

public class RECompiler extends Object
A regular expression compiler class. This class compiles a pattern string into a regular expression program interpretable by the RE evaluator class. The 'recompile' command line tool uses this compiler to pre-compile regular expressions for use with RE. For a description of the syntax accepted by RECompiler and what you can do with regular expressions, see the documentation for the RE matcher class.
Version:
$Id: RECompiler.java 518156 2007-03-14 14:31:26Z vgritsenko $
Author:
Jonathan Locke, Michael McCallum
  • Nested Class Summary

    Nested Classes
    Modifier and Type
    Class
    Description
    (package private) class 
    Local, nested class for maintaining character ranges for character classes.
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    (package private) int
     
    (package private) int
     
    (package private) static final int
     
    (package private) static final int
     
    (package private) static final int
     
    (package private) static final int
     
    (package private) static final int
     
    (package private) static final Hashtable
     
    (package private) int
     
    (package private) char[]
     
    (package private) int
     
    (package private) int
     
    (package private) static final int
     
    (package private) static final int
     
    (package private) static final int
     
    (package private) int
     
    (package private) String
     
  • Constructor Summary

    Constructors
    Constructor
    Description
    RECompiler()
    Constructor.
  • Method Summary

    Modifier and Type
    Method
    Description
    (package private) int
    atom()
    Absorb an atomic character string.
    (package private) void
    bracket()
    Matches a bracket {m,n} expression and stores the results in the bracket member variables.
    (package private) int
    branch(int[] flags)
    Compile body of one branch of an or operator (implements concatenation)
    (package private) int
    characterClass()
    Compile a character class
    (package private) int
    closure(int[] flags)
    Compile a possibly closured terminal
    REProgram
    compile(String pattern)
    Compiles a regular expression pattern into a program runnable by the pattern matcher class 'RE'.
    (package private) void
    emit(char c)
    Emit a single character into the program stream.
    (package private) void
    ensure(int n)
    Ensures that n more characters can fit in the program buffer.
    (package private) int
    escape()
    Match an escape sequence.
    (package private) int
    expr(int[] flags)
    Compile an expression with possible parens around it.
    (package private) void
    internalError()
    Throws a new internal error exception
    (package private) int
    node(char opcode, int opdata)
    Adds a new node
    (package private) void
    nodeInsert(char opcode, int opdata, int insertAt)
    Inserts a node with a given opcode and opdata at insertAt.
    (package private) void
    setNextOfEnd(int node, int pointTo)
    Appends a node to the end of a node chain
    (package private) void
    syntaxError(String s)
    Throws a new syntax error exception
    (package private) int
    terminal(int[] flags)
    Match a terminal node.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

  • Constructor Details

    • RECompiler

      public RECompiler()
      Constructor. Creates (initially empty) storage for a regular expression program.
  • Method Details

    • ensure

      void ensure(int n)
      Ensures that n more characters can fit in the program buffer. If n more can't fit, then the size is doubled until it can.
      Parameters:
      n - Number of additional characters to ensure will fit.
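      The doubling strategy described above can be sketched as follows. This is a minimal illustration, not the RECompiler source: the class name ProgramBuffer, the initial capacity of 4, and the length() and capacity() accessors are assumptions made for the example.

```java
/** Hypothetical stand-in for the compiler's program buffer (not the RECompiler source). */
final class ProgramBuffer {
    private char[] instruction = new char[4]; // assumed small initial capacity
    private int length = 0;                   // number of chars currently in use

    /** Doubles the capacity until n more chars fit; does nothing if they already fit. */
    void ensure(int n) {
        int capacity = instruction.length;
        int needed = length + n;
        if (needed <= capacity) {
            return;
        }
        while (capacity < needed) {
            capacity *= 2;
        }
        // Copy the existing contents into the larger array.
        instruction = java.util.Arrays.copyOf(instruction, capacity);
    }

    /** Appends one char, growing the buffer first if required. */
    void emit(char c) {
        ensure(1);
        instruction[length++] = c;
    }

    int length() {
        return length;
    }

    int capacity() {
        return instruction.length;
    }
}
```

      Doubling keeps the number of reallocations logarithmic in the final program size, so a long run of emit calls stays cheap on average.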
    • emit

      void emit(char c)
      Emit a single character into the program stream.
      Parameters:
      c - Character to add
    • nodeInsert

      void nodeInsert(char opcode, int opdata, int insertAt)
      Inserts a node with a given opcode and opdata at insertAt. The node relative next pointer is initialized to 0.
      Parameters:
      opcode - Opcode for new node
      opdata - Opdata for new node (only the low 16 bits are currently used)
      insertAt - Index at which to insert the new node in the program
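      A minimal sketch of node insertion is shown below. It assumes each node occupies three chars (opcode, opdata, relative next pointer); that layout, the class name NodeProgram, and the helper write are assumptions for illustration and are not stated on this page. The sketch does not adjust the next pointers of nodes that were already in the program.

```java
/** Hypothetical program buffer with an assumed node layout of [opcode, opdata, next]. */
final class NodeProgram {
    static final int NODE_SIZE = 3; // assumption: three chars per node

    char[] program = new char[6];
    int length = 0;

    /** Doubles the buffer until n more chars fit. */
    void ensure(int n) {
        int capacity = program.length;
        while (capacity < length + n) {
            capacity *= 2;
        }
        if (capacity != program.length) {
            program = java.util.Arrays.copyOf(program, capacity);
        }
    }

    /** Appends a node at the end and returns its index. */
    int node(char opcode, int opdata) {
        ensure(NODE_SIZE);
        int index = length;
        write(index, opcode, opdata);
        length += NODE_SIZE;
        return index;
    }

    /** Shifts everything from insertAt onward right by one node, then writes the new node. */
    void nodeInsert(char opcode, int opdata, int insertAt) {
        ensure(NODE_SIZE);
        System.arraycopy(program, insertAt, program, insertAt + NODE_SIZE, length - insertAt);
        write(insertAt, opcode, opdata);
        length += NODE_SIZE;
    }

    private void write(int at, char opcode, int opdata) {
        program[at] = opcode;
        program[at + 1] = (char) opdata; // the cast keeps only the low 16 bits
        program[at + 2] = 0;             // relative next pointer starts at 0
    }
}
```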
    • setNextOfEnd

      void setNextOfEnd(int node, int pointTo)
      Appends a node to the end of a node chain
      Parameters:
      node - Start of node chain to traverse
      pointTo - Node to have the tail of the chain point to
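      The chain traversal can be sketched as below, assuming a three-char node layout (opcode, opdata, next) in which next is a signed offset relative to the node itself and 0 marks the end of the chain. The layout, the class name NodeChain, and the constants are assumptions for illustration only.

```java
/** Hypothetical helper that links the tail of a node chain to a target node. */
final class NodeChain {
    static final int OFFSET_NEXT = 2; // assumption: next pointer is the third char of a node

    /** Walks from node to the chain tail (next == 0) and points the tail at pointTo. */
    static void setNextOfEnd(char[] program, int node, int pointTo) {
        // The next pointer is stored in a char but read as a signed 16-bit offset.
        int next = (short) program[node + OFFSET_NEXT];
        while (next != 0) {
            node += next;
            next = (short) program[node + OFFSET_NEXT];
        }
        // Store the offset relative to the tail node, so backward links become negative.
        program[node + OFFSET_NEXT] = (char) (short) (pointTo - node);
    }
}
```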
    • node

      int node(char opcode, int opdata)
      Adds a new node
      Parameters:
      opcode - Opcode for node
      opdata - Opdata for node (only the low 16 bits are currently used)
      Returns:
      Index of new node in program
    • internalError

      void internalError() throws Error
      Throws a new internal error exception
      Throws:
      Error - Thrown in the event of an internal error.
    • syntaxError

      void syntaxError(String s) throws RESyntaxException
      Throws a new syntax error exception
      Throws:
      RESyntaxException - Thrown if the regular expression has invalid syntax.
    • bracket

      void bracket() throws RESyntaxException
      Matches a bracket {m,n} expression and stores the results in the bracket member variables.
      Throws:
      RESyntaxException - Thrown if the regular expression has invalid syntax.
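      A rough sketch of parsing the three bracket forms {m}, {m,} and {m,n} follows. The field names min and max, the UNBOUNDED marker, the class name BracketParser, and the use of IllegalArgumentException in place of RESyntaxException are all assumptions for the example; the actual member variables are not named on this page. Numeric overflow is not handled.

```java
/** Hypothetical parser for {m}, {m,} and {m,n}; results are left in min and max. */
final class BracketParser {
    static final int UNBOUNDED = -1; // assumed marker for an open-ended {m,}

    private final String pattern;
    private int idx;
    int min;
    int max;

    BracketParser(String pattern, int start) {
        this.pattern = pattern;
        this.idx = start;
    }

    /** Reads one or more ASCII digits at idx and returns their value. */
    private int number() {
        int begin = idx;
        int value = 0;
        while (idx < pattern.length() && pattern.charAt(idx) >= '0' && pattern.charAt(idx) <= '9') {
            value = value * 10 + (pattern.charAt(idx++) - '0');
        }
        if (idx == begin) {
            throw new IllegalArgumentException("Expected digit at " + idx);
        }
        return value;
    }

    /** Consumes and returns the next char, failing at end of input. */
    private char next() {
        if (idx >= pattern.length()) {
            throw new IllegalArgumentException("Unexpected end of pattern");
        }
        return pattern.charAt(idx++);
    }

    void parse() {
        if (next() != '{') {
            throw new IllegalArgumentException("Expecting '{'");
        }
        min = number();
        char c = next();
        if (c == '}') {          // {m}: exactly m
            max = min;
            return;
        }
        if (c != ',') {
            throw new IllegalArgumentException("Expected comma");
        }
        if (idx < pattern.length() && pattern.charAt(idx) == '}') {
            idx++;               // {m,}: m or more
            max = UNBOUNDED;
            return;
        }
        max = number();          // {m,n}: between m and n
        if (next() != '}') {
            throw new IllegalArgumentException("Missing close brace");
        }
        if (max < min) {
            throw new IllegalArgumentException("Bad range");
        }
    }
}
```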
    • escape

      int escape() throws RESyntaxException
      Match an escape sequence. Handles quoted chars and octal escapes as well as normal escape characters. Always advances the input stream by the right amount. This code "understands" the subtle difference between an octal escape and a backref. You can access the type of ESC_CLASS or ESC_COMPLEX or ESC_BACKREF by looking at pattern[idx - 1].
      Returns:
      ESC_* code or character if simple escape
      Throws:
      RESyntaxException - Thrown if the regular expression has invalid syntax.
    • characterClass

      int characterClass() throws RESyntaxException
      Compile a character class
      Returns:
      Index of class node
      Throws:
      RESyntaxException - Thrown if the regular expression has invalid syntax.
    • atom

      int atom() throws RESyntaxException
      Absorb an atomic character string. This method is a little tricky because it can un-include the last character of the string if a closure operator follows. This is correct because *, + and ? have higher precedence than concatenation (thus ABC* means AB(C*) and NOT (ABC)*).
      Returns:
      Index of new atom node
      Throws:
      RESyntaxException - Thrown if the regular expression has invalid syntax.
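      The "un-include" step can be illustrated with a small sketch that finds where a literal run ends. The class name AtomScanner, the set of metacharacters, and treating '{' like the other closure operators are assumptions for the example; escape sequences are ignored for brevity.

```java
/** Hypothetical scanner showing how a literal run gives up its last char to a closure. */
final class AtomScanner {
    /** Returns the exclusive end index of the atom that starts at start. */
    static int atomEnd(String pattern, int start) {
        int idx = start;
        // Absorb plain characters until a metacharacter or the end of the pattern.
        while (idx < pattern.length() && !isMeta(pattern.charAt(idx))) {
            idx++;
        }
        boolean closureFollows = idx < pattern.length() && isClosure(pattern.charAt(idx));
        // A closure binds only to the last char, so leave that char for the next atom.
        if (closureFollows && idx - start > 1) {
            idx--;
        }
        return idx;
    }

    static boolean isClosure(char c) {
        // '{' is assumed here to bind the same way as *, + and ?.
        return c == '*' || c == '+' || c == '?' || c == '{';
    }

    static boolean isMeta(char c) {
        return isClosure(c) || "()[]|.^$\\".indexOf(c) >= 0;
    }
}
```

      For the pattern ABC*, a first call starting at index 0 stops after AB, and a second call starting at index 2 yields the single character C, which the closure then applies to.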
    • terminal

      int terminal(int[] flags) throws RESyntaxException
      Match a terminal node.
      Parameters:
      flags - Flags passed by reference
      Returns:
      Index of terminal node (closeable)
      Throws:
      RESyntaxException - Thrown if the regular expression has invalid syntax.
    • closure

      int closure(int[] flags) throws RESyntaxException
      Compile a possibly closured terminal
      Parameters:
      flags - Flags passed by reference
      Returns:
      Index of closured node
      Throws:
      RESyntaxException - Thrown if the regular expression has invalid syntax.
    • branch

      int branch(int[] flags) throws RESyntaxException
      Compile body of one branch of an or operator (implements concatenation)
      Parameters:
      flags - Flags passed by reference
      Returns:
      Pointer to first node in the branch
      Throws:
      RESyntaxException - Thrown if the regular expression has invalid syntax.
    • expr

      int expr(int[] flags) throws RESyntaxException
      Compile an expression with possible parens around it. Paren matching is done at this level so we can tie the branch tails together.
      Parameters:
      flags - Flag value passed by reference
      Returns:
      Node index of expression in instruction array
      Throws:
      RESyntaxException - Thrown if the regular expression has invalid syntax.
    • compile

      public REProgram compile(String pattern) throws RESyntaxException
      Compiles a regular expression pattern into a program runnable by the pattern matcher class 'RE'.
      Parameters:
      pattern - Regular expression pattern to compile (see the RE class for details of the accepted syntax).
      Returns:
      A compiled regular expression program.
      Throws:
      RESyntaxException - Thrown if the regular expression has invalid syntax.