Class CharClasses

java.lang.Object
jflex.core.unicode.CharClasses

public class CharClasses extends Object
Character Classes.
Version:
JFlex 1.8.2
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description

    private List<IntCharSet>
    classes
    the char classes

    private static final boolean
    DEBUG
    debug flag (for char classes only)

    private static final Comparator<IntCharSet>
    INT_CHAR_SET_COMPARATOR
    for sorting disjoint IntCharSets

    static final int
    maxChar
    the largest character that can be used in char classes

    private int
    maxCharUsed
    the largest character actually used in a specification

    private UnicodeProperties
    unicodeProps
    the UnicodeProperties instance used by the spec scanner
  • Constructor Summary

    Constructors
    Constructor
    Description

    CharClasses()
    Constructs a new CharClasses object.
  • Method Summary

    Modifier and Type
    Method
    Description

    List<IntCharSet>
    allClasses()
    Returns a deep-copy list of all char class partitions.

    (package private) Pair<int[],List<CMapBlock>>
    computeTables()
    Computes a two-level table structure representing this CharClasses object, where second-level blocks are shared if equal.

    static CharClasses
    copyOf(CharClasses c)
    Construct a (deep) copy of the provided CharClasses object.

    void
    dump()
    Dumps charclasses to the dump output stream.

    private static int[]
    flattenBlocks(List<CMapBlock> blocks)
    Turn a list of second-level blocks into a flat array.

    IntCharSet
    getCharClass(int code)
    Returns a copy of a single char class partition by code.

    int
    getClassCode(int codePoint)
    Returns the code of the character class the specified character belongs to.

    int[]
    getClassCodes(IntCharSet set, boolean negate)
    Returns an array that contains the character class codes of all characters in the specified set of input characters.

    CharClassInterval[]
    getIntervals()
    Returns an array of all CharClassIntervals in this char class collection.

    int
    getMaxCharCode()
    Returns the greatest Unicode value of the current input character set.

    int
    getNumClasses()
    Returns the current number of character classes.

    Pair<int[],int[]>
    getTables()
    Returns a two-level table structure for this char-class object.

    void
    init(int maxCharCode, ILexScan scanner)
    Provides space for classes of characters from 0 to maxCharCode.

    boolean
    invariants()
    Checks the invariants of this object.

    void
    makeClass(int singleChar, boolean caseless)
    Creates a new character class for the single character singleChar.

    void
    makeClass(String str, boolean caseless)
    Creates a new character class for each character of the specified String.

    void
    makeClass(IntCharSet set, boolean caseless)
    Updates the current partition, so that the specified set of characters gets a new character class.

    void
    normalise()
    Brings the partitions into a canonical order such that objects that implement the same partitions but in different order become equal.

    void
    setMaxCharCode(int maxCharCode)
    Sets the largest Unicode value of the current input character set.

    String
    toString()

    String
    toString(int theClass)
    Returns a string representation of one char class.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
  • Field Details

    • DEBUG

      private static final boolean DEBUG
      debug flag (for char classes only)
    • INT_CHAR_SET_COMPARATOR

      private static final Comparator<IntCharSet> INT_CHAR_SET_COMPARATOR
      for sorting disjoint IntCharSets
    • maxChar

      public static final int maxChar
      the largest character that can be used in char classes
    • classes

      private List<IntCharSet> classes
      the char classes
    • maxCharUsed

      private int maxCharUsed
      the largest character actually used in a specification
    • unicodeProps

      private UnicodeProperties unicodeProps
      the UnicodeProperties instance used by the spec scanner
  • Constructor Details

    • CharClasses

      public CharClasses()
      Constructs a new CharClasses object.

      CharClasses.init() is delayed until UnicodeProperties.init() has been called, since the max char code won't be known until then.

  • Method Details

    • init

      public void init(int maxCharCode, ILexScan scanner)
      Provides space for classes of characters from 0 to maxCharCode.

      Initially all characters are in class 0.

      Parameters:
      maxCharCode - the last character code to be considered. (127 for 7bit Lexers, 255 for 8bit Lexers and UnicodeProperties.getMaximumCodePoint() for Unicode Lexers).
      scanner - the scanner containing the UnicodeProperties instance used for caseless matching.
    • getMaxCharCode

      public int getMaxCharCode()
      Returns the greatest Unicode value of the current input character set.
      Returns:
      the greatest Unicode value.
    • setMaxCharCode

      public void setMaxCharCode(int maxCharCode)
      Sets the largest Unicode value of the current input character set.
      Parameters:
      maxCharCode - the largest character code the scanner has to handle (e.g. for %7bit, %8bit, or %16bit scanners)
    • getNumClasses

      public int getNumClasses()
      Returns the current number of character classes.
      Returns:
      number of character classes.
    • allClasses

      public List<IntCharSet> allClasses()
      Returns:
      a deep-copy list of all char class partitions.
    • makeClass

      public void makeClass(IntCharSet set, boolean caseless)
      Updates the current partition, so that the specified set of characters gets a new character class.

      Afterwards, characters that are elements of set are never in the same equivalence class as characters that are not elements of set.

      Parameters:
      set - the set of characters to distinguish from the rest
      caseless - if true upper/lower/title case are considered equivalent
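      A minimal sketch of this refinement step, assuming java.util.BitSet stands in for IntCharSet and using a hypothetical class PartitionSketch (caseless handling is omitted). The constructor mirrors init() by putting every character into class 0; makeClass splits each class that contains both members and non-members of set, keeping the outside part under the old code and giving the inside part a fresh code (a choice made for this sketch, not necessarily JFlex's); getClassCode is a plain linear scan.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical, simplified model of a char-class partition over [0, maxChar]. */
class PartitionSketch {
  final int maxChar;
  final List<BitSet> classes = new ArrayList<>();

  PartitionSketch(int maxChar) {
    this.maxChar = maxChar;
    BitSet all = new BitSet(maxChar + 1);
    all.set(0, maxChar + 1); // like init(): every character starts in class 0
    classes.add(all);
  }

  /** Splits every class that contains both members and non-members of set. */
  void makeClass(BitSet set) {
    int oldSize = classes.size(); // parts added below lie inside set and never need a split
    for (int i = 0; i < oldSize; i++) {
      BitSet c = classes.get(i);
      BitSet inside = (BitSet) c.clone();
      inside.and(set); // c AND set
      if (inside.isEmpty() || inside.equals(c)) {
        continue; // c lies entirely outside or entirely inside set
      }
      c.andNot(set); // class i keeps the part outside set
      classes.add(inside); // the part inside set gets a fresh code
    }
  }

  /** Linear scan for the class containing codePoint. */
  int getClassCode(int codePoint) {
    for (int i = 0; i < classes.size(); i++) {
      if (classes.get(i).get(codePoint)) {
        return i;
      }
    }
    throw new IllegalArgumentException("not in any class: " + codePoint);
  }

  public static void main(String[] args) {
    PartitionSketch p = new PartitionSketch(127);
    BitSet digits = new BitSet();
    digits.set('0', '9' + 1);
    p.makeClass(digits);
    BitSet a = new BitSet();
    a.set('a');
    p.makeClass(a);
    // prints: 3 classes; '5' -> 1, 'a' -> 2, 'z' -> 0
    System.out.println(p.classes.size() + " classes; '5' -> " + p.getClassCode('5')
        + ", 'a' -> " + p.getClassCode('a') + ", 'z' -> " + p.getClassCode('z'));
  }
}
```

      Refining first with the digits and then with {'a'} leaves three classes: all other characters (0), the digits (1) and 'a' (2). Splitting in place means characters outside set keep their previous code.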
    • getClassCode

      public int getClassCode(int codePoint)
      Returns the code of the character class the specified character belongs to.
      Parameters:
      codePoint - code point.
      Returns:
      code of the character class.
    • getCharClass

      public IntCharSet getCharClass(int code)
      Returns a copy of a single char class partition by code.
      Parameters:
      code - the code of the char class partition to return.
      Returns:
      a copy of the char class with the specified code.
    • dump

      public void dump()
      Dumps charclasses to the dump output stream.
    • toString

      public String toString(int theClass)
      Returns a string representation of one char class.
      Parameters:
      theClass - the index of the class to represent.
      Returns:
      a String object.
    • toString

      public String toString()
      Overrides:
      toString in class Object
    • makeClass

      public void makeClass(int singleChar, boolean caseless)
      Creates a new character class for the single character singleChar.
      Parameters:
      singleChar - the character (as a code point) that gets a new character class.
      caseless - if true upper/lower/title case are considered equivalent
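      For caseless, a rough approximation of the case-equivalent characters of a single code point can be built from the java.lang.Character case mappings. This is an assumption for illustration only: JFlex takes its case data from the UnicodeProperties instance, and the hypothetical helper caselessVariants is not part of its API. Under that reading, makeClass(singleChar, true) refines the partition with this set instead of {singleChar}.

```java
import java.util.BitSet;

/** Hypothetical helper approximating the caseless variants of one code point. */
class CaselessSketch {
  static BitSet caselessVariants(int codePoint) {
    BitSet s = new BitSet();
    s.set(codePoint);
    // java.lang.Character mappings, used here in place of JFlex's own case data
    s.set(Character.toUpperCase(codePoint));
    s.set(Character.toLowerCase(codePoint));
    s.set(Character.toTitleCase(codePoint));
    return s;
  }

  public static void main(String[] args) {
    System.out.println(caselessVariants('a'));    // prints {65, 97}
    System.out.println(caselessVariants(0x01C6)); // prints {452, 453, 454}: DZ with caron in upper, title, lower case
  }
}
```

      A one-step mapping like this is not a full closure: Character.toLowerCase(0x212A) (KELVIN SIGN) is 'k', but none of the mappings of 'k' leads back to U+212A. Similarly, makeClass(str, caseless) can be read as applying the single-character step to each code point of str.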
    • makeClass

      public void makeClass(String str, boolean caseless)
      Creates a new character class for each character of the specified String.
      Parameters:
      str - the String to iterate single char class creation over.
      caseless - if true upper/lower/title case are considered equivalent
    • getClassCodes

      public int[] getClassCodes(IntCharSet set, boolean negate)
      Returns an array that contains the character class codes of all characters in the specified set of input characters.
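      The documentation does not define negate. A sketch under the assumption that negate = false selects the classes that share at least one character with set and negate = true selects the classes disjoint from it, with BitSet again standing in for IntCharSet (hypothetical class ClassCodesSketch). If makeClass(set, caseless) has already been applied, every class lies entirely inside or entirely outside set, so "intersects set" and "is contained in set" coincide.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;
import java.util.stream.IntStream;

/** Hypothetical sketch; the meaning of negate is an assumption. */
class ClassCodesSketch {
  static int[] getClassCodes(List<BitSet> classes, BitSet set, boolean negate) {
    // negate == false: codes of classes that share a character with set
    // negate == true:  codes of classes that share no character with set
    return IntStream.range(0, classes.size())
        .filter(i -> classes.get(i).intersects(set) != negate)
        .toArray();
  }

  public static void main(String[] args) {
    BitSet a = new BitSet(); a.set(0, 2); // class 0 = {0, 1}
    BitSet b = new BitSet(); b.set(2);    // class 1 = {2}
    BitSet c = new BitSet(); c.set(3, 5); // class 2 = {3, 4}
    BitSet set = new BitSet(); set.set(2, 5);
    List<BitSet> classes = List.of(a, b, c);
    System.out.println(Arrays.toString(getClassCodes(classes, set, false))); // [1, 2]
    System.out.println(Arrays.toString(getClassCodes(classes, set, true)));  // [0]
  }
}
```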
    • invariants

      public boolean invariants()
      Checks the invariants of this object.

      All classes must be disjoint, and their union must be the entire input set.

      Returns:
      true when the invariants of this object hold.
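      Both invariants can be checked in one pass: a class that intersects the union of the classes seen so far breaks disjointness, and the final union must equal the input set [0, maxChar]. A sketch with BitSet in place of IntCharSet (hypothetical class InvariantsSketch):

```java
import java.util.BitSet;
import java.util.List;

/** Hypothetical sketch of the disjointness and coverage checks. */
class InvariantsSketch {
  /** True iff the classes are pairwise disjoint and their union is exactly [0, maxChar]. */
  static boolean invariants(List<BitSet> classes, int maxChar) {
    BitSet seen = new BitSet(maxChar + 1);
    for (BitSet c : classes) {
      if (c.intersects(seen)) {
        return false; // c overlaps an earlier class
      }
      seen.or(c);
    }
    BitSet all = new BitSet(maxChar + 1);
    all.set(0, maxChar + 1);
    return seen.equals(all); // no gaps, and nothing beyond maxChar
  }

  public static void main(String[] args) {
    BitSet a = new BitSet(); a.set(0, 2); // {0, 1}
    BitSet b = new BitSet(); b.set(2, 4); // {2, 3}
    System.out.println(invariants(List.of(a, b), 3)); // true
    System.out.println(invariants(List.of(a), 3));    // false: 2 and 3 are in no class
  }
}
```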
    • normalise

      public void normalise()
      Brings the partitions into a canonical order such that objects that implement the same partitions but in different order become equal.

      For example, [ {0}, {1} ] and [ {1}, {0} ] implement the same partition of the set {0,1} but have different content. Different order will lead to different input assignments in the NFA and DFA phases and will make otherwise equal automata look distinct.

      This is not needed for correctness, but it makes the comparison of output DFAs (e.g. in the test suite) for equivalence more robust.
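      One way to obtain such a canonical order, assuming (hypothetically) that INT_CHAR_SET_COMPARATOR orders sets by their smallest element: because the classes are disjoint and non-empty, their smallest elements are distinct, so sorting by them gives a total order that depends only on the partition. A sketch with BitSet standing in for IntCharSet:

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of normalising a partition by sorting its classes. */
class NormaliseSketch {
  // Assumed ordering: disjoint non-empty sets have distinct minima, so this is a total order.
  static final Comparator<BitSet> BY_SMALLEST_ELEMENT = Comparator.comparingInt(s -> s.nextSetBit(0));

  /** Returns the classes in canonical order; class codes are the new list indices. */
  static List<BitSet> normalise(List<BitSet> classes) {
    List<BitSet> sorted = new ArrayList<>(classes);
    sorted.sort(BY_SMALLEST_ELEMENT);
    return sorted;
  }

  public static void main(String[] args) {
    BitSet zero = new BitSet(); zero.set(0);
    BitSet one = new BitSet(); one.set(1);
    // [ {1}, {0} ] and [ {0}, {1} ] describe the same partition of {0, 1}
    System.out.println(normalise(List.of(one, zero)).equals(normalise(List.of(zero, one)))); // true
  }
}
```

      After sorting, the class containing character 0 always gets code 0, so two objects implementing the same partition end up with identical class codes.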

    • copyOf

      public static CharClasses copyOf(CharClasses c)
      Construct a (deep) copy of the provided CharClasses object.
      Parameters:
      c - the CharClasses to copy
      Returns:
      a deep copy of c
    • getIntervals

      public CharClassInterval[] getIntervals()
      Returns an array of all CharClassIntervals in this char class collection.

      The array is ordered by char code, i.e. result[i+1].start = result[i].end+1. Each CharClassInterval contains the number of the char class it belongs to.

      Returns:
      an array of all CharClassIntervals in this char class collection.
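      A sketch of how such intervals can be derived, assuming BitSet in place of IntCharSet and a hypothetical Interval class in place of CharClassInterval: tabulate the class of every character, then emit one interval per maximal run of characters with the same class. The ordering property result[i+1].start == result[i].end + 1 then holds by construction.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical sketch of turning a partition into ordered intervals. */
class IntervalsSketch {
  /** Stand-in for CharClassInterval: chars start..end (inclusive) all belong to charClass. */
  static final class Interval {
    final int start;
    final int end;
    final int charClass;

    Interval(int start, int end, int charClass) {
      this.start = start;
      this.end = end;
      this.charClass = charClass;
    }

    @Override
    public String toString() {
      return "[" + start + "-" + end + "]=" + charClass;
    }
  }

  static List<Interval> getIntervals(List<BitSet> classes, int maxChar) {
    int[] classOf = new int[maxChar + 1];
    for (int code = 0; code < classes.size(); code++) {
      BitSet c = classes.get(code);
      for (int ch = c.nextSetBit(0); ch >= 0 && ch <= maxChar; ch = c.nextSetBit(ch + 1)) {
        classOf[ch] = code;
      }
    }
    List<Interval> result = new ArrayList<>();
    int start = 0;
    for (int ch = 1; ch <= maxChar + 1; ch++) {
      // close the current run at the end of the range or where the class changes
      if (ch == maxChar + 1 || classOf[ch] != classOf[start]) {
        result.add(new Interval(start, ch - 1, classOf[start]));
        start = ch;
      }
    }
    return result;
  }

  public static void main(String[] args) {
    BitSet c0 = new BitSet(); c0.set(0, 4); c0.set(8, 10);
    BitSet c1 = new BitSet(); c1.set(4, 8);
    System.out.println(getIntervals(List.of(c0, c1), 9)); // [[0-3]=0, [4-7]=1, [8-9]=0]
  }
}
```

      Tabulating every character is fine for 7-bit or 8-bit inputs; for the full Unicode range a merge over the sets' own ranges would avoid allocating a table with one entry per code point.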
    • computeTables

      Pair<int[],List<CMapBlock>> computeTables()
      Computes a two-level table structure representing this CharClasses object, where second-level blocks are shared if equal. The expectation is that such sharing happens very often, since typically a large number of blocks map all their characters to the same character class.
      Returns:
      a pair of a top-level table, and a list of second-level blocks for this char class object.
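      A sketch of the block-sharing idea, assuming the partition has already been flattened into an int[] classOf (one class code per character) and using hypothetical names TwoLevelSketch and Tables with a tiny BLOCK_BITS = 2 for readability. Identical blocks are detected with a hash map keyed by block content; each top-level entry stores the offset of its block in the concatenated second level, so the first block always lands at offset 0.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical sketch of a two-level table with shared second-level blocks. */
class TwoLevelSketch {
  static final int BLOCK_BITS = 2; // tiny block size (4 chars) chosen for readability
  static final int BLOCK_SIZE = 1 << BLOCK_BITS;

  static final class Tables {
    final int[] fst;          // fst[i]: offset of the block for chars [i*BLOCK_SIZE, (i+1)*BLOCK_SIZE)
    final List<int[]> blocks; // distinct second-level blocks, in order of first use

    Tables(int[] fst, List<int[]> blocks) {
      this.fst = fst;
      this.blocks = blocks;
    }
  }

  /** classOf[c] is the char class of character c. */
  static Tables computeTables(int[] classOf) {
    int numBlocks = (classOf.length + BLOCK_SIZE - 1) / BLOCK_SIZE;
    int[] fst = new int[numBlocks];
    List<int[]> blocks = new ArrayList<>();
    Map<List<Integer>, Integer> blockNumber = new HashMap<>(); // block content -> block number
    for (int b = 0; b < numBlocks; b++) {
      int[] block = new int[BLOCK_SIZE];
      for (int j = 0; j < BLOCK_SIZE; j++) {
        int c = b * BLOCK_SIZE + j;
        block[j] = c < classOf.length ? classOf[c] : 0; // pad past the end with class 0 (assumption)
      }
      List<Integer> key = Arrays.stream(block).boxed().collect(Collectors.toList());
      Integer n = blockNumber.get(key);
      if (n == null) { // first time this content appears: keep it as a new shared block
        n = blocks.size();
        blockNumber.put(key, n);
        blocks.add(block);
      }
      fst[b] = n * BLOCK_SIZE; // block 0 is always new, so fst[0] == 0
    }
    return new Tables(fst, blocks);
  }

  public static void main(String[] args) {
    int[] classOf = {0, 0, 0, 0, 1, 1, 2, 2, 0, 0, 0, 0, 1, 1, 2, 2};
    Tables t = computeTables(classOf);
    // prints: [0, 4, 0, 4], 2 distinct blocks
    System.out.println(Arrays.toString(t.fst) + ", " + t.blocks.size() + " distinct blocks");
  }
}
```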
    • flattenBlocks

      private static int[] flattenBlocks(List<CMapBlock> blocks)
      Turn a list of second-level blocks into a flat array.
    • getTables

      public Pair<int[],int[]> getTables()
      Returns a two-level table structure for this char-class object. The char class of input x is snd[fst[x >> BLOCK_BITS] | (x & BLOCK_MASK)], where BLOCK_MASK = BLOCK_SIZE - 1, and the index of the first block in the top level is guaranteed to be 0 (which means the fst lookup can be skipped if x <= BLOCK_MASK).
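      A sketch of the lookup snd[fst[x >> BLOCK_BITS] | (x & BLOCK_MASK)] (bitwise AND), together with a flattening step in the role that flattenBlocks plays, run on a hand-built table. The names TableLookupSketch, flatten, classOf and the block size BLOCK_BITS = 2 are assumptions for illustration.

```java
import java.util.List;

/** Hypothetical sketch of flattening blocks and the two-level lookup. */
class TableLookupSketch {
  static final int BLOCK_BITS = 2; // assumed block size of 4 chars
  static final int BLOCK_SIZE = 1 << BLOCK_BITS;
  static final int BLOCK_MASK = BLOCK_SIZE - 1;

  /** Concatenates equally sized blocks; block i starts at offset i * BLOCK_SIZE. */
  static int[] flatten(List<int[]> blocks) {
    int[] snd = new int[blocks.size() * BLOCK_SIZE];
    for (int i = 0; i < blocks.size(); i++) {
      System.arraycopy(blocks.get(i), 0, snd, i * BLOCK_SIZE, BLOCK_SIZE);
    }
    return snd;
  }

  /** Char class of x, where fst holds block offsets into snd. */
  static int classOf(int[] fst, int[] snd, int x) {
    if (x <= BLOCK_MASK) {
      return snd[x]; // the first block sits at offset 0, so fst can be skipped
    }
    return snd[fst[x >> BLOCK_BITS] | (x & BLOCK_MASK)];
  }

  public static void main(String[] args) {
    int[] snd = flatten(List.of(new int[] {0, 0, 0, 0}, new int[] {1, 1, 2, 2}));
    int[] fst = {0, 4, 0, 4}; // top level: block A, B, A, B
    StringBuilder out = new StringBuilder();
    for (int x = 0; x < 16; x++) {
      out.append(classOf(fst, snd, x)).append(' ');
    }
    System.out.println(out.toString().trim()); // 0 0 0 0 1 1 2 2 0 0 0 0 1 1 2 2
  }
}
```

      Because every top-level entry is a multiple of BLOCK_SIZE, its low BLOCK_BITS bits are zero, so OR-ing in x & BLOCK_MASK gives the same result as adding it.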