Class CollationData

java.lang.Object
com.ibm.icu.impl.coll.CollationData

public final class CollationData extends Object
Collation data container. Immutable data created by a CollationDataBuilder, or loaded from a file, or deserialized from API-provided binary data. Includes data for the collation base (root/default), aliased if this is not the base.
  • Field Details

    • REORDER_RESERVED_BEFORE_LATIN

      static final int REORDER_RESERVED_BEFORE_LATIN
    • REORDER_RESERVED_AFTER_LATIN

      static final int REORDER_RESERVED_AFTER_LATIN
    • MAX_NUM_SPECIAL_REORDER_CODES

      static final int MAX_NUM_SPECIAL_REORDER_CODES
    • EMPTY_INT_ARRAY

      private static final int[] EMPTY_INT_ARRAY
    • JAMO_CE32S_LENGTH

      static final int JAMO_CE32S_LENGTH
    • trie

      Trie2_32 trie
      Main lookup trie.
    • ce32s

      int[] ce32s
      Array of CE32 values. At index 0 there must be CE32(U+0000) to support U+0000's special-tag for NUL-termination handling.
    • ces

      long[] ces
      Array of CE values for expansions and OFFSET_TAG.
    • contexts

      String contexts
      Array of prefix and contraction-suffix matching data.
    • base

      public CollationData base
      Base collation data, or null if this data itself is a base.
    • jamoCE32s

      int[] jamoCE32s
      Simple array of JAMO_CE32S_LENGTH=19+21+27 CE32s, one per canonical Jamo L/V/T. They are normally simple CE32s, rarely expansions. For fast handling of HANGUL_TAG.
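      The 19+21+27 layout can be illustrated with a minimal sketch. It assumes the array is ordered L, then V, then T, and that "canonical Jamo" means the modern conjoining Jamo U+1100..U+1112, U+1161..U+1175 and U+11A8..U+11C2. The class name JamoIndexSketch and the method jamoIndex are hypothetical and are not part of CollationData.

```java
/** Sketch only: the L/V/T ordering of the array is an assumption, not confirmed ICU behavior. */
final class JamoIndexSketch {
    static final int JAMO_L_BASE = 0x1100;  // first leading consonant
    static final int JAMO_V_BASE = 0x1161;  // first vowel
    static final int JAMO_T_FIRST = 0x11A8; // first trailing consonant
    static final int L_COUNT = 19;
    static final int V_COUNT = 21;
    static final int T_COUNT = 27;
    static final int JAMO_CE32S_LENGTH = L_COUNT + V_COUNT + T_COUNT;

    /** Returns the assumed array index for a conjoining Jamo, or -1 for any other code point. */
    static int jamoIndex(int c) {
        if (JAMO_L_BASE <= c && c < JAMO_L_BASE + L_COUNT) {
            return c - JAMO_L_BASE;
        }
        if (JAMO_V_BASE <= c && c < JAMO_V_BASE + V_COUNT) {
            return L_COUNT + (c - JAMO_V_BASE);
        }
        if (JAMO_T_FIRST <= c && c < JAMO_T_FIRST + T_COUNT) {
            return L_COUNT + V_COUNT + (c - JAMO_T_FIRST);
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("length = " + JAMO_CE32S_LENGTH);
        System.out.println("index of U+1161 = " + jamoIndex(0x1161));
    }
}
```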
    • nfcImpl

      public Normalizer2Impl nfcImpl
    • numericPrimary

      long numericPrimary
      The single-byte primary weight (xx000000) for numeric collation.
    • compressibleBytes

      public boolean[] compressibleBytes
      256 flags for which primary-weight lead bytes are compressible.
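      As a rough illustration of how such a flag table could be consulted, the sketch below assumes that a primary weight is a 32-bit value held in a long and that its lead byte is the top 8 bits (consistent with the xx000000 notation used for numericPrimary). The class CompressibleSketch is hypothetical; the method names mirror isCompressibleLeadByte and isCompressiblePrimary, but the bodies are assumptions.

```java
/** Sketch only: assumes 32-bit primaries whose lead byte is bits 31..24. */
final class CompressibleSketch {
    /** Looks up the flag for one lead byte (0..255). */
    static boolean isCompressibleLeadByte(boolean[] compressibleBytes, int b) {
        return compressibleBytes[b];
    }

    /** Extracts the lead byte of the primary and looks up its flag. */
    static boolean isCompressiblePrimary(boolean[] compressibleBytes, long p) {
        int leadByte = (int) (p >>> 24) & 0xff;
        return isCompressibleLeadByte(compressibleBytes, leadByte);
    }

    public static void main(String[] args) {
        boolean[] flags = new boolean[256];
        flags[0x2a] = true; // hypothetical data: lead byte 0x2a is compressible
        System.out.println(isCompressiblePrimary(flags, 0x2a345600L));
        System.out.println(isCompressiblePrimary(flags, 0x2b000000L));
    }
}
```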
    • unsafeBackwardSet

      UnicodeSet unsafeBackwardSet
      Set of code points that are unsafe for starting string comparison after an identical prefix, or in backwards CE iteration.
    • fastLatinTable

      public char[] fastLatinTable
      Fast Latin table for common-Latin-text string comparisons. For the data structure, see class CollationFastLatin.
    • fastLatinTableHeader

      char[] fastLatinTableHeader
      Header portion of the fastLatinTable. In C++, these are one array, and the header is skipped for mapping characters. In Java, two arrays work better.
    • numScripts

      int numScripts
      Data for scripts and reordering groups. Uses include building a reordering permutation table and providing script boundaries to AlphabeticIndex.
    • scriptsIndex

      char[] scriptsIndex
      The length of scriptsIndex is numScripts+16. It maps from a UScriptCode or a special reorder code to an entry in scriptStarts. 16 special reorder codes (not all used) are mapped starting at numScripts. Up to MAX_NUM_SPECIAL_REORDER_CODES are codes for special groups like space/punct/digit. There are special codes at the end for reorder-reserved primary ranges.

      Multiple scripts may share a range and index, for example Hira & Kana.
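      The mapping described above might look roughly like the following sketch. It assumes that special reorder codes start at 0x1000, that MAX_NUM_SPECIAL_REORDER_CODES is 8, and that index 0 stands for "unknown". All three are assumptions, as are the class name ScriptsIndexSketch and the sample data in main.

```java
/** Sketch only: the constants and the "0 means unknown" convention are assumptions. */
final class ScriptsIndexSketch {
    static final int REORDER_CODE_FIRST = 0x1000;       // assumed first special reorder code
    static final int MAX_NUM_SPECIAL_REORDER_CODES = 8; // assumed value

    /** Maps a script code or special reorder code to an index into scriptStarts. */
    static int getScriptIndex(int numScripts, char[] scriptsIndex, int code) {
        if (code < 0) {
            return 0;
        }
        if (code < numScripts) {
            return scriptsIndex[code];
        }
        int special = code - REORDER_CODE_FIRST;
        if (0 <= special && special < MAX_NUM_SPECIAL_REORDER_CODES) {
            // Special codes are stored after the numScripts script entries.
            return scriptsIndex[numScripts + special];
        }
        return 0;
    }

    public static void main(String[] args) {
        int numScripts = 4;
        char[] scriptsIndex = new char[numScripts + 16];
        scriptsIndex[1] = 3;              // hypothetical script 1
        scriptsIndex[2] = 3;              // hypothetical script 2 shares the same range
        scriptsIndex[numScripts + 0] = 1; // first special group
        System.out.println(getScriptIndex(numScripts, scriptsIndex, 2));
        System.out.println(getScriptIndex(numScripts, scriptsIndex, REORDER_CODE_FIRST));
    }
}
```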

    • scriptStarts

      char[] scriptStarts
      Start primary weight (top 16 bits only) for a group/script/reserved range indexed by scriptsIndex. The first range (separators & terminators) and the last range (trailing weights) are not reorderable, and no scriptsIndex entry points to them.
    • rootElements

      public long[] rootElements
      Collation elements in the root collator. Used by the CollationRootElements class. The data structure is described there. null in a tailoring.
  • Constructor Details

  • Method Details

    • getCE32

      public int getCE32(int c)
    • getCE32FromSupplementary

      int getCE32FromSupplementary(int c)
    • isDigit

      boolean isDigit(int c)
    • isUnsafeBackward

      public boolean isUnsafeBackward(int c, boolean numeric)
    • isCompressibleLeadByte

      public boolean isCompressibleLeadByte(int b)
    • isCompressiblePrimary

      public boolean isCompressiblePrimary(long p)
    • getCE32FromContexts

      int getCE32FromContexts(int index)
      Returns the CE32 stored in two consecutive 16-bit words of contexts, starting at index. Provides access to the defaultCE32 for contraction and prefix matching.
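      Reading a 32-bit value from two 16-bit words can be sketched as follows. The high-word-first order is an assumption, and the class ContextsCE32Sketch is hypothetical; it takes the contexts data as a parameter instead of reading the field.

```java
/** Sketch only: assumes the high 16 bits come first, followed by the low 16 bits. */
final class ContextsCE32Sketch {
    /** Combines the chars at index and index + 1 into one 32-bit value. */
    static int getCE32FromContexts(CharSequence contexts, int index) {
        return (contexts.charAt(index) << 16) | contexts.charAt(index + 1);
    }

    public static void main(String[] args) {
        String contexts = new String(new char[] {0x0001, 0x1234, 0x5678});
        System.out.println(Integer.toHexString(getCE32FromContexts(contexts, 1)));
    }
}
```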
    • getIndirectCE32

      int getIndirectCE32(int ce32)
      Returns the CE32 for an indirect special CE32 (e.g., with DIGIT_TAG). Requires that ce32 is special.
    • getFinalCE32

      int getFinalCE32(int ce32)
      If ce32 is special, returns the CE32 for an indirect special CE32 (e.g., with DIGIT_TAG); otherwise returns ce32 unchanged.
    • getCEFromOffsetCE32

      long getCEFromOffsetCE32(int c, int ce32)
      Computes a CE from c's ce32 which has the OFFSET_TAG.
    • getSingleCE

      long getSingleCE(int c)
      Returns the single CE that c maps to. Throws UnsupportedOperationException if c does not map to a single CE.
    • getFCD16

      int getFCD16(int c)
      Returns the FCD16 value for code point c. c must be >= 0.
    • getFirstPrimaryForGroup

      long getFirstPrimaryForGroup(int script)
      Returns the first primary for the script's reordering group.
      Returns:
      the primary with only the first primary lead byte of the group (not necessarily an actual root collator primary weight), or 0 if the script is unknown
    • getLastPrimaryForGroup

      public long getLastPrimaryForGroup(int script)
      Returns the last primary for the script's reordering group.
      Returns:
      the last primary of the group (not an actual root collator primary weight), or 0 if the script is unknown
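      One way to picture the first and last primaries of a group is the sketch below. It assumes that the script has already been mapped to a scriptStarts index, that index 0 means "unknown", and that a group ends just before the start of the next range. The class GroupPrimarySketch and its sample data are hypothetical.

```java
/** Sketch only: derives 32-bit primaries from 16-bit range starts under the stated assumptions. */
final class GroupPrimarySketch {
    /** First primary of the range: the 16-bit start shifted into the top half. */
    static long firstPrimary(char[] scriptStarts, int index) {
        return index == 0 ? 0 : (long) scriptStarts[index] << 16;
    }

    /** Last primary of the range: one less than the start of the following range. */
    static long lastPrimary(char[] scriptStarts, int index) {
        if (index == 0) {
            return 0;
        }
        long limit = scriptStarts[index + 1];
        return (limit << 16) - 1;
    }

    public static void main(String[] args) {
        char[] scriptStarts = {0x0200, 0x0300, 0x2800, 0x6000, 0xff00}; // hypothetical data
        System.out.println(Long.toHexString(firstPrimary(scriptStarts, 2)));
        System.out.println(Long.toHexString(lastPrimary(scriptStarts, 2)));
    }
}
```

      Because the last primary is computed from the next range's start, it is a boundary value and need not be a weight that any character actually has, which matches the note in the Returns text.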
    • getGroupForPrimary

      public int getGroupForPrimary(long p)
      Finds the reordering group which contains the primary weight.
      Returns:
      the first script of the group, or -1 if the weight is beyond the last group
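      A possible shape for this lookup is sketched below: find the range whose start is at or below the top 16 bits of the primary, then return the first code that maps to that range. The constants (special codes starting at 0x1000, at most 8 of them), the treatment of weights below the first reorderable range, and the class GroupForPrimarySketch are all assumptions.

```java
/** Sketch only: a linear search over hypothetical scriptStarts and scriptsIndex data. */
final class GroupForPrimarySketch {
    static final int REORDER_CODE_FIRST = 0x1000;       // assumed first special reorder code
    static final int MAX_NUM_SPECIAL_REORDER_CODES = 8; // assumed value

    static int getGroupForPrimary(int numScripts, char[] scriptsIndex,
                                  char[] scriptStarts, long p) {
        long top = p >>> 16; // compare only the top 16 bits of the 32-bit primary
        int last = scriptStarts.length - 1;
        // The first and last ranges are not reorderable, so they have no group.
        if (top < scriptStarts[1] || scriptStarts[last] <= top) {
            return -1;
        }
        int index = 1;
        while (top >= scriptStarts[index + 1]) {
            ++index;
        }
        for (int i = 0; i < numScripts; ++i) {
            if (scriptsIndex[i] == index) {
                return i; // first script that maps to this range
            }
        }
        for (int i = 0; i < MAX_NUM_SPECIAL_REORDER_CODES; ++i) {
            if (scriptsIndex[numScripts + i] == index) {
                return REORDER_CODE_FIRST + i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int numScripts = 3;
        char[] scriptsIndex = new char[numScripts + 16];
        scriptsIndex[1] = 2;
        scriptsIndex[2] = 2;
        scriptsIndex[numScripts] = 1;
        char[] scriptStarts = {0x0200, 0x0300, 0x2800, 0x6000, 0xff00};
        System.out.println(getGroupForPrimary(numScripts, scriptsIndex, scriptStarts, 0x29000000L));
    }
}
```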
    • getScriptIndex

      private int getScriptIndex(int script)
    • getEquivalentScripts

      public int[] getEquivalentScripts(int script)
    • makeReorderRanges

      void makeReorderRanges(int[] reorder, UVector32 ranges)
      Writes the permutation of primary-weight ranges for the given reordering of scripts and groups. The caller checks for illegal arguments and takes care of [DEFAULT] and memory allocation.

      Each list element will be a (limit, offset) pair as described for the CollationSettings.reorderRanges. The list will be empty if no ranges are reordered.

    • makeReorderRanges

      private void makeReorderRanges(int[] reorder, boolean latinMustMove, UVector32 ranges)
    • addLowScriptRange

      private int addLowScriptRange(short[] table, int index, int lowStart)
    • addHighScriptRange

      private int addHighScriptRange(short[] table, int index, int highLimit)
    • scriptCodeString

      private static String scriptCodeString(int script)