Class DirectoryTaxonomyReader
- All Implemented Interfaces:
Closeable, AutoCloseable, Accountable
A TaxonomyReader which retrieves stored taxonomy information from a Directory.
Reading from the on-disk index on every method call is too slow, so this implementation employs caching: some methods cache recent requests and their results, while other methods prefetch all the data into memory and then answer directly from in-memory tables. See the documentation of individual methods for comments on their performance.
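The "cache recent requests and their results" strategy can be illustrated with a small read-through LRU cache. This is only a sketch built on java.util.LinkedHashMap: the class name ReadThroughLruCache, the slowLookup function, and the miss counter are hypothetical and are not part of Lucene, whose own caches are LRUHashMap instances.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical read-through LRU cache; a stand-in, not Lucene's LRUHashMap. */
final class ReadThroughLruCache<K, V> {
    private final Map<K, V> cache;
    private final Function<K, V> slowLookup; // stands in for a read from the on-disk index
    private int misses = 0;

    ReadThroughLruCache(int maxEntries, Function<K, V> slowLookup) {
        this.slowLookup = slowLookup;
        // accessOrder = true keeps the least recently used entry at the head.
        this.cache = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Returns the cached value, or performs the slow lookup and remembers the result. */
    synchronized V get(K key) {
        V value = cache.get(key);
        if (value == null) {
            misses++;
            value = slowLookup.apply(key);
            if (value != null) {
                cache.put(key, value);
            }
        }
        return value;
    }

    synchronized int misses() {
        return misses;
    }
}
```

Repeated requests for the same key are served from memory, and only a key that has been evicted or never seen triggers the slow lookup again.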
-
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.lucene.facet.taxonomy.TaxonomyReader
TaxonomyReader.ChildrenIterator
-
Field Summary
Fields
Modifier and Type, Field
private static final long BYTES_PER_CACHE_ENTRY
private LRUHashMap<Integer, FacetLabel> categoryCache
private static final int DEFAULT_CACHE_VALUE
private final DirectoryReader indexReader
private LRUHashMap<FacetLabel, Integer> ordinalCache
private TaxonomyIndexArrays taxoArrays
private final long taxoEpoch
private final DirectoryTaxonomyWriter taxoWriter
Fields inherited from class org.apache.lucene.facet.taxonomy.TaxonomyReader
INVALID_ORDINAL, ROOT_ORDINAL
Fields inherited from interface org.apache.lucene.util.Accountable
NULL_ACCOUNTABLE
-
Constructor Summary
Constructors
Constructor, Description
DirectoryTaxonomyReader(DirectoryTaxonomyWriter taxoWriter)
Opens a DirectoryTaxonomyReader over the given DirectoryTaxonomyWriter (for NRT).
DirectoryTaxonomyReader(DirectoryReader indexReader, DirectoryTaxonomyWriter taxoWriter, LRUHashMap<FacetLabel, Integer> ordinalCache, LRUHashMap<Integer, FacetLabel> categoryCache, TaxonomyIndexArrays taxoArrays)
Expert: Use this constructor to explicitly force the DirectoryTaxonomyReader to use specific parent/children arrays and caches.
DirectoryTaxonomyReader(Directory directory)
Open for reading a taxonomy stored in a given Directory.
-
Method Summary
Modifier and Type, Method, Description
private void checkOrdinalBounds(int... ordinals)
Checks if the ordinals in the array are >= 0 and < indexReader.maxDoc().
protected void doClose()
Performs the actual task of closing the resources that are used by the taxonomy reader.
protected DirectoryTaxonomyReader doOpenIfChanged()
Implements the opening of a new DirectoryTaxonomyReader instance if the taxonomy has changed.
int[] getBulkOrdinals(FacetLabel... categoryPaths)
Returns the ordinals of the categories given as a path.
getBulkPath(int... ordinals)
Returns an array of FacetLabels for a given array of ordinals.
getChildResources()
Returns nested resources of this class.
getCommitUserData()
Retrieve user committed data.
getInternalIndexReader()
Expert: returns the underlying DirectoryReader instance that is used by this TaxonomyReader.
int getOrdinal(FacetLabel cp)
Returns the ordinal of the category given as a path.
getParallelTaxonomyArrays()
Returns a ParallelTaxonomyArrays object which can be used to efficiently traverse the taxonomy tree.
getPath(int ordinal)
Returns the path name of the category with the given ordinal.
private FacetLabel[] getPathFromCache(int... ordinals)
int getSize()
Returns the number of categories in the taxonomy.
protected DirectoryReader openIndexReader(IndexWriter writer)
Open the DirectoryReader from this IndexWriter.
protected DirectoryReader openIndexReader(Directory directory)
Open the DirectoryReader from this Directory.
long ramBytesUsed()
Return the memory usage of this object in bytes.
void setCacheSize(int size)
setCacheSize controls the maximum allowed size of each of the caches used by getPath(int) and getOrdinal(FacetLabel).
toString(int max)
Returns ordinal -> label mapping, up to the provided max ordinal or number of ordinals, whichever is smaller.
Methods inherited from class org.apache.lucene.facet.taxonomy.TaxonomyReader
close, decRef, ensureOpen, getChildren, getOrdinal, getRefCount, incRef, openIfChanged, tryIncRef
-
Field Details
-
DEFAULT_CACHE_VALUE
private static final int DEFAULT_CACHE_VALUE
-
BYTES_PER_CACHE_ENTRY
private static final long BYTES_PER_CACHE_ENTRY
-
taxoWriter
-
taxoEpoch
private final long taxoEpoch
-
indexReader
-
ordinalCache
-
categoryCache
-
taxoArrays
-
-
Constructor Details
-
DirectoryTaxonomyReader
DirectoryTaxonomyReader(DirectoryReader indexReader, DirectoryTaxonomyWriter taxoWriter, LRUHashMap<FacetLabel, Integer> ordinalCache, LRUHashMap<Integer, FacetLabel> categoryCache, TaxonomyIndexArrays taxoArrays) throws IOException
Expert: Use this constructor to explicitly force the DirectoryTaxonomyReader to use specific parent/children arrays and caches.
Called from doOpenIfChanged(). If the taxonomy has been recreated, you should pass null for the caches and the parent/children arrays.
- Parameters:
indexReader - An indexReader that is opened in the desired Directory
taxoWriter - The DirectoryTaxonomyWriter from which to obtain newly added categories, in real-time.
ordinalCache - a FacetLabel to Integer ordinal mapping if it already exists
categoryCache - an ordinal to FacetLabel mapping if it already exists
taxoArrays - taxonomy arrays that store the parent, siblings, children information
- Throws:
IOException
-
DirectoryTaxonomyReader
Open for reading a taxonomy stored in a given Directory.
- Parameters:
directory - The Directory in which the taxonomy resides.
- Throws:
CorruptIndexException - if the Taxonomy is corrupt.
IOException - if another error occurred.
-
DirectoryTaxonomyReader
Opens a DirectoryTaxonomyReader over the given DirectoryTaxonomyWriter (for NRT).
- Parameters:
taxoWriter - The DirectoryTaxonomyWriter from which to obtain newly added categories, in real-time.
- Throws:
IOException
-
-
Method Details
-
doClose
Description copied from class: TaxonomyReader
Performs the actual task of closing the resources that are used by the taxonomy reader.
- Specified by:
doClose in class TaxonomyReader
- Throws:
IOException
-
doOpenIfChanged
Implements the opening of a new DirectoryTaxonomyReader instance if the taxonomy has changed.
NOTE: the returned DirectoryTaxonomyReader shares the ordinal and category caches with this reader. This is not expected to cause any issues, unless the two instances continue to live. The reader guarantees that the two instances cannot affect each other in terms of correctness of the caches; however, if the size of the cache is changed through setCacheSize(int), the change will affect both reader instances.
- Specified by:
doOpenIfChanged in class TaxonomyReader
- Throws:
IOException
-
openIndexReader
Open the DirectoryReader from this Directory.
- Throws:
IOException
-
openIndexReader
Open the DirectoryReader from this IndexWriter.
- Throws:
IOException
-
getInternalIndexReader
Expert: returns the underlying DirectoryReader instance that is used by this TaxonomyReader.
-
getParallelTaxonomyArrays
Description copied from class: TaxonomyReader
Returns a ParallelTaxonomyArrays object which can be used to efficiently traverse the taxonomy tree.
- Specified by:
getParallelTaxonomyArrays in class TaxonomyReader
- Throws:
IOException
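To show why parallel arrays make tree traversal cheap, here is a minimal sketch of the idea. It assumes a layout in which children[i] holds the most recently added child of ordinal i, siblings[i] holds the previously added child of the same parent, and -1 ends a chain. The class ParallelArraysSketch and its INVALID marker are hypothetical and are not the ParallelTaxonomyArrays API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical parallel-array tree; not Lucene's ParallelTaxonomyArrays. */
final class ParallelArraysSketch {
    static final int INVALID = -1; // end-of-chain marker assumed for this sketch

    final int[] parents;
    final int[] children; // children[i]: most recently added child of i
    final int[] siblings; // siblings[i]: previously added child of the same parent

    ParallelArraysSketch(int[] parents) {
        this.parents = parents.clone();
        children = new int[parents.length];
        siblings = new int[parents.length];
        Arrays.fill(children, INVALID);
        Arrays.fill(siblings, INVALID);
        // Ordinal 0 is the root and has no parent, so start at 1.
        for (int ord = 1; ord < parents.length; ord++) {
            int parent = parents[ord];
            siblings[ord] = children[parent]; // link to the previously added child
            children[parent] = ord;           // this ordinal is now the newest child
        }
    }

    /** Walks the child chain of one ordinal without touching any index. */
    List<Integer> childrenOf(int ord) {
        List<Integer> result = new ArrayList<>();
        for (int c = children[ord]; c != INVALID; c = siblings[c]) {
            result.add(c);
        }
        return result;
    }
}
```

With the three arrays in memory, listing the children of a category is a short loop over int values, which is the kind of traversal the description refers to.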
-
getCommitUserData
Description copied from class: TaxonomyReader
Retrieve user committed data.
- Specified by:
getCommitUserData in class TaxonomyReader
- Throws:
IOException
-
getOrdinal
Description copied from class: TaxonomyReader
Returns the ordinal of the category given as a path. The ordinal is the category's serial number, an integer which starts with 0 and grows as more categories are added (note that once a category is added, it can never be deleted).
- Specified by:
getOrdinal in class TaxonomyReader
- Returns:
the category's ordinal or TaxonomyReader.INVALID_ORDINAL if the category wasn't found.
- Throws:
IOException
-
getBulkOrdinals
Description copied from class: TaxonomyReader
Returns the ordinals of the categories given as a path. The ordinal is the category's serial number, an integer which starts with 0 and grows as more categories are added (note that once a category is added, it can never be deleted).
The implementation in DirectoryTaxonomyReader is generally faster than iteratively calling TaxonomyReader.getOrdinal(FacetLabel).
- Overrides:
getBulkOrdinals in class TaxonomyReader
- Returns:
array of the categories' ordinals, or TaxonomyReader.INVALID_ORDINAL if a category wasn't found.
- Throws:
IOException
-
getPath
Description copied from class: TaxonomyReader
Returns the path name of the category with the given ordinal.
- Specified by:
getPath in class TaxonomyReader
- Throws:
IOException
-
getPathFromCache
-
checkOrdinalBounds
Checks if the ordinals in the array are >= 0 and < indexReader.maxDoc().
- Parameters:
ordinals - Integer array of ordinals
- Throws:
IllegalArgumentException - if one of the ordinals is out of bounds
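The bounds check amounts to a simple range test per ordinal. A minimal sketch follows, with maxDoc passed in as a parameter because no index reader is available here; the class name OrdinalBoundsSketch and the exception message are hypothetical.

```java
/** Hypothetical stand-alone version of the ordinal range check. */
final class OrdinalBoundsSketch {
    /** Throws if any ordinal lies outside [0, maxDoc). */
    static void checkOrdinalBounds(int maxDoc, int... ordinals) {
        for (int ordinal : ordinals) {
            if (ordinal < 0 || ordinal >= maxDoc) {
                throw new IllegalArgumentException(
                    "ordinal " + ordinal + " is out of bounds for maxDoc=" + maxDoc);
            }
        }
    }
}
```

Checking the whole array up front lets a bulk lookup fail before any per-ordinal work is done.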
-
getBulkPath
Returns an array of FacetLabels for a given array of ordinals.
This API is generally faster than iteratively calling getPath(int) over an array of ordinals. When it detects that the index was created using StoredFields, it falls back to calling getPath(int) iteratively (with no performance gain); when the index is based on BinaryDocValues, it uses DocValues-based iteration. Lucene switched to BinaryDocValues in version 9.0.
- Overrides:
getBulkPath in class TaxonomyReader
- Parameters:
ordinals - Array of category ordinals that were added to the taxonomy index
- Throws:
IOException
-
getSize
public int getSize()
Description copied from class: TaxonomyReader
Returns the number of categories in the taxonomy. Note that the number of categories returned is often slightly higher than the number of categories inserted into the taxonomy. This is because when a category is added to the taxonomy, its ancestors are also added automatically (including the root, which always gets ordinal 0).
- Specified by:
getSize in class TaxonomyReader
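A small counting sketch makes the "slightly higher" size concrete. It models a category as an array of path components and uses "/" as a joining character purely for illustration; TaxonomySizeSketch is a hypothetical helper, not part of Lucene.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper that counts categories the way the note above describes. */
final class TaxonomySizeSketch {
    /** Counts categories after inserting the given paths, ancestors and root included. */
    static int sizeAfterAdding(String[][] paths) {
        Set<String> categories = new HashSet<>();
        categories.add(""); // the root, which always exists
        for (String[] path : paths) {
            StringBuilder prefix = new StringBuilder();
            for (String component : path) {
                if (prefix.length() > 0) {
                    prefix.append('/'); // illustrative separator, assumes no '/' in components
                }
                prefix.append(component);
                categories.add(prefix.toString()); // every ancestor becomes a category
            }
        }
        return categories.size();
    }
}
```

Inserting the two categories Author/Mark Twain and Author/Jane Austen yields a size of 4 in this model: the root, Author, and the two leaves.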
-
ramBytesUsed
public long ramBytesUsed()
Description copied from interface: Accountable
Return the memory usage of this object in bytes. Negative values are illegal.
- Specified by:
ramBytesUsed in interface Accountable
-
getChildResources
Description copied from interface: Accountable
Returns nested resources of this class. The result should be a point-in-time snapshot (to avoid race conditions).
- Specified by:
getChildResources in interface Accountable
-
setCacheSize
public void setCacheSize(int size)
setCacheSize controls the maximum allowed size of each of the caches used by getPath(int) and getOrdinal(FacetLabel).
Currently, if the given size is smaller than the current size of a cache, the cache will not shrink; it will instead be limited to its current size.
- Parameters:
size - the new maximum cache size, in number of entries.
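One way a cache can end up with this "does not shrink" behavior is an eviction rule that removes at most one entry per insertion. The sketch below shows that mechanism with java.util.LinkedHashMap; ResizableLruSketch and setMaxSize are hypothetical names, and this is not presented as the actual LRUHashMap code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical resizable LRU map; models the described behavior, not Lucene's code. */
final class ResizableLruSketch<K, V> extends LinkedHashMap<K, V> {
    private int maxSize;

    ResizableLruSketch(int maxSize) {
        super(16, 0.75f, true); // access order: the eldest entry is the least recently used
        this.maxSize = maxSize;
    }

    /** Changes the limit without touching existing entries. */
    void setMaxSize(int maxSize) {
        this.maxSize = maxSize;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // At most one entry is evicted per insertion, so a lowered limit
        // stops growth but never shrinks the map below its current size.
        return size() > maxSize;
    }
}
```

In this model, lowering the limit from 4 to 2 on a full map leaves 4 entries in place, and each later insertion evicts exactly one old entry, so the map stays at 4.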
-
toString
Returns ordinal -> label mapping, up to the provided max ordinal or number of ordinals, whichever is smaller.
-