Class IndexSearcher

java.lang.Object
org.apache.lucene.search.IndexSearcher
Direct Known Subclasses:
QueryProfilerIndexSearcher, SuggestIndexSearcher

public class IndexSearcher extends Object
Implements search over a single IndexReader.

Applications usually need only call the search(Query,int) method. For performance reasons, if your index is unchanging, you should share a single IndexSearcher instance across multiple searches instead of creating a new one per search. If your index has changed and you wish to see the changes reflected in searching, you should use DirectoryReader.openIfChanged(DirectoryReader) to obtain a new reader and then create a new IndexSearcher from that. Also, for low-latency turnaround it's best to use a near-real-time reader (DirectoryReader.open(IndexWriter)). Once you have a new IndexReader, it's relatively cheap to create a new IndexSearcher from it.
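
The refresh pattern described above can be sketched as follows. This is a minimal illustration, not a prescribed implementation: the class and method names are hypothetical, the index is an in-memory ByteBuffersDirectory holding empty documents, and changes are made visible with an explicit commit so that openIfChanged has something to detect.

```java
import java.io.IOException;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

// Hypothetical helper class, used only for illustration.
final class ReaderRefreshSketch {

  /** Returns the hit counts seen before and after refreshing the reader. */
  static int[] countsBeforeAndAfterRefresh() throws IOException {
    Query query = new MatchAllDocsQuery();
    try (Directory dir = new ByteBuffersDirectory();
        IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig())) {
      writer.addDocument(new Document());
      writer.commit();

      DirectoryReader reader = DirectoryReader.open(dir);
      try {
        // Share this searcher across searches while the index is unchanged.
        IndexSearcher searcher = new IndexSearcher(reader);
        int before = searcher.count(query);

        // Change the index and commit so the change is visible to a reopen.
        writer.addDocument(new Document());
        writer.commit();

        // openIfChanged returns null when nothing changed since 'reader' was opened.
        DirectoryReader newReader = DirectoryReader.openIfChanged(reader);
        if (newReader != null) {
          reader.close();
          reader = newReader;
          // Creating a searcher over an already opened reader is relatively cheap.
          searcher = new IndexSearcher(reader);
        }
        int after = searcher.count(query);
        return new int[] {before, after};
      } finally {
        reader.close();
      }
    }
  }
}
```

In a long-running application the old reader would normally be closed only once no in-flight search still uses it; the sketch closes it immediately because it is single-threaded.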

NOTE: The search(org.apache.lucene.search.Query, int) and searchAfter(org.apache.lucene.search.ScoreDoc, org.apache.lucene.search.Query, int) methods are configured to count top hits accurately only up to 1,000, and may return a lower bound of the hit count if the hit count is greater than or equal to 1,000. On queries that match many documents, counting the hits may take much longer than computing the top hits, so this trade-off provides some minimal information about the hit count without slowing down search too much. The TopDocs.scoreDocs array is always accurate, however. If this behavior doesn't suit your needs, create a collector manager manually, using either TopScoreDocCollectorManager or TopFieldCollectorManager, and call search(Query, CollectorManager).
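
As a sketch of the manual approach mentioned in this note, the helper below builds a TopScoreDocCollectorManager whose hit-count threshold is Integer.MAX_VALUE, so hits are counted exactly rather than up to 1,000. The class name, method names, and the tiny in-memory demo index are assumptions made for illustration; the three-argument constructor (numHits, after, totalHitsThreshold) is assumed to be available in the Lucene version in use.

```java
import java.io.IOException;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.TopScoreDocCollectorManager;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

// Hypothetical helper class, used only for illustration.
final class ExactHitCountSketch {

  /** Returns the top n hits by score, with an exact total hit count. */
  static TopDocs searchWithExactCount(IndexSearcher searcher, Query query, int n)
      throws IOException {
    // 'after' is null because this is the first page of results.
    // A threshold of Integer.MAX_VALUE asks the collectors to count every hit.
    TopScoreDocCollectorManager manager =
        new TopScoreDocCollectorManager(n, null, Integer.MAX_VALUE);
    return searcher.search(query, manager);
  }

  /** Runs the helper against a three-document in-memory index. */
  static TopDocs demo() throws IOException {
    try (Directory dir = new ByteBuffersDirectory();
        IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig())) {
      for (int i = 0; i < 3; i++) {
        writer.addDocument(new Document());
      }
      // Near-real-time reader opened directly from the writer.
      try (DirectoryReader reader = DirectoryReader.open(writer)) {
        return searchWithExactCount(new IndexSearcher(reader), new MatchAllDocsQuery(), 2);
      }
    }
  }
}
```

Counting every hit gives up the speed advantage the default threshold exists to provide, so this is best reserved for cases where the exact total is actually needed.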

NOTE: IndexSearcher instances are completely thread safe, meaning multiple threads can call any of its methods concurrently. If your application requires external synchronization, you should not synchronize on the IndexSearcher instance; use your own (non-Lucene) objects instead.

  • Field Details

    • maxClauseCount

      static int maxClauseCount
    • DEFAULT_QUERY_CACHE

      private static QueryCache DEFAULT_QUERY_CACHE
    • DEFAULT_CACHING_POLICY

      private static QueryCachingPolicy DEFAULT_CACHING_POLICY
    • queryTimeout

      private QueryTimeout queryTimeout
    • partialResult

      private volatile boolean partialResult
    • TOTAL_HITS_THRESHOLD

      private static final int TOTAL_HITS_THRESHOLD
      By default, we count hits accurately up to 1000. This makes sure that we don't spend most of the time computing hit counts.
    • MAX_DOCS_PER_SLICE

      private static final int MAX_DOCS_PER_SLICE
      Thresholds for index slice allocation logic. To change the default, extend IndexSearcher and use custom values.
    • MAX_SEGMENTS_PER_SLICE

      private static final int MAX_SEGMENTS_PER_SLICE
    • reader

      final IndexReader reader
    • readerContext

      protected final IndexReaderContext readerContext
    • leafContexts

      protected final List<LeafReaderContext> leafContexts
    • leafSlices

      private volatile IndexSearcher.LeafSlice[] leafSlices
    • taskExecutor

      private final TaskExecutor taskExecutor
    • defaultSimilarity

      private static final Similarity defaultSimilarity
    • queryCache

      private QueryCache queryCache
    • queryCachingPolicy

      private QueryCachingPolicy queryCachingPolicy
    • similarity

      private Similarity similarity
      The Similarity implementation used by this searcher.
  • Constructor Details

  • Method Details

    • getDefaultSimilarity

      public static Similarity getDefaultSimilarity()
      Expert: returns a default Similarity instance. In general, this method is only called to initialize searchers and writers. User code and query implementations should respect getSimilarity().
    • getLeafContexts

      public List<LeafReaderContext> getLeafContexts()
      Expert: returns leaf contexts associated with this searcher. This is an internal method exposed for tests only.
    • getDefaultQueryCache

      public static QueryCache getDefaultQueryCache()
      Expert: Get the default QueryCache or null if the cache is disabled.
    • setDefaultQueryCache

      public static void setDefaultQueryCache(QueryCache defaultQueryCache)
      Expert: set the default QueryCache instance.
    • getDefaultQueryCachingPolicy

      public static QueryCachingPolicy getDefaultQueryCachingPolicy()
      Expert: Get the default QueryCachingPolicy.
    • setDefaultQueryCachingPolicy

      public static void setDefaultQueryCachingPolicy(QueryCachingPolicy defaultQueryCachingPolicy)
      Expert: set the default QueryCachingPolicy instance.
    • getMaxClauseCount

      public static int getMaxClauseCount()
      Return the maximum number of clauses permitted, 1024 by default. Attempts to add more than the permitted number of clauses cause IndexSearcher.TooManyClauses to be thrown.
    • setMaxClauseCount

      public static void setMaxClauseCount(int value)
      Set the maximum number of clauses permitted per Query. Default value is 1024.
    • setQueryCache

      public void setQueryCache(QueryCache queryCache)
      Set the QueryCache to use when scores are not needed. A value of null indicates that query matches should never be cached. This method should be called before this IndexSearcher is first used.

      NOTE: When using a query cache, queries should not be modified after they have been passed to IndexSearcher.

    • getQueryCache

      public QueryCache getQueryCache()
      Return the query cache of this IndexSearcher. This will be either the default query cache or the query cache that was last set through setQueryCache(QueryCache). A return value of null indicates that caching is disabled.
    • setQueryCachingPolicy

      public void setQueryCachingPolicy(QueryCachingPolicy queryCachingPolicy)
      Set the QueryCachingPolicy to use for query caching. This method should be called before this IndexSearcher is first used.
    • getQueryCachingPolicy

      public QueryCachingPolicy getQueryCachingPolicy()
      Return the query caching policy of this IndexSearcher. This will be either the default policy or the policy that was last set through setQueryCachingPolicy(QueryCachingPolicy).
    • slices

      protected IndexSearcher.LeafSlice[] slices(List<LeafReaderContext> leaves)
      Expert: Creates an array of leaf slices, each holding a subset of the given leaves. Each IndexSearcher.LeafSlice is executed in a single thread. By default, segments with more than MAX_DOCS_PER_SLICE documents will get their own thread.

      It is possible to leverage intra-segment concurrency by splitting segments into multiple partitions. This behaviour is not enabled by default because there is still a performance penalty for queries that require segment-level computation ahead of time, such as points/range queries. This is an implementation limitation that we expect to improve in future releases; see the corresponding GitHub issue.

    • slices

      public static IndexSearcher.LeafSlice[] slices(List<LeafReaderContext> leaves, int maxDocsPerSlice, int maxSegmentsPerSlice, boolean allowSegmentPartitions)
      Static method to segregate LeafReaderContexts amongst multiple slices. Creates slices according to the provided max number of documents per slice and max number of segments per slice. Splits segments into partitions when the last argument is true.
      Parameters:
      leaves - the leaves to slice
      maxDocsPerSlice - the maximum number of documents in a single slice
      maxSegmentsPerSlice - the maximum number of segments in a single slice
      allowSegmentPartitions - whether segments may be split into partitions according to the provided maxDocsPerSlice argument. When true, if a segment holds more documents than the provided max docs per slice, it is split into equal size partitions that each gets its own slice assigned.
      Returns:
      the array of slices
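
      The allocation rules described for this method can be modelled in plain Java. The sketch below is a simplified model written for illustration, not Lucene's implementation: segments are represented only by their document counts, and the names SliceSketch and Partition are hypothetical. It assumes that oversized segments get their own slice (or one slice per partition when partitioning is allowed) and that smaller segments are grouped until either limit is reached.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Simplified, hypothetical model of slice allocation; not Lucene's actual code.
final class SliceSketch {

  /** A doc id range [minDocId, maxDocId) within the segment at index 'segment'. */
  record Partition(int segment, int minDocId, int maxDocId) {}

  static List<List<Partition>> slices(
      int[] segmentDocCounts,
      int maxDocsPerSlice,
      int maxSegmentsPerSlice,
      boolean allowSegmentPartitions) {
    // Visit segments from largest to smallest.
    List<Integer> order = new ArrayList<>();
    for (int i = 0; i < segmentDocCounts.length; i++) {
      order.add(i);
    }
    order.sort(Comparator.comparingInt((Integer i) -> segmentDocCounts[i]).reversed());

    List<List<Partition>> result = new ArrayList<>();
    List<Partition> current = null;
    long currentDocs = 0;
    for (int seg : order) {
      int docs = segmentDocCounts[seg];
      if (docs > maxDocsPerSlice) {
        if (allowSegmentPartitions) {
          // Split into roughly equal partitions, one slice per partition.
          int parts = (int) (((long) docs + maxDocsPerSlice - 1) / maxDocsPerSlice);
          int size = (int) (((long) docs + parts - 1) / parts);
          for (int min = 0; min < docs; min += size) {
            result.add(List.of(new Partition(seg, min, Math.min(min + size, docs))));
          }
        } else {
          // An oversized segment gets a slice of its own.
          result.add(List.of(new Partition(seg, 0, docs)));
        }
        continue;
      }
      if (current == null) {
        current = new ArrayList<>();
        currentDocs = 0;
      }
      current.add(new Partition(seg, 0, docs));
      currentDocs += docs;
      // Close the group once either limit is reached.
      if (currentDocs > maxDocsPerSlice || current.size() >= maxSegmentsPerSlice) {
        result.add(current);
        current = null;
      }
    }
    if (current != null) {
      result.add(current);
    }
    return result;
  }
}
```

      For example, with segment sizes 250, 40, 30 and 20, a limit of 100 documents and 2 segments per slice, the model splits the 250-document segment into three partitions when partitioning is allowed and groups the remaining segments into two slices.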
    • getIndexReader

      public IndexReader getIndexReader()
      Return the IndexReader this searches.
    • storedFields

      public StoredFields storedFields() throws IOException
      Returns a StoredFields reader for the stored fields of this index.

      Sugar for .getIndexReader().storedFields()

      This call never returns null, even if no stored fields were indexed. The returned instance should only be used by a single thread.

      Example:

       TopDocs hits = searcher.search(query, 10);
       StoredFields storedFields = searcher.storedFields();
       for (ScoreDoc hit : hits.scoreDocs) {
         Document doc = storedFields.document(hit.doc);
       }
       
      Throws:
      IOException - If there is a low-level IO error
    • setSimilarity

      public void setSimilarity(Similarity similarity)
      Expert: Set the Similarity implementation used by this IndexSearcher.
    • getSimilarity

      public Similarity getSimilarity()
      Expert: Get the Similarity to use to compute scores. This returns the Similarity that has been set through setSimilarity(Similarity) or the default Similarity if none has been set explicitly.
    • count

      public int count(Query query) throws IOException
      Count how many documents match the given query. May be faster than counting number of hits by collecting all matches, as the number of hits is retrieved from the index statistics when possible.
      Throws:
      IOException
    • getSlices

      public final IndexSearcher.LeafSlice[] getSlices()
      Returns the leaf slices used for concurrent searching. Override slices(List) to customize how slices are created.
    • computeAndCacheSlices

      private IndexSearcher.LeafSlice[] computeAndCacheSlices()
    • enforceDistinctLeaves

      private static void enforceDistinctLeaves(IndexSearcher.LeafSlice leafSlice)
    • searchAfter

      public TopDocs searchAfter(ScoreDoc after, Query query, int numHits) throws IOException
      Finds the top numHits hits for query where all results are after a previous result (after).

      By passing the bottom result from a previous page as after, this method can be used for efficient 'deep-paging' across potentially large result sets.

      Throws:
      IndexSearcher.TooManyClauses - If a query would exceed getMaxClauseCount() clauses.
      IOException
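
      A deep-paging loop built on this method might look like the sketch below. The class and method names and the 25-document in-memory demo index are assumptions for illustration; the loop itself only uses search(Query, int) and searchAfter(ScoreDoc, Query, int) as documented here.

```java
import java.io.IOException;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

// Hypothetical helper class, used only for illustration.
final class DeepPagingSketch {

  /** Walks through all hits, pageSize at a time, and returns how many were seen. */
  static int pageThrough(IndexSearcher searcher, Query query, int pageSize) throws IOException {
    int seen = 0;
    TopDocs page = searcher.search(query, pageSize);
    while (page.scoreDocs.length > 0) {
      seen += page.scoreDocs.length;
      // The bottom result of this page becomes the 'after' marker for the next one.
      ScoreDoc last = page.scoreDocs[page.scoreDocs.length - 1];
      page = searcher.searchAfter(last, query, pageSize);
    }
    return seen;
  }

  /** Pages through a 25-document in-memory index, 10 hits per page. */
  static int demo() throws IOException {
    try (Directory dir = new ByteBuffersDirectory();
        IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig())) {
      for (int i = 0; i < 25; i++) {
        writer.addDocument(new Document());
      }
      try (DirectoryReader reader = DirectoryReader.open(writer)) {
        return pageThrough(new IndexSearcher(reader), new MatchAllDocsQuery(), 10);
      }
    }
  }
}
```

      The same searcher (and therefore the same reader) is used for every page, so the doc ids carried by the 'after' marker stay meaningful between calls.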
    • getTimeout

      public QueryTimeout getTimeout()
      Get the configured QueryTimeout for all searches that run through this IndexSearcher, or null if not set.
    • setTimeout

      public void setTimeout(QueryTimeout queryTimeout)
      Set a QueryTimeout for all searches that run through this IndexSearcher.
    • search

      public TopDocs search(Query query, int n) throws IOException
      Finds the top n hits for query.
      Throws:
      IndexSearcher.TooManyClauses - If a query would exceed getMaxClauseCount() clauses.
      IOException
    • search

      @Deprecated public void search(Query query, Collector collector) throws IOException
      Deprecated.
      This method is deprecated in favor of search(Query, CollectorManager), which supports concurrent search in IndexSearcher.
      Lower-level search API.

      LeafCollector.collect(int) is called for every matching document.

      Throws:
      IndexSearcher.TooManyClauses - If a query would exceed getMaxClauseCount() clauses.
      IOException
    • timedOut

      public boolean timedOut()
      Returns true if any search hit the timeout.
    • search

      public TopFieldDocs search(Query query, int n, Sort sort, boolean doDocScores) throws IOException
      Search implementation with arbitrary sorting, plus control over whether hit scores should be computed. Finds the top n hits for query, sorting the hits by the criteria in sort. If doDocScores is true then the score of each hit will be computed and returned.
      Throws:
      IndexSearcher.TooManyClauses - If a query would exceed getMaxClauseCount() clauses.
      IOException
    • search

      public TopFieldDocs search(Query query, int n, Sort sort) throws IOException
      Search implementation with arbitrary sorting.
      Parameters:
      query - The query to search for
      n - Return only the top n results
      sort - The Sort object
      Returns:
      The top docs, sorted according to the supplied Sort instance
      Throws:
      IOException - if there is a low-level I/O error
    • searchAfter

      public TopDocs searchAfter(ScoreDoc after, Query query, int n, Sort sort) throws IOException
      Finds the top n hits for query where all results are after a previous result (after).

      By passing the bottom result from a previous page as after, this method can be used for efficient 'deep-paging' across potentially large result sets.

      Throws:
      IndexSearcher.TooManyClauses - If a query would exceed getMaxClauseCount() clauses.
      IOException
    • searchAfter

      public TopFieldDocs searchAfter(ScoreDoc after, Query query, int numHits, Sort sort, boolean doDocScores) throws IOException
      Finds the top numHits hits for query where all results are after a previous result (after), allowing control over whether hit scores should be computed.

      By passing the bottom result from a previous page as after, this method can be used for efficient 'deep-paging' across potentially large result sets. If doDocScores is true then the score of each hit will be computed and returned.

      Throws:
      IndexSearcher.TooManyClauses - If a query would exceed getMaxClauseCount() clauses.
      IOException
    • searchAfter

      private TopFieldDocs searchAfter(FieldDoc after, Query query, int numHits, Sort sort, boolean doDocScores) throws IOException
      Throws:
      IOException
    • search

      public <C extends Collector, T> T search(Query query, CollectorManager<C,T> collectorManager) throws IOException
      Lower-level search API. Search all leaves using the given CollectorManager. In contrast to search(Query, Collector), this method will use the searcher's Executor in order to parallelize execution of the collection on the configured getSlices().
      Throws:
      IOException
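
      To illustrate the CollectorManager contract used by this method, the sketch below counts matches with one collector per slice and sums the per-collector counts in reduce. All class and method names are hypothetical, and the 7-document in-memory index exists only to make the example self-contained. For plain counting, count(Query) is usually the better choice; the point here is the shape of newCollector and reduce.

```java
import java.io.IOException;
import java.util.Collection;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.CollectorManager;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreMode;
import org.apache.lucene.search.SimpleCollector;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

// Hypothetical helper class, used only for illustration.
final class MatchCountSketch {

  /** Counts the documents it is asked to collect; scores are not needed. */
  static final class CountingCollector extends SimpleCollector {
    int count;

    @Override
    public void collect(int doc) {
      count++;
    }

    @Override
    public ScoreMode scoreMode() {
      return ScoreMode.COMPLETE_NO_SCORES;
    }
  }

  /** Creates one collector per slice and sums their counts at the end. */
  static final class CountingManager implements CollectorManager<CountingCollector, Integer> {
    @Override
    public CountingCollector newCollector() {
      return new CountingCollector();
    }

    @Override
    public Integer reduce(Collection<CountingCollector> collectors) {
      int total = 0;
      for (CountingCollector collector : collectors) {
        total += collector.count;
      }
      return total;
    }
  }

  static int countMatches(IndexSearcher searcher, Query query) throws IOException {
    return searcher.search(query, new CountingManager());
  }

  /** Counts all documents in a 7-document in-memory index. */
  static int demo() throws IOException {
    try (Directory dir = new ByteBuffersDirectory();
        IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig())) {
      for (int i = 0; i < 7; i++) {
        writer.addDocument(new Document());
      }
      try (DirectoryReader reader = DirectoryReader.open(writer)) {
        return countMatches(new IndexSearcher(reader), new MatchAllDocsQuery());
      }
    }
  }
}
```

      Because each collector is confined to the slice it was created for, the count field needs no synchronization; only reduce sees all collectors, after collection has finished.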
    • search

      private <C extends Collector, T> T search(Weight weight, CollectorManager<C,T> collectorManager, C firstCollector) throws IOException
      Throws:
      IOException
    • search

      protected void search(IndexSearcher.LeafReaderContextPartition[] partitions, Weight weight, Collector collector) throws IOException
      Lower-level search API.

      searchLeaf(LeafReaderContext, int, int, Weight, Collector) is called for every leaf partition.

      NOTE: this method executes the searches on all given leaf partitions exclusively. To search across all the searcher's leaves, use leafContexts.

      Parameters:
      partitions - the leaf partitions to execute the searches on
      weight - to match documents
      collector - to receive hits
      Throws:
      IndexSearcher.TooManyClauses - If a query would exceed getMaxClauseCount() clauses.
      IOException
    • searchLeaf

      protected void searchLeaf(LeafReaderContext ctx, int minDocId, int maxDocId, Weight weight, Collector collector) throws IOException
      Lower-level search API.

      LeafCollector.collect(int) is called for every document.

      Parameters:
      ctx - the leaf to execute the search against
      minDocId - the lower bound of the doc id range to search
      maxDocId - the upper bound of the doc id range to search
      weight - to match document
      collector - to receive hits
      Throws:
      IndexSearcher.TooManyClauses - If a query would exceed getMaxClauseCount() clauses.
      IOException
    • rewrite

      public Query rewrite(Query original) throws IOException
      Expert: called to re-write queries into primitive queries.
      Throws:
      IndexSearcher.TooManyClauses - If a query would exceed getMaxClauseCount() clauses.
      IOException
    • rewrite

      private Query rewrite(Query original, boolean needsScores) throws IOException
      Throws:
      IOException
    • getNumClausesCheckVisitor

      private static QueryVisitor getNumClausesCheckVisitor()
      Returns a QueryVisitor which recursively checks the total number of clauses that a query and its children cumulatively have and validates that the total number does not exceed the specified limit. Throws IndexSearcher.TooManyNestedClauses if the limit is exceeded.
    • explain

      public Explanation explain(Query query, int doc) throws IOException
      Returns an Explanation that describes how doc scored against query.

      This is intended to be used in developing Similarity implementations, and, for good performance, should not be displayed with every hit. Computing an explanation is as expensive as executing the query over the entire index.

      Throws:
      IOException
    • explain

      protected Explanation explain(Weight weight, int doc) throws IOException
      Expert: low-level implementation method. Returns an Explanation that describes how doc scored against weight.

      This is intended to be used in developing Similarity implementations, and, for good performance, should not be displayed with every hit. Computing an explanation is as expensive as executing the query over the entire index.

      Applications should call explain(Query, int).

      Throws:
      IndexSearcher.TooManyClauses - If a query would exceed getMaxClauseCount() clauses.
      IOException
    • createWeight

      public Weight createWeight(Query query, ScoreMode scoreMode, float boost) throws IOException
      Creates a Weight for the given query, potentially adding caching if possible and configured.
      Throws:
      IOException
    • getTopReaderContext

      public IndexReaderContext getTopReaderContext()
      Returns this searcher's top-level IndexReaderContext.
    • toString

      public String toString()
      Overrides:
      toString in class Object
    • termStatistics

      public TermStatistics termStatistics(Term term, int docFreq, long totalTermFreq) throws IOException
      Returns TermStatistics for a term.

      This can be overridden, for example, to return a term's statistics across a distributed collection.

      Parameters:
      docFreq - The document frequency of the term. It must be greater than or equal to 1.
      totalTermFreq - The total term frequency.
      Returns:
      A TermStatistics (never null).
      Throws:
      IOException
    • collectionStatistics

      public CollectionStatistics collectionStatistics(String field) throws IOException
      Returns CollectionStatistics for a field, or null if the field does not exist (has no indexed terms).

      This can be overridden, for example, to return a field's statistics across a distributed collection.

      Throws:
      IOException
    • getTaskExecutor

      public TaskExecutor getTaskExecutor()
      Returns the TaskExecutor that this searcher relies on to execute concurrent operations.
      Returns:
      the task executor