LUCENE-10560: Faster merging of TermsEnum
This commit adds a new TermsEnumIndex abstraction in oal.index that wraps a
TermsEnum and the index of the segment that it belongs to. It can be used to
create priority queues that merge TermsEnum instances, either from the inverted
index or from doc values. In either case, a long that holds the first 8 bytes
of the term is computed in order to speed up comparisons. In the doc-values
case, OrdinalMap also leverages seek-by-ord capabilities to reason about
shared prefixes across entire windows of terms, so that shared prefixes are not
compared again whenever the queue is re-ordered. This should especially help
with fields that may share long common prefixes, like URLs.
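
To illustrate the 8-byte prefix idea, here is a minimal standalone sketch. It
is not the code from this commit: the class name TermPrefixSketch and both
method names are hypothetical, and it works on plain byte arrays rather than
on Lucene types. It assumes terms sort by unsigned byte order, and it packs
the first 8 bytes big-endian so that one unsigned long comparison decides the
order whenever the prefixes differ.

```java
import java.util.Arrays;

// Hypothetical illustration only; names are not taken from the Lucene code base.
final class TermPrefixSketch {

  private TermPrefixSketch() {}

  /**
   * Packs up to the first 8 bytes of a term into a long, most significant
   * byte first. Terms shorter than 8 bytes are zero-padded on the right.
   */
  static long prefix8(byte[] term) {
    long prefix = 0L;
    int n = Math.min(term.length, 8);
    for (int i = 0; i < n; i++) {
      prefix |= (term[i] & 0xFFL) << (56 - 8 * i);
    }
    return prefix;
  }

  /**
   * Compares two terms in unsigned byte order. The precomputed prefixes
   * decide the result when they differ; the full byte arrays are only
   * compared when the prefixes are equal.
   */
  static int compare(byte[] a, long prefixA, byte[] b, long prefixB) {
    if (prefixA != prefixB) {
      return Long.compareUnsigned(prefixA, prefixB);
    }
    // Equal prefixes: the terms share their first 8 bytes (or differ only
    // in zero padding), so fall back to a full comparison.
    return Arrays.compareUnsigned(a, b);
  }
}
```

In a priority queue, each entry would compute its prefix once when its current
term changes, so most reorderings would cost a single long comparison. Terms
that share a long prefix, such as URLs, still reach the fallback, which is
the case the shared-prefix reasoning in OrdinalMap is described as addressing.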
On luceneutil's OrdinalMap benchmark, construction time was reduced by 30.5%
for the id field and by 17.5% for the name field.
JIRA: LUCENE-10560