
Add a fast-field variant of TermSet

Open stuhood opened this issue 4 months ago • 5 comments

The TermSet query currently produces one Scorer/DocSet per matched term by scanning the term dictionary and then consuming posting lists. For very large term sets on a field that also has a fast field, it is faster to scan the fast-field column while intersecting with a HashSet of (encoded) term values.

Following the pattern set by the two execution modes of RangeQuery, this PR introduces a fast-field variant of TermSet and selects it when there are more than 1024 input terms (an arbitrary threshold!).
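To make the dispatch concrete, here is a minimal sketch of the threshold-based strategy selection. The names (`TermSetStrategy`, `choose_strategy`, `FAST_FIELD_THRESHOLD`) are illustrative, not tantivy's actual internals; only the 1024 cutoff comes from the PR description.

```rust
// Hypothetical dispatch sketch: pick the execution mode by term count,
// mirroring the PR's (arbitrary) 1024-term threshold.
const FAST_FIELD_THRESHOLD: usize = 1024;

#[derive(Debug)]
enum TermSetStrategy {
    /// One DocSet per term, driven by the inverted index.
    PostingLists,
    /// Single scan over the fast-field column.
    FastField,
}

fn choose_strategy(num_terms: usize) -> TermSetStrategy {
    if num_terms > FAST_FIELD_THRESHOLD {
        TermSetStrategy::FastField
    } else {
        TermSetStrategy::PostingLists
    }
}
```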

Performance is significantly improved for large TermSets of primitives.
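The core idea of the fast-field mode can be sketched as follows. This is a simplified illustration, assuming the column is just a dense slice of encoded `u64` values indexed by doc id; tantivy's real `Column` abstraction and the encoding of term values are more involved.

```rust
use std::collections::HashSet;

// Hypothetical sketch of the fast-field execution mode: instead of opening
// one posting list per term, scan the dense column once and keep every doc
// whose encoded value is a member of the term set.
fn scan_fast_field(column: &[u64], term_set: &HashSet<u64>) -> Vec<u32> {
    column
        .iter()
        .enumerate()
        .filter(|&(_, value)| term_set.contains(value))
        .map(|(doc_id, _)| doc_id as u32)
        .collect()
}
```

With N docs and T terms, this is one sequential pass with O(1) hash probes per doc, rather than T dictionary lookups plus T posting-list traversals with merging.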

stuhood · Oct 15 '25 23:10

I think a precomputed BitSet should also work well, similar to what I did in regex_phrase_weight

PSeitz · Oct 16 '25 09:10

> I think a precomputed BitSet should also work well, similar to what I did in regex_phrase_weight

Hm, interesting.

That could be implemented using either the posting lists or fast fields, I assume? So it seems like it's independent of which index we use.

For very high term counts, I think that you'll still want to scan the fast field, since it is so dense? Requires benchmarking probably.

stuhood · Oct 16 '25 16:10

>> I think a precomputed BitSet should also work well, similar to what I did in regex_phrase_weight
>
> Hm, interesting.
>
> That could be implemented using either the posting lists or fast fields, I assume? So it seems like it's independent of which index we use.
>
> For very high term counts, I think that you'll still want to scan the fast field, since it is so dense? Requires benchmarking probably.

The BitSet would be precomputed on the inverted index. I think we would need a benchmark to justify the additional code. (also good to know :)

PSeitz · Oct 26 '25 18:10

> I think we would need a benchmark to justify the additional code.

FWIW: the benchmarks that I did in ParadeDB for this PR are over here: https://github.com/paradedb/paradedb/pull/3412 ... For large term sets, consuming the posting lists uses a very large amount of memory and does a lot of seeking: this implementation is about 7 times faster in one worker process. You can ignore the "multiple workers as pessimization" aspect: that's specific to having a huge query in ParadeDB.

stuhood · Oct 26 '25 22:10

>> I think we would need a benchmark to justify the additional code.
>
> FWIW: the benchmarks that I did in ParadeDB for this PR are over here: paradedb/paradedb#3412 ... For large term sets, consuming the posting lists uses a very large amount of memory and does a lot of seeking: this implementation is about 7 times faster in one worker process. You can ignore the "multiple workers as pessimization" aspect: that's specific to having a huge query in ParadeDB.

I mean comparing the BitSet variant (via BitSetDocSet) with the fast-field variant. In the BitSet variant we would fill the BitSet one by one from the posting lists, which would fix the memory usage at num_docs/8 bytes and remove all seeking.
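A minimal sketch of that BitSet variant, assuming each posting list is just a sorted `Vec<u32>` of doc ids (tantivy's real `BitSet`/`BitSetDocSet` types are more elaborate; this only illustrates the shape of the idea):

```rust
// Hypothetical sketch: walk each term's posting list once, setting one bit
// per matching doc. Memory is fixed at roughly num_docs / 8 bytes, and the
// resulting structure supports iteration with no further seeking.
struct BitSet {
    bits: Vec<u64>,
}

impl BitSet {
    fn with_capacity(num_docs: u32) -> Self {
        BitSet { bits: vec![0; (num_docs as usize + 63) / 64] }
    }
    fn insert(&mut self, doc: u32) {
        self.bits[(doc / 64) as usize] |= 1u64 << (doc % 64);
    }
    fn contains(&self, doc: u32) -> bool {
        self.bits[(doc / 64) as usize] & (1u64 << (doc % 64)) != 0
    }
}

fn precompute_bitset(num_docs: u32, posting_lists: &[Vec<u32>]) -> BitSet {
    let mut bitset = BitSet::with_capacity(num_docs);
    for postings in posting_lists {
        for &doc in postings {
            bitset.insert(doc);
        }
    }
    bitset
}
```

The union happens eagerly at weight-construction time, so the downstream DocSet is a single forward scan over the bits regardless of how many terms contributed.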

PSeitz · Oct 27 '25 08:10