Efficiently Retrieve Fast Field Values and Compute Min/Max for Numeric Fast Fields

Open rustmailer opened this issue 8 months ago • 7 comments

Questions:

  1. How can I efficiently retrieve all values for two u64 fast fields without loading other fields?
  2. What is the recommended approach to compute the minimum and maximum values for numeric fast fields?

rustmailer avatar Mar 19 '25 20:03 rustmailer

For each segment: open the Column via FastFieldReaders, then access ColumnValues on the Column and use iter.

The minimum and maximum values for numeric fast fields are already computed and accessible on ColumnValues.
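
To illustrate the idea, here is a minimal std-only sketch. It does not use tantivy's API: `SegmentColumn`, `global_min_max`, and `zip_columns` are hypothetical names that only model a per-segment column whose min and max are stored next to the values. Under that assumption, an index-wide min/max is a fold over per-segment statistics, and reading two fields means walking two columns in lockstep, one segment at a time.

```rust
/// Toy stand-in for one segment's numeric fast-field column.
/// Hypothetical type, not tantivy's: it only mirrors the idea that
/// min and max are computed once, when the column is built.
struct SegmentColumn {
    values: Vec<u64>,
    min: u64,
    max: u64,
}

impl SegmentColumn {
    /// Builds a column and records its min/max. Returns None for an empty column.
    fn new(values: Vec<u64>) -> Option<Self> {
        let min = *values.iter().min()?;
        let max = *values.iter().max()?;
        Some(SegmentColumn { values, min, max })
    }
}

/// Combines the stored per-segment statistics without scanning any values.
fn global_min_max(segments: &[SegmentColumn]) -> Option<(u64, u64)> {
    let min = segments.iter().map(|s| s.min).min()?;
    let max = segments.iter().map(|s| s.max).max()?;
    Some((min, max))
}

/// Reads two columns side by side, segment by segment, yielding one pair per doc.
/// Assumes both slices describe the same segments in the same order.
fn zip_columns<'a>(
    a: &'a [SegmentColumn],
    b: &'a [SegmentColumn],
) -> impl Iterator<Item = (u64, u64)> + 'a {
    a.iter()
        .zip(b.iter())
        .flat_map(|(ca, cb)| ca.values.iter().copied().zip(cb.values.iter().copied()))
}
```

In a real index the stored statistics may be bounds rather than exact extremes (for example, they may not account for deleted documents), so it is worth checking the ColumnValues documentation for the version in use.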

PSeitz avatar Mar 20 '25 02:03 PSeitz

@PSeitz If there are two other fields of string type, defined as STRING | STORED | FAST, is it necessary to traverse all documents to achieve a max aggregation grouped by these two fields?

rustmailer avatar Mar 20 '25 06:03 rustmailer

max aggregation on a string field?

PSeitz avatar Mar 20 '25 06:03 PSeitz

@PSeitz Sorry, let me clarify. I have three fields: f1, f2, and f3. f1 and f2 are string fields, while f3 is a numeric field. What I want is to find the maximum value of f3 when f1 equals v1 and f2 equals v2, something like: select max(f3) where f1 = v1 and f2 = v2

rustmailer avatar Mar 20 '25 06:03 rustmailer

The filter f1 = v1 and f2 = v2 is better done using indexed fields with a raw tokenizer. And yes, you can then compute the max value of f3. You can do the latter as an aggregation, or write your own collector.
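
A rough std-only model of that plan follows. It is not tantivy code: `Segment`, `intersect`, and `max_f3_where` are hypothetical names. It assumes each exact (raw-tokenized) term maps to a sorted list of doc ids, and that f3 is a column addressable by doc id. The filter becomes an intersection of two posting lists, and the max is taken only over the matching docs.

```rust
use std::collections::HashMap;

/// Toy segment: exact-match postings for f1 and f2, plus a numeric column for f3.
/// Hypothetical layout for illustration only.
struct Segment {
    /// term -> sorted doc ids whose f1 equals that term
    f1_postings: HashMap<String, Vec<u32>>,
    /// term -> sorted doc ids whose f2 equals that term
    f2_postings: HashMap<String, Vec<u32>>,
    /// f3 value per doc id (random access by doc id)
    f3: Vec<u64>,
}

/// Intersects two sorted doc id lists with a linear merge.
fn intersect(a: &[u32], b: &[u32]) -> Vec<u32> {
    let (mut i, mut j) = (0, 0);
    let mut out = Vec::new();
    while i < a.len() && j < b.len() {
        if a[i] == b[j] {
            out.push(a[i]);
            i += 1;
            j += 1;
        } else if a[i] < b[j] {
            i += 1;
        } else {
            j += 1;
        }
    }
    out
}

/// Models: select max(f3) where f1 = v1 and f2 = v2.
/// Segments where either term is absent contribute nothing.
fn max_f3_where(segments: &[Segment], v1: &str, v2: &str) -> Option<u64> {
    segments
        .iter()
        .filter_map(|seg| {
            let a = seg.f1_postings.get(v1)?;
            let b = seg.f2_postings.get(v2)?;
            intersect(a, b)
                .into_iter()
                .map(|doc| seg.f3[doc as usize])
                .max()
        })
        .max()
}
```

The point of the sketch is that only matching documents have their f3 value read; documents that fail the filter are never visited in the column.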

fulmicoton avatar Mar 20 '25 07:03 fulmicoton

> If there are two other fields of string type, defined as STRING | STORED | FAST, is it necessary to traverse all documents to achieve a max aggregation grouped by these two fields?

No. Generally speaking, all fast fields allow random access, so no traversal is necessary. Also, tantivy uses dictionary encoding in which term ids are sorted, so if you want a max according to lexicographic order, it is actually possible to get the best term ord per segment and decode only those.
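
The dictionary-encoding point can be sketched with a small std-only model. Again, this is not tantivy's API: `StrColumn`, `max_term`, and `max_term_across_segments` are hypothetical names. The sketch assumes each segment has its own sorted dictionary, so ordinals are comparable only within a segment. Per segment, the largest ordinal is found by comparing integers and a single term is decoded, then the decoded strings are compared across segments.

```rust
/// Toy dictionary-encoded string column for one segment (hypothetical type).
struct StrColumn {
    /// Distinct terms in sorted order, so a term's index is its ordinal.
    dictionary: Vec<String>,
    /// Term ordinal for each doc id.
    ords: Vec<u32>,
}

impl StrColumn {
    /// Builds the sorted dictionary and the per-doc ordinals from raw values.
    fn from_values(values: &[&str]) -> Self {
        let mut dictionary: Vec<String> = values.iter().map(|v| v.to_string()).collect();
        dictionary.sort();
        dictionary.dedup();
        let ords = values
            .iter()
            .map(|v| {
                dictionary
                    .binary_search_by(|t| t.as_str().cmp(*v))
                    .expect("every value was added to the dictionary") as u32
            })
            .collect();
        StrColumn { dictionary, ords }
    }

    /// Lexicographically largest term among `docs`:
    /// compares ordinals only, then decodes a single term.
    fn max_term(&self, docs: &[u32]) -> Option<&str> {
        let best = docs.iter().map(|&d| self.ords[d as usize]).max()?;
        Some(self.dictionary[best as usize].as_str())
    }
}

/// Ordinals from different segments are not comparable in this model,
/// so each segment's winner is decoded and the strings are compared.
fn max_term_across_segments(segments: &[(StrColumn, Vec<u32>)]) -> Option<&str> {
    segments
        .iter()
        .filter_map(|(col, docs)| col.max_term(docs))
        .max()
}
```

With this layout, the number of string decodes is bounded by the number of segments rather than the number of matching documents.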

fulmicoton avatar Mar 20 '25 07:03 fulmicoton

You can have a look at the docs here: https://docs.rs/tantivy/latest/tantivy/aggregation/index.html

The term_query would be f1 = v1 AND f2 = v2, and the collector would be the max aggregation.

PSeitz avatar Mar 20 '25 07:03 PSeitz