tantivy
Efficiently Retrieve Fast Field Values and Compute Min/Max for Numeric Fast Fields
Questions:
- How can I efficiently retrieve all values for two u64 fast fields without loading other fields?
- What is the recommended approach to compute the minimum and maximum values for numeric fast fields?
For each segment:
- Open the Column via FastFieldReaders.
- Access ColumnValues on the Column and use iter to go through the values.
The minimum and maximum values for numeric fast fields are already computed and accessible on ColumnValues.
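The per-segment steps above can be sketched with a std-only model. This is not tantivy's API: `SegmentColumn`, `global_min_max`, and `all_values` are hypothetical stand-ins that only illustrate the idea of reading one column per segment and combining the min/max each segment already stores, rather than rescanning the values.

```rust
/// Hypothetical stand-in for one segment's numeric fast-field column.
/// It is NOT tantivy's `Column`; it only mirrors the idea that min and max
/// are recorded once when the column is built.
pub struct SegmentColumn {
    values: Vec<u64>,
    min: u64,
    max: u64,
}

impl SegmentColumn {
    /// Builds the column and records min/max. Returns `None` for an empty segment.
    pub fn new(values: Vec<u64>) -> Option<Self> {
        let min = *values.iter().min()?;
        let max = *values.iter().max()?;
        Some(Self { values, min, max })
    }

    /// Iterates over this column's values only; no other field is touched.
    pub fn iter(&self) -> impl Iterator<Item = u64> + '_ {
        self.values.iter().copied()
    }

    /// Precomputed minimum (no scan).
    pub fn min_value(&self) -> u64 {
        self.min
    }

    /// Precomputed maximum (no scan).
    pub fn max_value(&self) -> u64 {
        self.max
    }
}

/// Index-wide (min, max), folded from per-segment metadata.
pub fn global_min_max(segments: &[SegmentColumn]) -> Option<(u64, u64)> {
    segments.iter().fold(None, |acc, col| match acc {
        None => Some((col.min_value(), col.max_value())),
        Some((lo, hi)) => Some((lo.min(col.min_value()), hi.max(col.max_value()))),
    })
}

/// All values of the column, segment by segment.
pub fn all_values(segments: &[SegmentColumn]) -> Vec<u64> {
    segments.iter().flat_map(|col| col.iter()).collect()
}
```

The same shape applies to a second u64 field: open its column per segment and iterate it independently.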
@PSeitz If there are two other fields of string type, defined as STRING | STORED | FAST, is it necessary to traverse all documents to achieve a max aggregation grouped by these two fields?
max aggregation on a string field?
@PSeitz Sorry, let me clarify. I have three fields: f1, f2, and f3. f1 and f2 are string fields, while f3 is a numeric field. What I want is to find the maximum value of f3 when f1 equals v1 and f2 equals v2, like select max(f3) where f1 = v1 and f2 = v2.
Filtering on f1 = v1 and f2 = v2 is better done using indexed fields with a raw tokenizer. And yes, you can then compute the max value of f3. You can do the latter as an aggregation, or write your own collector.
> If there are two other fields of string type, defined as STRING | STORED | FAST, is it necessary to traverse all documents to achieve a max aggregation grouped by these two fields?
No. Generally speaking, all fast fields allow random access, so no traversal is necessary. Also, tantivy uses dictionary encoding in which term ids are sorted, so if you want a max according to lexicographic order, it is actually possible to get the best term ord per segment, and only decode those.
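To make the dictionary-encoding point concrete, here is a small std-only model. `DictColumn` and its methods are hypothetical and not part of tantivy. Because the dictionary is sorted, comparing term ordinals is equivalent to comparing the strings, so only the winning ordinal needs to be decoded.

```rust
/// Hypothetical model of a dictionary-encoded string column for one segment.
pub struct DictColumn {
    /// Unique terms in lexicographic order; the index is the term ordinal.
    dictionary: Vec<String>,
    /// One term ordinal per document, addressable by doc id.
    ords: Vec<u32>,
}

impl DictColumn {
    /// Builds the sorted dictionary and the per-document ordinals.
    pub fn build(docs: &[&str]) -> Self {
        let mut dictionary: Vec<String> = docs.iter().map(|s| s.to_string()).collect();
        dictionary.sort();
        dictionary.dedup();
        let ords = docs
            .iter()
            .map(|&s| {
                dictionary
                    .binary_search_by(|t| t.as_str().cmp(s))
                    .expect("every term was inserted above") as u32
            })
            .collect();
        Self { dictionary, ords }
    }

    /// Random access: the ordinal stored for `doc`.
    pub fn ord(&self, doc: usize) -> u32 {
        self.ords[doc]
    }

    /// Lexicographically largest term among `docs`.
    /// Only integers are compared; a single string is decoded at the end.
    pub fn max_term(&self, docs: &[usize]) -> Option<&str> {
        let best = docs.iter().map(|&d| self.ord(d)).max()?;
        Some(self.dictionary[best as usize].as_str())
    }
}
```

Ordinals are only comparable within one segment, so across segments the decoded per-segment winners would still need a final string comparison.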
You can have a look at the docs here: https://docs.rs/tantivy/latest/tantivy/aggregation/index.html
The term_query would be f1 = v1 AND f2 = v2, and the collector would be the max aggregation.
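What the query plus a max collector does can be modeled roughly as follows. This is a std-only sketch, not tantivy code: `Segment`, `from_rows`, `max_f3`, and `max_f3_all` are hypothetical names. The exact-match postings stand in for raw-tokenized indexed fields, and the `f3` vector stands in for the numeric fast field that is read by doc id only for matching documents.

```rust
use std::cmp::Ordering;
use std::collections::HashMap;

type DocId = u32;

/// Hypothetical model of one segment.
pub struct Segment {
    /// (field, exact term) -> ascending doc ids, like a raw-tokenized indexed field.
    postings: HashMap<(String, String), Vec<DocId>>,
    /// Numeric column for f3, one value per document.
    f3: Vec<u64>,
}

impl Segment {
    /// Each row is (f1, f2, f3); the row position is the doc id.
    pub fn from_rows(rows: &[(&str, &str, u64)]) -> Self {
        let mut postings: HashMap<(String, String), Vec<DocId>> = HashMap::new();
        let mut f3 = Vec::with_capacity(rows.len());
        for (doc, &(v1, v2, n)) in rows.iter().enumerate() {
            postings.entry(("f1".into(), v1.into())).or_default().push(doc as DocId);
            postings.entry(("f2".into(), v2.into())).or_default().push(doc as DocId);
            f3.push(n);
        }
        Self { postings, f3 }
    }

    fn docs(&self, field: &str, term: &str) -> &[DocId] {
        self.postings
            .get(&(field.to_string(), term.to_string()))
            .map(Vec::as_slice)
            .unwrap_or(&[])
    }

    /// max(f3) where f1 = v1 and f2 = v2, for this segment.
    /// Intersects the two sorted posting lists and reads f3 only for matches.
    pub fn max_f3(&self, v1: &str, v2: &str) -> Option<u64> {
        let (a, b) = (self.docs("f1", v1), self.docs("f2", v2));
        let (mut i, mut j) = (0, 0);
        let mut best: Option<u64> = None;
        while i < a.len() && j < b.len() {
            match a[i].cmp(&b[j]) {
                Ordering::Less => i += 1,
                Ordering::Greater => j += 1,
                Ordering::Equal => {
                    let v = self.f3[a[i] as usize];
                    best = Some(best.map_or(v, |m| m.max(v)));
                    i += 1;
                    j += 1;
                }
            }
        }
        best
    }
}

/// Merges the per-segment results, as a collector would.
pub fn max_f3_all(segments: &[Segment], v1: &str, v2: &str) -> Option<u64> {
    segments.iter().filter_map(|s| s.max_f3(v1, v2)).max()
}
```

Documents that do not match both terms are never visited, which is why the filter belongs in the index rather than in a scan over the fast fields.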