
Optimization: filter + scoring by timestamp

Open fulmicoton opened this issue 1 year ago • 2 comments

There are several ideas we could leverage to sort by timestamp.

When we sort AND filter a timestamp range, we end up fetching the timestamp twice.

The timestamp is often almost sorted. A minor amount of metadata could make it possible to restrict our query.

timestamp within [t_start, t_end] implies doc ID within [doc_a, doc_b]
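One way this metadata could look, as a rough sketch only: keep the min and max timestamp of each fixed-size block of docs, then map a timestamp range to the doc range spanned by the overlapping blocks. `BlockMinMax`, `build`, and `candidate_docs` are hypothetical names made up for illustration, not existing tantivy or quickwit APIs, and the block size is an arbitrary assumption.

```rust
use std::ops::Range;

/// Hypothetical metadata: (min, max) timestamp for each block of `block_size` docs.
struct BlockMinMax {
    block_size: usize,
    num_docs: usize,
    ranges: Vec<(i64, i64)>,
}

impl BlockMinMax {
    /// Computes the per-block min/max in a single pass over the column.
    fn build(timestamps: &[i64], block_size: usize) -> Self {
        assert!(block_size > 0);
        let ranges = timestamps
            .chunks(block_size) // chunks are never empty, so unwrap is safe
            .map(|block| {
                let min = *block.iter().min().unwrap();
                let max = *block.iter().max().unwrap();
                (min, max)
            })
            .collect();
        BlockMinMax { block_size, num_docs: timestamps.len(), ranges }
    }

    /// Returns a half-open doc range [doc_a, doc_b) that contains every doc
    /// whose timestamp lies in [t_start, t_end], or None if no block overlaps.
    /// Docs inside the range still need to be checked individually.
    fn candidate_docs(&self, t_start: i64, t_end: i64) -> Option<Range<usize>> {
        let overlaps = |r: &(i64, i64)| r.0 <= t_end && r.1 >= t_start;
        let first = self.ranges.iter().position(|r| overlaps(r))?;
        let last = self.ranges.iter().rposition(|r| overlaps(r))?;
        let doc_a = first * self.block_size;
        let doc_b = ((last + 1) * self.block_size).min(self.num_docs);
        Some(doc_a..doc_b)
    }
}
```

The tighter the column is to sorted order, the narrower the returned range; on a shuffled column it degrades to the full doc range, so the result stays correct either way.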

fulmicoton, Apr 24 '24 09:04

I think this is similar to what I wrote here recently https://github.com/quickwit-oss/tantivy/issues/2352#issuecomment-2067577027 :

I've been wondering whether we should flag fast fields as almost sorted during creation (e.g. almost sorted within a range of 100 values) and then use that information to do a binary_search plus a scan of 100 values.

The almost sorted check could be done during serialization and should not cost much.
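A minimal sketch of how that could work, under one assumed definition of "almost sorted": `values[i] <= values[j]` whenever `j >= i + window`. With that property, sampling every `window`-th value gives a sorted sequence, so a binary search over the samples bounds the doc range and only that range needs scanning. All function names here are hypothetical, and the real flag and window size would be decided at serialization time.

```rust
/// Checks the assumed property: values[i] <= values[j] whenever j >= i + window.
/// One linear pass, cheap enough to run during serialization.
fn is_almost_sorted(values: &[i64], window: usize) -> bool {
    assert!(window > 0);
    let mut prefix_max = i64::MIN; // max of values[0..=i - window]
    for i in window..values.len() {
        prefix_max = prefix_max.max(values[i - window]);
        if values[i] < prefix_max {
            return false;
        }
    }
    true
}

/// Binary search over the samples values[0], values[window], values[2 * window], ...
/// Returns the index of the first sample for which `pred` is false.
fn sampled_partition_point(values: &[i64], window: usize, pred: impl Fn(i64) -> bool) -> usize {
    let num_samples = (values.len() + window - 1) / window;
    let (mut lo, mut hi) = (0, num_samples);
    while lo < hi {
        let mid = lo + (hi - lo) / 2;
        if pred(values[mid * window]) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    lo
}

/// Returns the docs whose value is in [t_start, t_end].
/// Only valid if `is_almost_sorted(values, window)` holds.
fn docs_in_range(values: &[i64], window: usize, t_start: i64, t_end: i64) -> Vec<usize> {
    let k0 = sampled_partition_point(values, window, |v| v < t_start);
    let k1 = sampled_partition_point(values, window, |v| v <= t_end);
    // Docs at or before sample k0 - 2 are below t_start; docs from sample
    // k1 + 1 onward are above t_end. The start bound is slightly conservative.
    let start = k0.saturating_sub(2) * window;
    let end = ((k1 + 1) * window).min(values.len());
    (start..end)
        .filter(|&doc| (t_start..=t_end).contains(&values[doc]))
        .collect()
}
```

For brevity this filters the whole candidate range; since docs far enough inside it are guaranteed to match, a real implementation would only need to check roughly two windows at each edge.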

PSeitz, Apr 24 '24 11:04

Yes. Let's keep that for later though. :)

fulmicoton, Apr 25 '24 00:04