Get count from split metadata on simple time range query
I think I found a nice optimization here while starting to research the search query code. @rdettai
Count queries can return much faster if they skip downloading splits. I couldn't think of a good reason to download a split file for a time range query when the split's time range is fully contained in the search request's time range: in that case the count can be read from the split metadata.
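To make the condition concrete, here is a minimal sketch of the per-split check, not the actual code in this PR. `SplitMeta` and `count_from_metadata` are hypothetical names. The sketch assumes the split's time range is inclusive on both ends, the request's `start_timestamp` is inclusive, its `end_timestamp` is exclusive, and the query is a plain count with no filter other than the time range.

```rust
use std::ops::RangeInclusive;

/// Hypothetical, simplified stand-in for split metadata (not Quickwit's real type).
struct SplitMeta {
    /// Smallest and largest timestamps in the split, both inclusive (assumption).
    time_range: RangeInclusive<i64>,
    /// Number of documents recorded in the metadata.
    num_docs: u64,
}

/// Returns `Some(num_docs)` when the split lies entirely inside the request
/// range `[start, end)`, so its count can be taken from metadata alone.
/// Returns `None` when the split has to be opened and searched (or pruned).
fn count_from_metadata(split: &SplitMeta, start: Option<i64>, end: Option<i64>) -> Option<u64> {
    // A missing bound means the request is unbounded on that side.
    let after_start = start.map_or(true, |s| *split.time_range.start() >= s);
    // The upper bound is exclusive, hence the strict comparison.
    let before_end = end.map_or(true, |e| *split.time_range.end() < e);
    if after_start && before_end {
        Some(split.num_docs)
    } else {
        None
    }
}
```

For example, under these assumptions a split covering `10..=19` is counted from metadata for the request `[10, 20)`, but not for `[10, 19)`, because the document at timestamp 19 may fall outside the request.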
Split the PR into 2: https://github.com/quickwit-oss/quickwit/pull/5759.
Nice optimization. I think it deserves a unit test. E.g. the following test scenario:
* The mock metastore returns 5 splits (1 outside the range, 1 overlapping the start, 1 overlapping the end, 1 overlapping the whole range, and 1 within the range).
* The mock search service expects only 1 call.
Will add tests when I have a bit more time.
Thanks for adding a test. Unfortunately I think it is flawed (so was my earlier test example; I should have written "the mock search service expects to be called for 3 splits").
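The corrected expectation can be sketched as follows. This is an illustration only, with hypothetical names (`SplitPlan`, `plan_split`, `scenario_plans`) and made-up timestamps, again assuming inclusive split ranges and a request range `[start, end)` with an exclusive upper bound. Of the five splits in the scenario, one is pruned, one is counted from metadata, and three still have to be searched.

```rust
use std::ops::RangeInclusive;

/// What to do with a split for a count-only time range query (hypothetical).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum SplitPlan {
    /// No overlap with the request range: skip the split entirely.
    Pruned,
    /// Fully contained in the request range: use num_docs from metadata.
    CountFromMetadata,
    /// Partial overlap: the split must be downloaded and searched.
    NeedsSearch,
}

/// Classifies a split against the request range `[start, end)`.
fn plan_split(split: &RangeInclusive<i64>, start: i64, end: i64) -> SplitPlan {
    if *split.end() < start || *split.start() >= end {
        SplitPlan::Pruned
    } else if *split.start() >= start && *split.end() < end {
        SplitPlan::CountFromMetadata
    } else {
        SplitPlan::NeedsSearch
    }
}

/// The five splits of the test scenario, for the request range [100, 200).
fn scenario_plans() -> Vec<SplitPlan> {
    let splits: [RangeInclusive<i64>; 5] = [
        0..=50,    // outside the range
        50..=150,  // overlapping the start
        150..=250, // overlapping the end
        50..=250,  // overlapping the whole range
        120..=180, // within the range
    ];
    splits.iter().map(|s| plan_split(s, 100, 200)).collect()
}
```

In a real test the mock search service would then be set up to expect exactly the splits classified as `NeedsSearch`.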
You're totally right, I was supposed to be making a count query, fixed now.
Last-minute thought: you could also add some tests in https://github.com/tontinton/quickwit/blob/926a2f33d7b35a5cee064adc457c4503f96dc725/quickwit/rest-api-tests/scenarii/qw_search_api/0001_ts_range.yaml to double-check the boundary conditions, e.g.:
endpoint: simple/search
params:
  query: "*"
  start_timestamp: 1684993000
  end_timestamp: 1684993004
expected:
  num_hits: 3
This should confirm that the upper-bound exclusion condition is correct and stays that way.