Get count from split metadata on simple time range query

Open tontinton opened this issue 7 months ago • 3 comments

I think I found a nice optimization here while starting to research the search query code. @rdettai

Count queries can return much faster if they skip downloading splits. I couldn't think of a good reason to always download split files for a time range query when the split's time range is fully contained in the search request's time range: in that case the count can come straight from the split metadata.
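A minimal sketch of the containment check described above, not the code from the PR. `SplitMeta` and `count_from_metadata` are hypothetical names. The sketch assumes that split metadata carries an inclusive time range and a document count, that the request's start bound is inclusive and its end bound exclusive, and that the query matches every document in the range:

```rust
use std::ops::RangeInclusive;

/// Hypothetical, simplified stand-in for split metadata (not Quickwit's real type).
struct SplitMeta {
    /// Inclusive min/max timestamp of the documents in the split, if known.
    time_range: Option<RangeInclusive<i64>>,
    /// Number of documents in the split.
    num_docs: u64,
}

/// Returns `Some(count)` when the split's time range is fully contained in the
/// request's `[start, end)` range, so the count can be read from metadata.
/// Returns `None` when the split would still have to be searched (or pruned).
fn count_from_metadata(split: &SplitMeta, start: Option<i64>, end: Option<i64>) -> Option<u64> {
    // Without a known time range we cannot prove containment.
    let range = split.time_range.as_ref()?;
    // Assumption: the start bound is inclusive.
    let lower_ok = start.map_or(true, |s| *range.start() >= s);
    // Assumption: the end bound is exclusive, so the split's max must be strictly below it.
    let upper_ok = end.map_or(true, |e| *range.end() < e);
    if lower_ok && upper_ok {
        Some(split.num_docs)
    } else {
        None
    }
}
```

With a split covering `10..=20`, a request for `[10, 21)` can be answered from metadata, while a request for `[10, 20)` cannot, because the split may hold documents at timestamp 20.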

Split the PR into 2: https://github.com/quickwit-oss/quickwit/pull/5759.

tontinton avatar Apr 19 '25 17:04 tontinton

Nice optimization. I think it deserves a unit test. For example, the following test scenario:

* The mock metastore returns 5 splits (1 outside the range, 1 overlapping the start, 1 overlapping the end, 1 overlapping the whole range, and 1 within the range).

* The mock search service expects only 1 call.

Will add tests when I have a bit more time.

tontinton avatar Apr 22 '25 20:04 tontinton

Thanks for adding a test. Unfortunately, I think it is flawed (so was my earlier test example; I should have written "the mock search service expects to be called for 3 splits").

You're totally right, I was supposed to be making a count query, fixed now.
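To illustrate why the corrected expectation is 3 splits, here is a rough sketch that classifies the five splits from the scenario. `SplitPlan`, `plan_split`, `scenario_plans`, and all timestamps are made up for illustration, and the sketch assumes an inclusive start bound and an exclusive end bound:

```rust
use std::ops::RangeInclusive;

/// Hypothetical outcome for one split of a count query.
#[derive(Debug, PartialEq)]
enum SplitPlan {
    /// The split lies entirely outside the requested range.
    Skip,
    /// The split lies entirely inside the range: use its metadata doc count.
    CountFromMetadata,
    /// The split only partially overlaps the range: it must be searched.
    Search,
}

/// Classifies a split against the request range `[start, end)`.
fn plan_split(split: &RangeInclusive<i64>, start: i64, end: i64) -> SplitPlan {
    if *split.end() < start || *split.start() >= end {
        SplitPlan::Skip
    } else if *split.start() >= start && *split.end() < end {
        SplitPlan::CountFromMetadata
    } else {
        SplitPlan::Search
    }
}

/// The five splits of the scenario, with invented timestamps, against `[100, 200)`.
fn scenario_plans() -> Vec<SplitPlan> {
    let (start, end) = (100, 200);
    let splits: [RangeInclusive<i64>; 5] = [
        0..=50,    // outside the range
        50..=150,  // overlaps the start
        150..=250, // overlaps the end
        50..=250,  // overlaps the whole range
        120..=180, // within the range
    ];
    splits.iter().map(|s| plan_split(s, start, end)).collect()
}
```

Under these assumptions, one split is skipped, one is counted from metadata, and the remaining three still reach the search service.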

tontinton avatar Apr 30 '25 19:04 tontinton

Last-minute thought: you could also add some tests in https://github.com/tontinton/quickwit/blob/926a2f33d7b35a5cee064adc457c4503f96dc725/quickwit/rest-api-tests/scenarii/qw_search_api/0001_ts_range.yaml to double-check the boundary conditions, e.g.:

endpoint: simple/search
params:
  query: "*"
  start_timestamp: 1684993000
  end_timestamp: 1684993004
expected:
  num_hits: 3

should confirm that the upper bound exclusion condition is correct and stays that way.
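As a small illustration of the exclusive upper bound this test would pin down, the sketch below counts timestamps in a half-open range. `count_in_range` and `hypothetical_docs` are invented names, and the document timestamps are made up; they are not the actual fixture data of the REST API tests:

```rust
/// Counts timestamps in the half-open range `[start, end)`:
/// `start` is included, `end` is excluded.
fn count_in_range(timestamps: &[i64], start: i64, end: i64) -> usize {
    timestamps
        .iter()
        .filter(|&&ts| ts >= start && ts < end)
        .count()
}

/// Made-up document timestamps, chosen so that one document sits exactly
/// on the upper bound used in the request above.
fn hypothetical_docs() -> Vec<i64> {
    vec![1684993001, 1684993002, 1684993003, 1684993004]
}
```

With these invented timestamps, a request for `[1684993000, 1684993004)` counts three documents, because the one at `1684993004` falls on the excluded bound. A metadata shortcut that treated the end bound as inclusive would count it as well.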

rdettai avatar May 14 '25 15:05 rdettai