
Support for large start_offset

Open PSeitz opened this issue 3 years ago • 3 comments

The use case is outlined here: https://github.com/quickwit-oss/quickwit/issues/1449#issuecomment-1127911151

After https://github.com/quickwit-oss/quickwit/pull/1539, the limit is 10_000. It would be good to support very large start_offset values in some scenarios, e.g. data extraction.

For very large start_offset, we can probably assume that:

  • strict score sorting is not that important anymore
  • the split merge algorithm should still be deterministic, to be able to scan the data
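To illustrate why a deterministic merge makes scanning possible, here is a minimal in-memory sketch. It is not Quickwit's implementation. It assumes each split returns its hits already sorted by a hypothetical key `(timestamp, split_id, doc_id)`, where the trailing fields break ties so the order is total. With that assumption, merging the splits gives the same global order on every request. A client can then resume after the last key it saw instead of skipping `start_offset` hits. The names `page_after` and `scan` are hypothetical.

```python
import bisect
import heapq
import itertools
from typing import Iterator, List, Optional, Tuple

# Hypothetical hit key: (timestamp, split_id, doc_id). split_id and doc_id
# act as tie-breakers, so the order over all hits is total.
Key = Tuple[int, str, int]


def page_after(splits: List[List[Key]], after: Optional[Key], size: int) -> List[Key]:
    """Return up to `size` hits that sort strictly after the cursor `after`.

    Each split is assumed to be a list already sorted by Key.
    """
    streams = []
    for split in splits:
        # bisect locates the first hit past the cursor in O(log n), so no
        # split has to count its way through `start_offset` earlier hits.
        start = 0 if after is None else bisect.bisect_right(split, after)
        streams.append(split[start:])
    # heapq.merge lazily interleaves the sorted streams into one global
    # order, which is the same on every call because Key is a total order.
    return list(itertools.islice(heapq.merge(*streams), size))


def scan(splits: List[List[Key]], size: int) -> Iterator[Key]:
    """Yield every hit once, page by page, using the last key as cursor."""
    after: Optional[Key] = None
    while True:
        page = page_after(splits, after, size)
        if not page:
            return
        yield from page
        after = page[-1]
```

For example, with two splits `[(1, "a", 0), (3, "a", 1), (5, "a", 2)]` and `[(2, "b", 0), (3, "b", 1), (6, "b", 2)]`, a page of 3 after the cursor `(3, "a", 1)` is `[(3, "b", 1), (5, "a", 2), (6, "b", 2)]`. Score order plays no role here, which fits the assumption above that strict score sorting matters less for this kind of scan.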

PSeitz, May 25 '22 09:05

I've played with deep pagination (based on scrolls), and the performance was definitely not great past a few million entries. We now support search_after in the ES API, which is arguably a much better way to handle this. Is that enough? (Something similar should probably be added to the Quickwit API.)

trinity-1686a, Jan 04 '24 18:01

Hey, I'm not sure I follow: is there a best practice for paginating through a Quickwit index via the API beyond an offset of 10K? Is there a plan to raise the start_offset limit above 10K at some point?

@PSeitz

alorw, May 13 '25 08:05

Currently, search_after in the ES-compatible API is the way to go.
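Below is a rough sketch of a client-side scan using `search_after`. It assumes the following, none of which this thread confirms:

- Quickwit's Elasticsearch-compatible search endpoint is at `{base_url}/api/v1/_elastic/{index_id}/_search`.
- The endpoint accepts `size`, `sort`, and `search_after` in the request body.
- Each returned hit carries Elasticsearch-style `sort` values.

The index name, the sort field, and the helper names `http_post` and `scan_with_search_after` are hypothetical. The HTTP call can be injected so the loop can be exercised without a server. One caveat: if many documents share the same sort value, a single-field cursor can skip or repeat hits unless the server breaks ties deterministically. Check this against the Quickwit docs before relying on it.

```python
import json
import urllib.request
from typing import Any, Callable, Dict, Iterator, List, Optional

Json = Dict[str, Any]


def http_post(url: str, body: Json) -> Json:
    """POST a JSON body and decode the JSON response (stdlib only)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def scan_with_search_after(
    base_url: str,
    index_id: str,
    sort: List[Json],
    page_size: int = 1000,
    post: Callable[[str, Json], Json] = http_post,
) -> Iterator[Json]:
    """Yield all hits page by page, resuming from the last hit's sort values."""
    # Assumed endpoint path for Quickwit's ES-compatible search API.
    url = f"{base_url}/api/v1/_elastic/{index_id}/_search"
    search_after: Optional[List[Any]] = None
    while True:
        body: Json = {"size": page_size, "query": {"match_all": {}}, "sort": sort}
        if search_after is not None:
            # Cursor: the sort values of the last hit from the previous page.
            body["search_after"] = search_after
        hits = post(url, body)["hits"]["hits"]
        if not hits:
            return
        yield from hits
        # Assumes each hit exposes ES-style "sort" values.
        search_after = hits[-1]["sort"]
        if len(hits) < page_size:
            return  # a short page means there is nothing left
```

A call might look like `scan_with_search_after("http://localhost:7280", "my-index", [{"timestamp": {"order": "asc"}}])`, where the port, index, and field are placeholders. Unlike start_offset, each request only carries the previous page's last sort values. The server never has to skip over everything before them, so page cost should not grow with depth the way large offsets do.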

PSeitz, May 13 '25 11:05