quickwit
Support for large start_offset
The use case is outlined here: https://github.com/quickwit-oss/quickwit/issues/1449#issuecomment-1127911151
After https://github.com/quickwit-oss/quickwit/pull/1539, the limit is 10_000. It would be good to support very large start_offsets for some scenarios, e.g. data extraction.
For very large start_offset, we can probably assume that:
- strict score sorting is not that important anymore
- the split merge algorithm should still be deterministic, to be able to scan the data
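To make the second assumption concrete, here is a minimal sketch of a deterministic merge across splits. It is not Quickwit's actual merge code: the function name `merge_splits`, the `(sort_value, doc_id)` hit shape, and the tie-break on `split_id` are all assumptions for illustration. The point is only that a total order over `(sort_value, split_id, doc_id)` makes any `start_offset` window reproducible from one request to the next.

```python
import heapq
from itertools import islice


def merge_splits(split_hits, start_offset, max_hits):
    """Merge per-split hit lists into one deterministic global order.

    split_hits: dict mapping a (hypothetical) split_id to a list of
    (sort_value, doc_id) tuples, each list already sorted ascending.
    """
    def tagged(split_id, hits):
        # Tag every hit with its split so that ties on sort_value are
        # always broken the same way: by split_id, then by doc_id.
        for sort_value, doc_id in hits:
            yield (sort_value, split_id, doc_id)

    # Iterate splits in sorted order so the input order never matters.
    streams = [tagged(sid, hits) for sid, hits in sorted(split_hits.items())]
    merged = heapq.merge(*streams)  # lazy k-way merge over the tuples

    # Skip start_offset hits, then take one page.
    return list(islice(merged, start_offset, start_offset + max_hits))
```

Note that this still walks `start_offset` hits before returning anything, so it shows the determinism requirement, not a way around the cost of deep offsets.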
I've played with deep pagination (based on scrolls), and the performance was definitely not great past a few million entries. We now support search_after on the ES API, which is arguably a much better way to handle that. Is that enough? (Something similar should probably be added to the Quickwit API.)
Hey, I'm not sure I follow. Is there a best practice for searching through a Quickwit index via the API in a paginated way beyond the 10K offset? Is there a plan to raise the start_offset limit above 10K at some point?
@PSeitz
search_after in the ES-compatible API is currently the way to go.
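For anyone landing here, a rough sketch of what a search_after loop could look like on the client side. Assumptions: `fetch_page` is a hypothetical helper you write yourself, which sends the request body to the ES-compatible `_search` endpoint and returns the list of hits; each hit is assumed to carry a `sort` array as in Elasticsearch responses; and the sort key is assumed to be unique enough that no documents are skipped between pages.

```python
from typing import Any, Callable, Dict, Iterator, List


def scan_with_search_after(
    fetch_page: Callable[[Dict[str, Any]], List[Dict[str, Any]]],
    query: Dict[str, Any],
    sort: List[Any],
    page_size: int = 1000,
) -> Iterator[Dict[str, Any]]:
    """Yield every hit for `query`, paging with search_after, not an offset.

    fetch_page is a hypothetical callable: it takes the request body and
    returns the hits of one page (an empty list once exhausted).
    """
    search_after = None
    while True:
        body: Dict[str, Any] = {"query": query, "sort": sort, "size": page_size}
        if search_after is not None:
            # Resume right after the last hit of the previous page.
            body["search_after"] = search_after
        hits = fetch_page(body)
        if not hits:
            return
        yield from hits
        # The sort values of the last hit become the next cursor
        # (assumes each hit exposes them under "sort").
        search_after = hits[-1]["sort"]
```

Since the cursor is just the last hit's sort values, there is no growing offset in the request, which is what makes this workable for the data extraction scenario.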