Add split boundary/partition policy
We want the ability to add some constraints on the bounds of the splits that Quickwit produces. For instance, a daily split boundary policy would ensure that splits do not span across multiple days.
To do
- [ ] Naming (boundary, partition, ...)
- [ ] Spec how index config declares policy
- [ ] Ensure indexer starts a new split each time a boundary is crossed
- [ ] Ensure splits are not merged across boundaries
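To make the last two items concrete, here is a rough sketch of a daily policy in Python. It is not Quickwit code: `day_boundary`, `cut_splits`, and `can_merge` are hypothetical names, splits are modeled as plain lists of Unix timestamps, and days are assumed to be UTC.

```python
from datetime import datetime, timezone
from itertools import groupby


def day_boundary(timestamp_secs: int) -> str:
    """Return the UTC day (yyyymmdd) that a Unix timestamp falls in."""
    dt = datetime.fromtimestamp(timestamp_secs, tz=timezone.utc)
    return dt.strftime("%Y%m%d")


def cut_splits(timestamps):
    """Group a stream of timestamps into splits.

    groupby only groups consecutive items, so a new split starts
    each time the day of the incoming document changes.
    """
    return [list(group) for _, group in groupby(timestamps, key=day_boundary)]


def can_merge(split_a, split_b) -> bool:
    """Allow a merge only when both splits sit in the same day.

    Assumes each split is non-empty and was produced by cut_splits,
    so every timestamp in a split shares the same day.
    """
    return day_boundary(split_a[0]) == day_boundary(split_b[0])
```

With this shape, the indexer and the merge planner share a single function (`day_boundary`), so both constraints come from the same definition of the boundary.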
I suspect there is a cool hack that could simplify the logic here and make it more powerful
For instance, could we introduce the concept of a "split group" string? Each split would belong to a split group, and only splits within the same split group could be merged together.
The split group string could then be generated by concatenating whichever pieces of information users configure. For instance:
- source_id
- yyyy
- yyyymm
- yyyymmdd
- yyyymmddhh
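A minimal sketch of the idea, in Python for brevity. Everything here is an assumption for illustration: the `split_group` and `merge_candidates` names, the `/` separator, the use of UTC, and the representation of a split as a `(split_id, group)` pair. None of it reflects an existing Quickwit API.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical mapping from the component names listed above to strftime formats.
TIME_FORMATS = {
    "yyyy": "%Y",
    "yyyymm": "%Y%m",
    "yyyymmdd": "%Y%m%d",
    "yyyymmddhh": "%Y%m%d%H",
}


def split_group(components, source_id: str, timestamp_secs: int) -> str:
    """Build the split group string by concatenating the configured components."""
    dt = datetime.fromtimestamp(timestamp_secs, tz=timezone.utc)
    parts = []
    for component in components:
        if component == "source_id":
            parts.append(source_id)
        else:
            parts.append(dt.strftime(TIME_FORMATS[component]))
    return "/".join(parts)


def merge_candidates(splits):
    """Bucket (split_id, group) pairs by group.

    Only splits that land in the same bucket may be merged together.
    """
    buckets = defaultdict(list)
    for split_id, group in splits:
        buckets[group].append(split_id)
    return dict(buckets)
```

For example, configuring `["source_id", "yyyymmdd"]` would give one group per source per day, so the merge planner only ever looks inside a single bucket.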
This sounds a lot like what routing expressions are. We could add features to the DSL to allow this kind of thing, and get "new split on boundary crossed" and "no merge across boundaries" for free.