thanos
thanos copied to clipboard
Push replica labels at the end after Recv
This PR removes the global sort required right before dedup, and redistributes the logic in the proxy heap and async response objects.
We still need to do a global sort in cases where a single store can send multiple replicas of the same series. The querier can detect this case through a warning and it will perform the global sort as before.
- [ ] I added CHANGELOG entry for this change.
- [x] Change is not relevant to the end user.
Changes
- Move sorting of labels to the AsyncSeriesResponse object, right after a series is received. This allows sorting to be distributed between different goroutines.
- Precalculate a signature for each received series that can be used for fast comparison in the k-way merge algorithm.
- Detect cases where a single store sends multiple replicas of the same series. Emit a warning so that the querier knows it needs to sort the response set.
Verification
Existing and new unit tests. Also did some manual tests in one of our own environments.
Benchmark
name old time/op new time/op delta
QuerySelect/1000000SeriesWith1Samples/select-8 1.13s ± 0% 0.74s ± 0% ~ (p=1.000 n=1+1)
QuerySelect/100000SeriesWith100Samples/select-8 103ms ± 0% 77ms ± 0% ~ (p=1.000 n=1+1)
QuerySelect/1SeriesWith10000000Samples/select-8 12.4ms ± 0% 12.8ms ± 0% ~ (p=1.000 n=1+1)
As predicted by the benchmark, I also saw a reduction for a query from 25s to 20s.
Pushing replica labels to the end is still expensive, but as next steps we can do that in stores since replica labels are now added to the SeriesRequest

@GiedriusS @bwplotka this is ready for another round of review. I updated the PR description with the latest information.
There was one issue with pushdown which should be fixed now. Other CI failures are unrelated to the change.
Here is the alternative implementation which does the sort check only in one place, without relying on responses from stores: https://github.com/thanos-io/thanos/pull/5742. It's simpler to understand and maintain, and results are similar. That implementation will also not create false positives because we're directly measuring whether the response set is sorted.
Hello 👋 Looks like there was no activity on this amazing PR for the last 30 days.
Do you mind updating us on the status? Is there anything we can help with? If you plan to still work on it, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next week, this issue will be closed (we can always reopen a PR if you get back to this!). Alternatively, use remind command if you wish to be reminded at some point in future.
Closing for now as promised, let us know if you need this to be reopened! 🤗