thanos
thanos copied to clipboard
Add Native Histogram Support
Is your proposal related to a problem?
Experimental native histogram support was released with Prometheus v0.40.x. https://github.com/thanos-io/thanos/pull/5896 updated Thanos to Prometheus v0.40.1 but without implementing native histograms.
I propose the implementation of native histogram support (behind a feature flag) as a next step in follow up PR(s).
Describe the solution you'd like
We need to go through all Thanos components and check what needs to be changed for native histogram support. So far we identified the following changes are required (list will be updated on new findings):
- [x] Sidecar | https://github.com/thanos-io/thanos/pull/6032
- Add native histograms to storepb and prompb types
- [x] Query | https://github.com/thanos-io/thanos/pull/6032
- Add native histogram support for dedup and query iterators
- Update UI for properly displaying native histograms
- [x] Query Frontend | https://github.com/thanos-io/thanos/pull/6071
- Update types
- [x] Receive | https://github.com/thanos-io/thanos/pull/6032
- Update writer to append histograms
- [ ] Compactor | https://github.com/thanos-io/thanos/pull/6350
- Implement native histogram downsampling
- [x] Store | https://github.com/thanos-io/thanos/pull/6056
- [x] Rule | https://github.com/thanos-io/thanos/pull/6390
- Enable use of GRPC query API for ruler, so that we can record native histograms
For each component we need to make sure to have tests in place after implementation and e2e test for functionality provided by multiple components (e.g. ingestion, querying, etc.).
Just double checking, I see we are adding native histogram support to query, receiver and sidecar in https://github.com/thanos-io/thanos/pull/6032.
~~For components like compactor
, store gateway
, do we still need implement anything to make sure they work with native histograms? If no code changes required, at least we should create some tests to make sure they work properly.~~
Updated: NVM I think we need additional support for those components. Most of our code are assuming XOR encoding or Aggr encoding only.
IIUC, only query frontend, Compactor and Ruler needs to support Native histograms?
I am wondering if we need to do anything specific in Ruler or it works out of the box by updating the Prometheus dependency. I guess the latter.
IIUC, only query frontend, Compactor and Ruler needs to support Native histograms?
I am wondering if we need to do anything specific in Ruler or it works out of the box by updating the Prometheus dependency. I guess the latter.
I also think it's the latter, but we will double check and add an e2e test for rulers.
Downsampling and compaction is blocked by a bug we hit in Prometheus regarding appending histograms to open chunks during compaction, which should be fixed by https://github.com/prometheus/prometheus/pull/12185. The bug needs to be fixed before we can open a PR for compaction/downsampling of native histograms.
Ruler does not work with native histograms out of the box. promclient
returns model representation of native histogram for queries, which we need to translate to histogram.FloatHistograms
: https://github.com/thanos-io/thanos/blob/5d5d39a35ba62889aa759d60380fe43deee386e4/pkg/promclient/promclient.go#L483-L486
We will look into using the GRPC query API for the ruler so we directly get histogram.FloatHistograms
in the result.
We have PRs now for all components. I just added the last one for rule.