thanos icon indicating copy to clipboard operation
thanos copied to clipboard

Add Native Histogram Support

Open rabenhorst opened this issue 2 years ago • 6 comments

Is your proposal related to a problem?

Experimental native histogram support was released with Prometheus v0.40.x. https://github.com/thanos-io/thanos/pull/5896 updated Thanos to Prometheus v0.40.1 but without implementing native histograms.

I propose the implementation of native histogram support (behind a feature flag) as a next step in follow up PR(s).

Describe the solution you'd like

We need to go through all Thanos components and check what needs to be changed for native histogram support. So far we identified the following changes are required (list will be updated on new findings):

  • [x] Sidecar | https://github.com/thanos-io/thanos/pull/6032
    • Add native histograms to storepb and prompb types
  • [x] Query | https://github.com/thanos-io/thanos/pull/6032
    • Add native histogram support for dedup and query iterators
    • Update UI for properly displaying native histograms
  • [x] Query Frontend | https://github.com/thanos-io/thanos/pull/6071
    • Update types
  • [x] Receive | https://github.com/thanos-io/thanos/pull/6032
    • Update writer to append histograms
  • [ ] Compactor | https://github.com/thanos-io/thanos/pull/6350
    • Implement native histogram downsampling
  • [x] Store | https://github.com/thanos-io/thanos/pull/6056
  • [x] Rule | https://github.com/thanos-io/thanos/pull/6390
    • Enable use of GRPC query API for ruler, so that we can record native histograms

For each component we need to make sure to have tests in place after implementation and e2e test for functionality provided by multiple components (e.g. ingestion, querying, etc.).

rabenhorst avatar Nov 18 '22 17:11 rabenhorst

Just double checking, I see we are adding native histogram support to query, receiver and sidecar in https://github.com/thanos-io/thanos/pull/6032.

~~For components like compactor, store gateway, do we still need implement anything to make sure they work with native histograms? If no code changes required, at least we should create some tests to make sure they work properly.~~

Updated: NVM I think we need additional support for those components. Most of our code are assuming XOR encoding or Aggr encoding only.

yeya24 avatar Jan 11 '23 05:01 yeya24

IIUC, only query frontend, Compactor and Ruler needs to support Native histograms?

I am wondering if we need to do anything specific in Ruler or it works out of the box by updating the Prometheus dependency. I guess the latter.

yeya24 avatar Feb 27 '23 06:02 yeya24

IIUC, only query frontend, Compactor and Ruler needs to support Native histograms?

I am wondering if we need to do anything specific in Ruler or it works out of the box by updating the Prometheus dependency. I guess the latter.

I also think it's the latter, but we will double check and add an e2e test for rulers.

rabenhorst avatar Feb 27 '23 10:02 rabenhorst

Downsampling and compaction is blocked by a bug we hit in Prometheus regarding appending histograms to open chunks during compaction, which should be fixed by https://github.com/prometheus/prometheus/pull/12185. The bug needs to be fixed before we can open a PR for compaction/downsampling of native histograms.

rabenhorst avatar Mar 30 '23 10:03 rabenhorst

Ruler does not work with native histograms out of the box. promclient returns model representation of native histogram for queries, which we need to translate to histogram.FloatHistograms: https://github.com/thanos-io/thanos/blob/5d5d39a35ba62889aa759d60380fe43deee386e4/pkg/promclient/promclient.go#L483-L486

We will look into using the GRPC query API for the ruler so we directly get histogram.FloatHistograms in the result.

rabenhorst avatar Apr 11 '23 10:04 rabenhorst

We have PRs now for all components. I just added the last one for rule.

rabenhorst avatar May 23 '23 15:05 rabenhorst