thanos icon indicating copy to clipboard operation
thanos copied to clipboard

thanos/receive: add feature flag to put ext labels into TSDB

Open GiedriusS opened this issue 1 month ago • 7 comments

Enable streaming by putting external labels into the TSDB. If the external labels ever change then the head is pruned during startup.

GiedriusS avatar Oct 28 '25 14:10 GiedriusS

Consider adding to PR description that

This relates to prior work on re-sorting responses with external labels in https://github.com/thanos-io/thanos/issues/8487, and memory-use increases as discussed in https://github.com/thanos-io/thanos/issues/7395 .

This change only affects blocks in Receive's memory, not blocks shipped to the object store and fetched via Store gateway, or blocks read from Prometheus via Sidecar. But Query will still benefit from the streaming response from Receive even if other components still need to buffer, inject labels and sort.

ringerc avatar Oct 30 '25 22:10 ringerc

I don't know if this solves the problem completely. The key issue is that dropping an internal label can cause the set to become unsorted, I don't think where we store external labels changes this.

fpetkovski avatar Oct 31 '25 09:10 fpetkovski

I don't know if this solves the problem completely. The key issue is that dropping an internal label can cause the set to become unsorted, I don't think where we store external labels changes this.

Agree - I dont think this solves the issue - We should probably add external labels as a slice in the Series gRPC call and only allow dropping from that slice. Then we will keep the series order and can drop I think.

MichaHoffmann avatar Oct 31 '25 09:10 MichaHoffmann

I don't know if this solves the problem completely. The key issue is that dropping an internal label can cause the set to become unsorted, I don't think where we store external labels changes this.

We aren't dropping any labels anywhere with this patch. Could you elaborate a bit more?

GiedriusS avatar Oct 31 '25 11:10 GiedriusS

I thought this PR is needed to enable streaming series without resorting so my comment is related to that feature.

fpetkovski avatar Oct 31 '25 11:10 fpetkovski

I thought this PR is needed to enable streaming series without resorting so my comment is related to that feature.

Yeah but with this patch "without replica labels" doesn't do anything anymore - we put sorted series in with external labels, get sorted series out. Your original comment said that dropping labels causes the set to be unsorted but we don't do that anywhere, do we?

GiedriusS avatar Oct 31 '25 12:10 GiedriusS

The key issue is that dropping an internal label can cause the set to become unsorted, I don't think where we store external labels changes this.

This is only true if the series data has collisions with the external labels.

If I understand correctly, @fpetkovski is probably saying that this change has to either clobber existing values by replacing them with the configured external labels, or it has to respect the existing user values and fail to enforce the external labels.

Is that right?


I increasingly think that conflicts between external and user labels should simply be made an error that results in ingestion, scrape, or query failure (depending on where the conflict arises). Or at least an annotation warning that query results may be invalid and incorrect. A pre-existing label with the same name as an external label would be permitted and accepted, but only if its value is the same as the configured external label.

That'd make the whole mess go away throughout the stack. A feature flag could be used to introduce enforcement optionally, then made on by default but allow people to turn it off for a while before wholly removing the option. When enforcement is on, Sidecar wouldn't have to re-sort (it'd just check for and enforce labels instead, then inject them into the stream) so no need for https://github.com/thanos-io/thanos/issues/8487. And we could probably add a similar FF to Prometheus for its remote-read external labels, or simply stop using prom-level external labels in favour of having Sidecar inject them as it processes the metric stream it is forwarding.

Receive, the subject of this PR, could drop samples with conflicting external labels and increment a suitable counter to ensure this can be observed. Then it could discard all label-pairs that match the configured external labels; there'd be no need to store them in the TSDB if they're known from external labels already, they can just be injected when streaming-out the response without changing the sort order.

For cascaded setups, fan-outs, etc, the lower tiers would need one set of external label names, and the higher tiers another. E.g. the lower tier might have {prometheus="foo",datacentre="bar"} and the upper tier might only have {datacentre="bar"}. So the upper tier would still be able to enforce datacentre="bar", and the prometheus="foo" (or whatever value) would be promoted to a normal series label for ingestion, forwarding, etc. But my understanding is that this is basically how it should work anyway, nobody should be overriding external labels in lower tiers of a distributed query fan-out or similar. Correct?

ringerc avatar Nov 30 '25 22:11 ringerc