mimir
Rejecting valid exemplars with err-mimir-exemplar-labels-missing
Describe the bug
Seeing errors like:
ts=2024-04-14T14:27:50.976027765Z caller=push.go:171 level=error user=12690 msg="push error" err="received an exemplar with no valid labels, timestamp: 0 series: blackbox_module_unknown_total{job=\"unknown_service:blackbox_exporter\"} labels: {} (err-mimir-exemplar-labels-missing)"
From: https://github.com/open-telemetry/opentelemetry-go-contrib/issues/5383#issuecomment-2055549443
According to the OpenMetrics spec, it's OK to have exemplars without labels:
Exemplars without Labels MUST represent an empty LabelSet as {}.
There's even an example in the spec:
foo_bucket{le="0.1"} 8 # {} 0.054
Why does Mimir reject this?
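The spec example above is syntactically an ordinary exemplar whose label set happens to be `{}`. Below is a minimal Go sketch of reading the part after `# `. It shows that an empty label set parses the same way as a non-empty one. `parseExemplar` and the `exemplar` type are hypothetical and not taken from Mimir or Prometheus. The sketch also assumes simple label values with no commas or braces, and it uses Go's `strconv.Unquote` as an approximation of OpenMetrics string escaping.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// exemplar is a hypothetical parsed OpenMetrics exemplar.
// Labels may be empty, which OpenMetrics writes as "{}".
type exemplar struct {
	Labels    map[string]string
	Value     float64
	Timestamp float64 // optional in OpenMetrics
	HasTS     bool    // reports whether a timestamp was present
}

// parseExemplar parses text such as `{} 0.054` or
// `{trace_id="abc"} 0.67 1520879607.789`. Simplified: it does not handle
// commas or braces inside label values.
func parseExemplar(s string) (exemplar, error) {
	ex := exemplar{Labels: map[string]string{}}
	s = strings.TrimSpace(s)
	if !strings.HasPrefix(s, "{") {
		return ex, fmt.Errorf("exemplar must start with a label set, got %q", s)
	}
	end := strings.Index(s, "}")
	if end < 0 {
		return ex, fmt.Errorf("unterminated label set in %q", s)
	}
	// An empty body ("{}") is valid and simply yields no labels.
	if body := strings.TrimSpace(s[1:end]); body != "" {
		for _, pair := range strings.Split(body, ",") {
			name, quoted, ok := strings.Cut(pair, "=")
			if !ok {
				return ex, fmt.Errorf("malformed label %q", pair)
			}
			val, err := strconv.Unquote(strings.TrimSpace(quoted))
			if err != nil {
				return ex, fmt.Errorf("label %q: %w", name, err)
			}
			ex.Labels[strings.TrimSpace(name)] = val
		}
	}
	// After the label set: a value and an optional timestamp.
	fields := strings.Fields(s[end+1:])
	if len(fields) == 0 || len(fields) > 2 {
		return ex, fmt.Errorf("expected a value and an optional timestamp in %q", s)
	}
	v, err := strconv.ParseFloat(fields[0], 64)
	if err != nil {
		return ex, err
	}
	ex.Value = v
	if len(fields) == 2 {
		ts, err := strconv.ParseFloat(fields[1], 64)
		if err != nil {
			return ex, err
		}
		ex.Timestamp, ex.HasTS = ts, true
	}
	return ex, nil
}

func main() {
	// The example line from the OpenMetrics spec quoted above.
	line := `foo_bucket{le="0.1"} 8 # {} 0.054`
	_, suffix, found := strings.Cut(line, " # ")
	if !found {
		fmt.Println("no exemplar")
		return
	}
	ex, err := parseExemplar(suffix)
	fmt.Println(len(ex.Labels), ex.Value, err) // 0 0.054 <nil>
}
```

Nothing in the format marks `{}` as special. A missing label set is a syntax error, but an empty one is not.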
To Reproduce
Send an exemplar with no labels to Grafana Cloud.
Expected behavior
Return a 200 OK, not a 400 error.
The Prometheus TSDB code drops exemplars with no labels, which is why Mimir rejects them on ingest. It wasn't a problem in practice until now because Prometheus never sends exemplars with no labels.
Discussion from when some of the logic to reject empty exemplars was added: https://github.com/grafana/mimir/pull/873
@gouthamve, since this behaviour is by design in Prometheus, can the issue be closed? Or do you think it should be reconsidered in Prom?
I think it needs to be reconsidered in Prometheus. We should at least not drop them with 400s, because OTel seems to produce a ton of these empty exemplars.
When that happens, the customers' alerts start firing because their writes are failing, and the logs are full of the err-mimir-exemplar-labels-missing error message. This is a sub-par experience for customers.
I'm curious, what's the use case for an exemplar without labels?
From an OTel and specification perspective, they're "example values": for this histogram, these are the example values we recorded. @RichiH / @fstab can you chime in further?
:+1: yes, I think it's just example values.
Since it's defined in both the OpenMetrics spec and the OpenTelemetry spec, we should support this. It may be OK to ignore the exemplar itself, but dropping the entire time series is certainly not spec-compliant.
I think that dropping the entire time series is a critical bug that deserves a separate issue.
I also think that we should not reject the empty exemplars just because Prometheus doesn't send those.
I'm curious, what's the use case for an exemplar without labels?
When visualized, the values give a better idea of distribution. Especially when buckets are coarse, exemplars can suggest you have a particular modal value, or bimodal distribution, etc.
Thank you, @bboreham, I had figured that out myself :D
I created a feature request upstream https://github.com/prometheus/prometheus/issues/14208
The Prometheus TSDB code drops exemplars with no labels
I believe this was a misunderstanding. Nick's source is me at #873, where I said it would be confusing.
I was originally proposing that Mimir should reject an exemplar with labels like {trace_id=""}, which will be stored as {}.
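The confusion described here comes from how Prometheus treats labels: a label with an empty value is equivalent to that label being absent. Once stored, an exemplar sent as `{trace_id=""}` therefore can't be told apart from one sent as `{}`. The minimal Go sketch below illustrates this on plain maps. The helper `dropEmptyValues` is hypothetical and does not come from the Prometheus `labels` package.

```go
package main

import "fmt"

// dropEmptyValues returns a copy of lbls without any label whose value is "".
// Prometheus treats an empty label value as equivalent to the label being
// absent, so {trace_id=""} normalizes to {}.
func dropEmptyValues(lbls map[string]string) map[string]string {
	out := make(map[string]string, len(lbls))
	for name, value := range lbls {
		if value != "" {
			out[name] = value
		}
	}
	return out
}

func main() {
	sentEmpty := map[string]string{}                 // exemplar sent as {}
	sentBlankID := map[string]string{"trace_id": ""} // exemplar sent as {trace_id=""}
	// After normalization both are the empty label set, so a check on the
	// stored labels cannot tell them apart.
	fmt.Println(len(dropEmptyValues(sentEmpty)), len(dropEmptyValues(sentBlankID))) // 0 0
	fmt.Println(dropEmptyValues(map[string]string{"trace_id": "abc", "span_id": ""})) // map[trace_id:abc]
}
```

To reject only the `{trace_id=""}` case, a check would have to inspect the labels as received, before this normalization. After normalization, an intentionally empty exemplar looks exactly like an exemplar whose labels were all blank.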
Closed in #8224