opentelemetry-specification
Metric cardinality limits "overflow attribute set" causes inconsistent attribute keys
The Cardinality Limits section of the Metrics SDK spec says:
An overflow attribute set is defined, containing a single attribute otel.metric.overflow having (boolean) value true, which is used to report a synthetic aggregation of the metric events that could not be independently aggregated because of the limit.
This will cause every metric from the SDK to have an inconsistent set of attribute keys across its streams:
mycounter{a="hello" b="world"} 2
mycounter{a="bar" b="foo"} 3
mycounter{a="" b="foo"} 3
mycounter{otel.metric.overflow="true"} 100
Specifically, for Prometheus/OpenMetrics this is considered bad practice:
Metrics with the same name for a given MetricFamily SHOULD have the same set of label names in their LabelSet.
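For concreteness, here is a minimal Python sketch (not taken from any actual SDK) of the aggregation behavior that produces a stream set like the one above; the limit of 3 and the attribute values are invented for illustration:

```python
# Minimal sketch of a cardinality limit with an overflow attribute set.
# Illustrative only, not an OpenTelemetry SDK implementation.

OVERFLOW_ATTRS = frozenset({("otel.metric.overflow", "true")})
CARDINALITY_LIMIT = 3  # hypothetical limit, far lower than a real default

def record(streams: dict, attrs: dict, value: int) -> None:
    """Aggregate `value` under `attrs`, folding new attribute sets into the
    overflow stream once the map is one slot away from the limit."""
    key = frozenset(attrs.items())
    if key not in streams and len(streams) >= CARDINALITY_LIMIT - 1:
        key = OVERFLOW_ATTRS  # the last slot is reserved for the overflow stream
    streams[key] = streams.get(key, 0) + value

streams = {}
record(streams, {"a": "hello", "b": "world"}, 2)
record(streams, {"a": "bar", "b": "foo"}, 3)
record(streams, {"a": "baz", "b": "qux"}, 100)  # no slot left -> overflow

for key, total in streams.items():
    labels = ", ".join(f'{k}="{v}"' for k, v in sorted(key))
    print(f"mycounter{{{labels}}} {total}")
```

The overflow stream is the only one without the a and b keys, which is exactly the inconsistency described above.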
One solution is to add the overflow label to every metric:
mycounter{a="hello" b="world" otel_metric_overflow="false"} 2
mycounter{a="bar" b="foo" otel_metric_overflow="false"} 3
mycounter{a="" b="foo" otel_metric_overflow="false"} 3
mycounter{a="" b="" otel_metric_overflow="true"} 100
IMO this is pretty clunky. The user also has to be careful when grouping by a in this example to explicitly filter on otel.metric.overflow=false; otherwise the a="" stream will be grouped together with the overflow stream, which actually contains other values of a.
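A quick sketch of that grouping pitfall, using the sample streams from the proposal above (the label sets and values are the ones from the example, not real data):

```python
# Illustrative only: grouping by `a` when the overflow label is on every stream.
from collections import defaultdict

streams = [
    ({"a": "hello", "b": "world", "otel_metric_overflow": "false"}, 2),
    ({"a": "bar", "b": "foo", "otel_metric_overflow": "false"}, 3),
    ({"a": "", "b": "foo", "otel_metric_overflow": "false"}, 3),
    ({"a": "", "b": "", "otel_metric_overflow": "true"}, 100),
]

def sum_by_a(rows, exclude_overflow: bool) -> dict:
    grouped = defaultdict(int)
    for labels, value in rows:
        if exclude_overflow and labels["otel_metric_overflow"] == "true":
            continue
        grouped[labels["a"]] += value
    return dict(grouped)

# Naive grouping mixes the real a="" stream with the overflow stream:
print(sum_by_a(streams, exclude_overflow=False))  # {'hello': 2, 'bar': 3, '': 103}
# Filtering out the overflow stream first keeps a="" honest (but drops data):
print(sum_by_a(streams, exclude_overflow=True))   # {'hello': 2, 'bar': 3, '': 3}
```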
This will cause every metric from the SDK to have an inconsistent set of attribute keys across its streams:
The otel.metric.overflow series should only show up once an instrument exceeds the cardinality limit, right? Is there an implementation that behaves differently? So you only see this behavior when the system is in a bit of an error state.
Not sure about existing implementations. This part of the spec could be interpreted otherwise:
The SDK MUST create an Aggregator with the overflow attribute set prior to reaching the cardinality limit and use it to aggregate events for which the correct Aggregator could not be created.
Yeah that is a bit suspect. I wonder if that aggregator is expected to be included in collection results before any measurements are received?
Yeah that is a bit suspect. I wonder if that aggregator is expected to be included in collection results before any measurements are received?
+1
For my understanding: it seems like it should mean
"If the cardinality limit is 1000, create it when you have 999 streams, to make sure you can still fit the otel.metric.overflow in."
But it could also be interpreted (and I think that is what @aabmass shows above) as:
"Create the otel.metric.overflow as the first attribute, to make sure it is always considered."
Is that right? Or is there another interpretation?
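For what it's worth, here is how I would sketch the two readings in Python; neither snippet is taken from an actual SDK, and the function names, the limit, and the flag are made up for illustration:

```python
# Illustrative only: two possible readings of "create an Aggregator with the
# overflow attribute set prior to reaching the cardinality limit".

CARDINALITY_LIMIT = 1000
OVERFLOW_KEY = frozenset({("otel.metric.overflow", "true")})

def new_stream_map(precreate_overflow: bool) -> dict:
    streams = {}
    if precreate_overflow:
        # Reading 2: the overflow aggregator exists from the start, so it would
        # show up in collection results even before anything has overflowed.
        streams[OVERFLOW_KEY] = 0
    return streams

def record(streams: dict, attrs: dict, value: int) -> None:
    key = frozenset(attrs.items())
    if key not in streams and len(streams) >= CARDINALITY_LIMIT - 1:
        # Reading 1: only reserve the overflow slot once the map has 999 of
        # 1000 distinct attribute sets, i.e. just before the limit is hit.
        key = OVERFLOW_KEY
    streams[key] = streams.get(key, 0) + value
```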
In any event, the issue of inconsistent attribute keys would still come up if an overflow is reached. Do we think that
- OTel metrics data model should recommend keeping a consistent set of keys across all dimensions of a metric (similar to OpenMetrics)?
- the overflow metrics should follow this guidance?
I think the point of this overflow attribute was to keep you from losing data once the limit is reached. You will lose data either way, but with the overflow attribute, at least the overall sum would be right. From my reading of the section you mentioned, the overflow should be taken into account on a per-metric-key basis. Additionally, it says:
An overflow attribute set is defined, containing a single attribute otel.metric.overflow
So, from how I understand the wording today, that would mean that this is the right way of doing it:
mycounter{a="hello" b="world"} 2
mycounter{a="bar" b="foo"} 3
mycounter{a="" b="foo"} 3
mycounter{otel.metric.overflow="true"} 100
I don't know if recommending sending "empty" attributes, especially when not sending to Prometheus, makes sense.
@pirgeo if mycounter is a function of its dimensions, shouldn't it always include all of those dimensions? If you sent those metrics to a backend and queried grouping by a, which stream would it be grouped with?
I suppose the problem is that the overflow stream has "floating" values for all of the missing labels. It can only really be used if you DON'T group by anything. I'm not sure how many metric backends would understand this.
It can only really be used if you DON'T group by anything.
That was the whole point of the overflow attribute, I think. If you don't have an overflow attribute but want to cap the used memory at some point, you have to start throwing away streams that would still influence the grand total. The point of the overflow label is to say: Actually, we've hit the limit of what we want to keep in memory, but instead of dropping your data, we are collecting everything that's not part of one of the existing streams here, so you can at least know the grand total of everything that has come in. You will lose data either way, but at least the total (when not grouping) is right. Seeing something like this should probably be followed up by either reducing the number of splits you introduce in your data (i.e., removing some attributes) or increasing the cardinality limit. I don't think this is a state that you want to be in for extended periods of time.
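To make that concrete, here is a tiny sketch (with made-up numbers) of why the ungrouped grand total survives an overflow but any group-by does not:

```python
# Made-up numbers: assume the instrument overflowed after two distinct streams.
exported = {
    ('a="hello"', 'b="world"'): 2,
    ('a="bar"', 'b="foo"'): 3,
    ('otel.metric.overflow="true"',): 100,  # everything that no longer fit
}

# Without grouping, the grand total still matches everything that was recorded:
print(sum(exported.values()))  # 105

# A group-by over `a` can only attribute 5 of those 105 to concrete values of
# `a`; the remaining 100 are only visible through the overflow stream.
```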
As for the Prometheus implementation: I am not really sure what the "right" way to send it to Prometheus is, maybe it is to set all attributes to empty values. Generally, I don't think we need to send all dimensions with empty values regardless (in OTLP, for example, there is no restriction to do it that way). I guess it becomes a backend topic how to deal with such data.
From the meeting: The goal is to make a lot of noise when a metric overflows. Something has gone very wrong.
One possible alternative to an overflow label on an existing metric would be an overflow metric with a label for the metric name. This would still provide the "loud noise" to warn users that something is badly wrong. It would also be easy to write an alert to that effect. E.g. otel.cardinality.overflows would become:
# TYPE otel_cardinality_overflows_total counter
# HELP otel_cardinality_overflows_total a count of the number of measurements which overflowed cardinality limits
otel_cardinality_overflows_total{metric_name="mycounter"} 1
However, this would remove the property of the total still being correct, which was one of the goals of the original proposal.
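Sketched out (illustrative only, with an invented limit and values), the alternative would look roughly like this:

```python
# Illustrative sketch of the alternative: a separate overflow counter keyed by
# metric name, instead of an overflow stream on the metric itself.
from collections import defaultdict

CARDINALITY_LIMIT = 2
mycounter = defaultdict(int)
overflows_total = defaultdict(int)  # otel_cardinality_overflows_total

def record(attrs: tuple, value: int) -> None:
    if attrs not in mycounter and len(mycounter) >= CARDINALITY_LIMIT:
        # The overflowing measurement is dropped from mycounter entirely; only
        # the fact that an overflow happened is counted.
        overflows_total["mycounter"] += 1
        return
    mycounter[attrs] += value

record((("a", "hello"),), 2)
record((("a", "bar"),), 3)
record((("a", "baz"),), 100)  # over the limit

print(sum(mycounter.values()))       # 5 -> the true total (105) is lost
print(overflows_total["mycounter"])  # 1 -> but there is a loud overflow signal
```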
Correct, the alternative solution here would not be able to provide the correct total/sum, which is one critical requirement that we want to meet.
I plan to move forward with the current approach (having the overflow attribute on the existing metric) after discussing with the other TC members during the Sep. 25, 2024 TC Meeting. Please either ping me on Slack or comment here before the end of Sep. 2024.