opentelemetry-specification
opentelemetry-specification copied to clipboard
Add metric advice to capture only a specific set of values for a given attribute, and normalize the rest to a value which represents "other"
This would be needed if we go with Limit HTTP request method cardinality: use one attribute (option C/E) as the solution for https://github.com/open-telemetry/opentelemetry-specification/issues/3470.
The metric advice could take something like the following arguments:
- attribute name (
string
) - attribute values to preserve (
array of <value-type>
) - attribute value representing "other" (
<value-type>
)
Open questions:
Should we have a dedicated "other" value?
For string
attributes, the "other" value could be something obscure like __OTEL_CAPPED
.
A downside to this is that it will not render as nicely as something like GET/POST/OTHER
, but it would allow backends to reliably detect when an attribute value was capped (we could use something less obscure like OTHER
but that seems like it could occur in instrumentation's natural habitat).
For numeric
attributes, I'm not sure what could make sense as a dedicated "other" value across all use cases. One known numeric attribute which could possibly benefit from this feature is http.response.status_code
.
It may be ok to have a dedicated "other" value only for string
attributes.
How to address the correlation problem
The main argument against Limit HTTP request method cardinality: use one attribute (option C/E) is that you can't correlate http.request.method
between metrics and spans for values that were capped (only) on the metric side.
But if you know the values that were preserved, and you know the "other" value, then you could still do this correlation (even if maybe less efficiently):
metrics with the "other" value correlate to spans with any value other than the list of preserved values
Potentially a backend could figure out the list of preserved values (at least those which would have correlated with any spans), by looking at the complete set of http.request.method
values that were not capped in the given time period (and with given resource and instrumentation scope name, since the set of preserved values could vary across resources and instrumentation libraries).
But this feels very complex and brittle, so I'm not sure it's a great answer to the "correlation problem".