opentelemetry-specification icon indicating copy to clipboard operation
opentelemetry-specification copied to clipboard

Add metric advice to capture only a specific set of values for a given attribute, and normalize the rest to a value which represents "other"

Open trask opened this issue 1 year ago • 5 comments

This would be needed if we go with Limit HTTP request method cardinality: use one attribute (option C/E) as the solution for https://github.com/open-telemetry/opentelemetry-specification/issues/3470.

The metric advice could take something like the following arguments:

  • attribute name (string)
  • attribute values to preserve (array of <value-type>)
  • attribute value representing "other" (<value-type>)

Open questions:

Should we have a dedicated "other" value?

For string attributes, the "other" value could be something obscure like __OTEL_CAPPED.

A downside to this is that it will not render as nicely as something like GET/POST/OTHER, but it would allow backends to reliably detect when an attribute value was capped (we could use something less obscure like OTHER but that seems like it could occur in instrumentation's natural habitat).

For numeric attributes, I'm not sure what could make sense as a dedicated "other" value across all use cases. One known numeric attribute which could possibly benefit from this feature is http.response.status_code.

It may be ok to have a dedicated "other" value only for string attributes.

How to address the correlation problem

The main argument against Limit HTTP request method cardinality: use one attribute (option C/E) is that you can't correlate http.request.method between metrics and spans for values that were capped (only) on the metric side.

But if you know the values that were preserved, and you know the "other" value, then you could still do this correlation (even if maybe less efficiently):

metrics with the "other" value correlate to spans with any value other than the list of preserved values

Potentially a backend could figure out the list of preserved values (at least those which would have correlated with any spans), by looking at the complete set of http.request.method values that were not capped in the given time period (and with given resource and instrumentation scope name, since the set of preserved values could vary across resources and instrumentation libraries).

But this feels very complex and brittle, so I'm not sure it's a great answer to the "correlation problem".

trask avatar Jun 08 '23 04:06 trask