semantic-conventions icon indicating copy to clipboard operation
semantic-conventions copied to clipboard

Consider `http.status_code_class` attribute

Open alanwest opened this issue 3 years ago • 10 comments
trafficstars

Inspired by https://github.com/open-telemetry/opentelemetry-specification/pull/2943#issuecomment-1312400189.

because metrics are sensitive to cardinality, I've seen instrumentations using strings like 4xx, 5xx for status code.

Proposal is to add a new attribute for grouping status codes by class, i.e., 1xx, 2xx, 3xx, etc. See: https://datatracker.ietf.org/doc/html/rfc9110#section-15.

Open questions:

  • Do we want a new attribute?
  • Should the value of the attribute be 1xx, 2xx, etc OR informational, successful?
  • For metrics, is this attribute conditionally required if status is present?
  • For metrics, would http.status_code become optional?
  • Should traces also have this attribute?

alanwest avatar Nov 14 '22 19:11 alanwest

Should the value of the attribute be 1xx, 2xx, etc OR informational, successful?

I would prefer grouping starting with the first digit of the status code class.

The httpcheck receiver in the collector currently reports http.status_class as an attribute on some metrics it produces https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/httpcheckreceiver#http-check-receiver

codeboten avatar Nov 17 '22 22:11 codeboten

For metrics, is this attribute conditionally required if status is present?

I guess that would make more sense than making status_code optional, which I'm not sure would even be possible since it is conditionally required today. I'd also think the value being the first digit is a more "natural" solution.

joaopgrassi avatar Nov 22 '22 12:11 joaopgrassi

Hey @alanwest - I suggest you bring it to the Semantic Conventions SIG (every Monday after the maintainers meeting).

carlosalberto avatar Nov 22 '22 14:11 carlosalberto

@alanwest @codeboten @joaopgrassi please let us know if you think this should be considered as a blocker for HTTP semantic convention stability? my current assumption is that http.status_code is low-cardinality enough for most usage, and http.status_code_class could be added later as an opt-in attribute (and users of that attribute could opt-out of of http.status_code)

trask avatar Feb 22 '23 23:02 trask

please let us know if you think this should be considered as a blocker for HTTP semantic convention stability?

@trask I am not inclined to consider this a blocker for stability. I opened this issue based on https://github.com/open-telemetry/opentelemetry-specification/pull/2943#issuecomment-1312400189.

@yurishkuro do you feel this should be a blocker?

edit: just to expand a bit on why I do not think this is a blocker.. I personally think that a good default behavior is to include the full status code for metrics. For users who do experience cardinality issues due to status code, instrumentation might be able to be configured to turn off http.status_code and enable http.status_code_class. In this way, I believe http.status_code_class could be an additive, non-breaking change in the future.

alanwest avatar Feb 24 '23 02:02 alanwest

What is our general stance one such "trivially derivable" attributes? The http.status_code_class attribute will always be redundant with the (required) http.status_code. It seems conceptually useless to report this from the agent/client side already, but it might be useful in processing further down the line (maybe as soon as in some aggregating exporter in-process).

I feel like this needs some more general discussion

Oberon00 avatar Feb 24 '23 16:02 Oberon00

I personally think that a good default behavior is to include the full status code for metrics. For users who do experience cardinality issues due to status code, instrumentation might be able to be configured to turn off http.status_code and enable http.status_code_class. In this way, I believe http.status_code_class could be an additive, non-breaking change in the future.

my thought also 👍

@Oberon00 do you think this needs more discussion before removing it as a blocker for HTTP semconv stability?

trask avatar Feb 27 '23 22:02 trask

Offering status_code_class as an alternative to status_code implies making status_code non-required (relaxing the condition). Wouldn't that be a breaking change? Just one example: backends may rely on presence of status_code to detect network errors, as discussed in open-telemetry/opentelemetry-specification#3253

Oberon00 avatar Mar 01 '23 10:03 Oberon00

Wouldn't that be a breaking change?

ya, makes sense, as soon as you opt-out of http.status_code you would no longer be producing HTTP semconv compliant telemetry, which means we wouldn't be able to define this kind of optional behavior in the HTTP semconv spec.

I added a comment https://github.com/open-telemetry/opentelemetry-specification/pull/3225#discussion_r1122619001

I see a couple of options:

  • make http.status_code Recommended instead of Conditionally Required
  • users could write a metric view to normalize http.status_code values, e.g. map values [400-499] to 400 and maintain semconv compliance (at the cost of lying a bit)
  • users could write a metric view to replace http.status_code with my.status_code_class and lose benefits of HTTP semconv compliance

trask avatar Mar 02 '23 06:03 trask

We discussed this in the HTTP semconv meeting today.

I have sent open-telemetry/opentelemetry-specification#3289 to clarify that

Attribute requirement levels apply to instrumentation. Because users can transform their telemetry in a number of ways (e.g. metric views, span processors, and collector transformations), these requirement levels cannot be relied on by telemetry consumers.

I think this gives the flexibility to make something like http.status_code_class Opt-In (and http.status_code Opt-Out) in the future.

trask avatar Mar 03 '23 23:03 trask