opentelemetry-specification Remove http.XX_content

Remove http.XX_content_length* semantic attributes

Open iNikem opened this issue 2 years ago • 15 comments

Http semantic convention defines 4 attributes which are very different from all others. Namely http.request_content_length, http.request_content_length_uncompressed, http.response_content_length and http.response_content_length_uncompressed. I think this information should be captured as metrics, not as span attributes. I cannot imagine a use-case where anybody would want to search/filter by this attribute or use it in any other "dimension"-like way. This is clearly a metric.

Oct 18 '21 14:10 iNikem

How about filtering for requests that exceed a certain size? Or requests without body (length=0)?

Oct 18 '21 14:10 Oberon00

Ok, true. Still, wouldn't it better served by metrics?

Oct 18 '21 14:10 iNikem

How about filtering for requests that exceed a certain size? Or requests without body (length=0)?

Also, having those attributes on spans will provide more help in troubleshooting or reproducing problems. You might get answers to those questions:

Why do request sizes exceed a certain limit? Why do response sizes exceed a certain limit? Why are request bodies missing?

You are much more likely to get answers to those questions from traces than from metrics.

Oct 18 '21 18:10 pyohannes

assuming metrics are correlated to traces with exemplars, would it be possible to answer all these questions just not for a particular request? I.e. if you have problems with big requests, you'd see an exemplar of it on metrics for request/response size and there will be no need to stamp it on every request.

Nov 06 '21 04:11 lmolkova

@lmolkova Probably true, but the same could be said for http.status_code: "If you have a problem with status 500 requests, you'd see an examplar of it on metrics for status codes".

Nov 06 '21 13:11 Oberon00

@Oberon00 I see your point, I'm not sure status code is the same as request size. Status code indicates a failure and you'd generally want to know about it. Size might be the cause of failure or big latency and I think only the correlation between size and status/latency is interesting. Size alone is rarely important. This correlation seems to be better expressed with metrics.

Nov 07 '21 21:11 lmolkova

I've always understood there is a (unspoken?) goal to allow spans to be converted to metrics fairly robustly, and is the motivation for work on propagating sampling probabilities among others. So I always assumed the attributes on a span would always be a superset of metrics - though this may not be true.

Nov 08 '21 02:11 anuraaga

I guess it boils down to the stage at which metrics are reported: 1) separately, 2) in instrumentation API or 3) from spans in post-processing, e.g. collector. I think p3 leads to different challenges related to sampling and filtering and metrics reconstruction may not work great with all samplers or make them very complicated, as they start having a state. Adaptive sampling algorithms that keep rate of reported traces (e.g. 5 traces per sec) are complex enough which leads to inaccuracy in calculations, especially with bursts of traffic. So p3 seems like an viable, but imprecise solution suitable for some cases. It's great where p1 or p2 are not available. Assuming we can report metrics from instrumentation API before (and regardless of) starting a span, we'd still have all attributes, but stamp a subset of them on spans and others go to metrics. I agree this issue depends on overall metrics collection approach.

Nov 08 '21 04:11 lmolkova

#1769 might be a reference on the sampling algorithm issue - the impression I got is that a hard rate limiting approach isn't appropriate in OTel for the reasons you mention, making it impossible to compute metrics from spans, so an approximate rate limiting approach is used instead. I personally don't need to compute metrics from spans and don't think instrumentation should take the potentially imprecise approach vs directly collecting metrics, but it appears to be important in general to keep the door open for span-based metrics. I guess removing attributes like these content length ones could close the door on it.

Nov 08 '21 05:11 anuraaga

Ah - I just remembered that I had initially added these attributes to the spec 😅 I think my only motivation was to close a gap with the X-Ray data model, which includes them too

https://docs.aws.amazon.com/xray/latest/devguide/xray-api-segmentdocuments.html#api-segmentdocuments-http

If these attributes were removed, X-Ray would probably have to use the relatively new response.header.content_length which should be fine.

Nov 08 '21 05:11 anuraaga

Came across it again in scope of #2469. Here's how these attributes are populated: https://github.com/open-telemetry/opentelemetry-specification/issues/2114#issuecomment-1079473468 in Java, .NET, JS, Python and Go.

http.XX_content_length is populated by some Java instrumentations, all Go instrumentations, and all JS instrumentations (.NET and Python don't populate them).
Only JS populates http.XX_content_length_uncompressed conditionally depending on Content-Encoding == identity.
AWS XRay seems to only need response content-length, which is "content_length – integer indicating the length of the response body in bytes." - presumably raw data length
HTTP clients don't necessarily decompress and instrumentation doesn't really know when decompression happens (e.g. when stream is read which can happen after the span ends)

So one thing I'd like to do is remove *_uncompressed attributes.

Then content_length: I'm trying to understand when it doesn't match Content-Length header. One case I found is where Chunked encoding is used, which requires Content-Length to not be set, but then it's up to network stream implementation to read it from the socket until EOF, so HTTP instrumentation won't be able to know about content length anyway.

Unless I'm missing something, we can remove all 4 attributes and let users/distros opt-in into http.*.header.content_length

@anuraaga lmk if you still think it'd work for AWS.

@Oberon00 @iNikem @pyohannes please share if you have any objections.

Apr 20 '22 00:04 lmolkova

I think that should work for X-Ray.

As an anecdote, I have found good value from response length and uncompressed length in metrics when using the Armeria server library which exports these natively. It's true that black box instrumentation will generally not be able to handle the compression, but native instrumentation potentially could. Just food for thought on whether OTel semantic conventions are supposed to apply even when they would be hard to support in instrumentation libraries but could be in frameworks directly.

Either way I would remove now and potentially add back only in metrics if needed later to avoid slowing down semantic convention stability.

Apr 20 '22 01:04 anuraaga

HTTP clients don't necessarily decompress and instrumentation doesn't really know when decompression happens (e.g. when stream is read which can happen after the span ends) So one thing I'd like to do is remove *_uncompressed attributes.

Would the opposite situation also be plausible, i.e. HTTP clients that always decompress and give no possibility to access the raw length? Maybe it would be best to remove these altogether and only rely on the generic header conventions to report any content-length header?

Apr 20 '22 08:04 Oberon00

Would the opposite situation also be plausible, i.e. HTTP clients that always decompress and give no possibility to access the raw length?

I believe HTTP clients always know the raw length (at least from Content-Length header, which is raw payload size), but if we define our attributes as something smarter, then it's up to the instrumentation author to interpret it.

Maybe it would be best to remove these altogether and only rely on the generic header conventions to report any content-length header?

Agreed and, also to @anuraaga 's point, - we probably would find metrics around content length very useful. And when it happens, we might need to return http.*.header.content_length back or require to add header attributes by default

Apr 20 '22 17:04 lmolkova

after some back and force, I kept http.request_content_length and http.response_content_length because they are useful and enable metrics-over traces scenario.

But given how unreliable uncompressed length population is and that OTel HTTP instrumentations don't usually populate it, I removed http.response_content_length_uncompressed and http.request_content_length_uncompressed. in #2469 .

Since http request and response size metrics are defined now, I believe the only open question here is if we want to have a special name for attributes (http.request|response_content_length) or we can reuse http.*.header.content_length.

since Content-Length header is not always present, but some specialized instrumentation can still populate it and it's nice to have a generic name for it, I suggest keeping http.request|response_content_length.

@iNikem thoughts? do you feel we can close thin one?

Jul 27 '22 19:07 lmolkova

opentelemetry-specification opentelemetry-specification copied to clipboard

Remove http.XX_content_length* semantic attributes

opentelemetry-specification
opentelemetry-specification copied to clipboard