opentelemetry-go-contrib
opentelemetry-go-contrib copied to clipboard
Support Custom Metric Attributes Per Request
What
Adds a new WithMetricsAttributesFn option which allows custom metric attributes to be added on a per request basis.
Why
So standard metrics provided by the gRPC instrumentation package can be annotated on a per request basis. Our use case is to add attributes to the instruments depending on what the handler is doing with the request.
See also https://github.com/open-telemetry/opentelemetry-go-contrib/pull/6092 (which is still a draft).
Could we follow the same pattern as we're doing for otelhttp? https://github.com/open-telemetry/opentelemetry-go-contrib/pull/5876/files
Hey @dmathieu
Ah I wasn't aware of #6092. Yeah we can use the same pattern, I'm happy to apply that on this PR and take over that work.
@dmathieu pushed up changes.
Had problems getting the TestStatsHandler/Recorded/ClientMetrics tests to pass locally on the rpc.client.request.size metric. I could not figure out for the life of me why all the other scoped metrics worked but this one did not. I'm still stumped by it.
@dmathieu would the ability to access the gRPCContext to be able to append to the metricAttrs also be acceptable?
We have a use case where we would only know what attributes to add to the metrics while we are in the handler after db / network calls.
I don't know if we should make metrics so permissive. By their nature, metrics must be low cardinality. Your request looks like a good way to blow them up with high cardinality attributes.
Yeah that's fair @dmathieu. How's this PR looking in general?
I'd recommend running the tests locally to ensure everything passes and help with reviews 😸
I did, but I only had issues with one test as I mentioned here: https://github.com/open-telemetry/opentelemetry-go-contrib/pull/6281#issuecomment-2444127169
Coming back at this, I don't think it's a good thing to add. Metrics should be low cardinality, and this really opens a big window for folks to shoot them in the foot with high cardinality metrics. If you wish to add request-specific data, tracing is probably what you need.
Unless there are objections, I will close this PR in 24 hours.
IMHO developers should decide if the increased cardinality is justified or not.
Maybe I'm doing it wrong but in my experience it's much cheaper to add an attribute to a metric than trace every request if you want a comprehensive view of your requests.
There's an issue and 2 PRs now trying to solve this so there is clearly demand for the capability. I've currently forked this statshandler in my code to add an extra attribute which is less than ideal. Meanwhile my HTTP services can use the otelhttp handler and add extra attributes just fine.
What attributes are you adding?
What attributes are you adding?
These are domain/service specific attributes in the context.
In the case of our monitoring platform (Datadog) we still has the ability to choose to exclude tags from being indexed, so the cardinality and cost increase can be managed quite easily. OTOH enabling tracing for every request to use trace metric is prohibitively expensive at scale, hence the preference to do this as a metric attribute.
These are domain/service specific attributes in the context.
If there are domain/service specific attributes than maybe you can create domain/service specific metrics that would use these attributes instead asking to add custom attributes to instrumentation libraries which are supposed to follow the OpenTelemetry Semantic Conventions.
There's an issue and 2 PRs now trying to solve this so there is clearly demand for the capability.
There may be other ways addressing your use case.
If there are domain/service specific attributes than maybe you can create domain/service specific metrics that would use these attributes instead asking to add custom attributes to instrumentation libraries which are supposed to follow the OpenTelemetry Semantic Conventions.
IIUC you're saying that RPC metrics must never have any attributes other than those listed in the semantic conventions?
There may be other ways addressing your use case.
Yes recommendations are welcome, I believe these current set of PRs exist because the view was that this was an acceptable solution since it was also done for the http handler.
IIUC you're saying that RPC metrics must never have any attributes other than those listed in the semantic conventions?
I do understand that there are use cases where users may want to add custom attributes. E.g. see: https://github.com/open-telemetry/opentelemetry-specification/issues/4298. However, I am not sure if these capabilities (to add explicitly add additional attributes) should be included to the instrumentation libraries. It may be an SDK capability.
I would not say must never, but I would just avoid if possible. Nothing in OpenTelemetry Specification says that instrumentation libraries should offer such capability.
Do you know if any other languages (e.g. Java, .NET, Python) allow adding custom metrics to the instrumentation libraries? As far as I remember it is not possible in .NET.
this was an acceptable solution since it was also done for the http handler.
I am not sure if this is actually good that it was added. Notice that otelhttp is experimental (not stable).
Appreciate the response @pellared
We seem to already have a few ways to add metric attributes:
OTEL_RESOURCE_ATTRIBUTESenv var (and resource detectors in general) at an SDK level- The statshandler also has
WithMetricAttributes()as an instrumentation library level
Some making a clear distinction as to why per-request metric attributes are a bridge to far vs resource-wide attributes would help the case for closing this PR and issue.
Related: https://github.com/open-telemetry/opentelemetry-specification/discussions/4311
I think it comes down to the purpose of these instrumentation libraries.
Are they meant to be the one stop shop where we can come and get the officially maintained instrumentation from OTEL? Or is it mean to be the bare minimum instrumentation that is meant to be an example of how you could do HTTP/gRPC instrumentation.
If the former then there is clearly a use case where attributes are not known until the request is handled and the SDK should provide a way to do that. Regardless of whether we shoot ourselves in the foot with cardinality. I don't think limitations in other languages should influence the feature set of another.
If the latter then that should be it should be made clear in the documentation and it's up to Platform Engineers such as myself to use this as a template to provide the instrumentation to the teams I support.
Both are fine approaches and I'm totally down for writing my own implementation to service the teams I support. Of course I would much rather use an officially maintained SDK, takes the weight off of me lol.
Are they meant to be the one stop shop where we can come and get the officially maintained instrumentation from OTEL?
This.
I want to get a full understanding (and Go SIG as well as Spec SIG agreement) on what it the scope (and constrains) of instrumentation libraries. Thus I started a discussion in OTel Specification. I want to document what are the common expectations from instrumentation libraries and consistent usage experience across instrumentation libraries in OTel Go Contrib. It should also make contributions easier.
Based on today's OTel Spec SIG meeting: Adding custom hooks for instrumentation library which are acceptable especially given they have access to contextual information that is not accessible to the processors (e.g. request payload)
Based on today's OTel Spec SIG meeting:
Adding custom hooks for instrumentation library which are acceptable especially given they have access to contextual information that is not accessible to the processors (e.g. request payload)
Does that mean this PR is good to continue progressing with? I'm on vacation atm but I can pick it back up next week if so.
Does that mean this PR is good to continue progressing with? I'm on vacation atm but I can pick it back up next week if so.
This
Based on today's OTel Spec SIG meeting: Adding custom hooks for instrumentation library which are acceptable especially given they have access to contextual information that is not accessible to the processors (e.g. request payload)
Does that mean this PR is good to continue progressing with? I'm on vacation atm but I can pick it back up next week if so.
We discussed it during last Go SIG meeting. I am on PTO as well but once I back I plan to create an issue (or even project) about designing and standarizing an approach for instrumentation libraries' customization. Are you interested into contributing? We lack velocity at this moment to work on it more actively. If so please consider joining our next Go SIG meeting.
Does that mean this PR is good to continue progressing with? I'm on vacation atm but I can pick it back up next week if so.
This
Based on today's OTel Spec SIG meeting:
Adding custom hooks for instrumentation library which are acceptable especially given they have access to contextual information that is not accessible to the processors (e.g. request payload)
Does that mean this PR is good to continue progressing with? I'm on vacation atm but I can pick it back up next week if so.
We discussed it during last Go SIG meeting. I am on PTO as well but once I back I plan to create an issue (or even project) about designing and standarizing an approach for instrumentation libraries' customization. Are you interested into contributing? We lack velocity at this moment to work on it more actively. If so please consider joining our next Go SIG meeting.
Yeah I am totally up for contributing, when is the next SIG?
Any news on this?
Any news on this?
I'm happy to work on this, just need guidance on what the next steps are.