Cardinality of grpc_target attribute for grpc.client.* metrics is too large
What version of gRPC-Java are you using?
1.71.0
What is your environment?
OS: Linux
JDK: 21
What did you expect to see?
No warnings when enabling default OpenTelemetry metrics.
What did you see instead?
WARNING: Instrument grpc.client.attempt.started has exceeded the maximum allowed cardinality (1999).
WARNING: Instrument grpc.client.attempt.rcvd_total_compressed_message_size has exceeded the maximum allowed cardinality (1999).
WARNING: Instrument grpc.client.attempt.sent_total_compressed_message_size has exceeded the maximum allowed cardinality (1999).
WARNING: Instrument grpc.client.call.duration has exceeded the maximum allowed cardinality (1999).
WARNING: Instrument grpc.client.attempt.duration has exceeded the maximum allowed cardinality (1999).
This is caused by the grpc_target attribute having a very large cardinality.
I see that this attribute is listed for these metrics at https://github.com/grpc/proposal/blob/master/A66-otel-stats.md. However, for very large deployments this can be problematic. Is there a way to turn this attribute off?
Steps to reproduce the bug
Have a client which connects to 2000+ servers (either at once or over time).
This was explicitly not part of the gRFC (and was follow-up work potentially):
It is possible for some channels to use IP addresses as target strings and this might again blow up the cardinality. In the future, we can consider adding the ability to override recorded target names to avoid this.
But I'm seeing that C++ apparently has an API for this.
I knew they were going to allow filtering the method name, but Go and Java didn't need that (we have generated code opt-in to exposing the method name). It seems this approach was then copied to target string. I don't know why it was explicitly excluded from the gRFC but included in C++.
I suspect there's a way to configure OpenTelemetry to handle this.
Yash pointed me to https://github.com/grpc/proposal/pull/431. So it was removed from gRFC A66 after it was initially merged.
@t-peoples, can you confirm that you're using IP addresses in your target strings, and that's why the cardinality is so high?
@ejona86 yes, we are using IP addresses in the target strings. This system requires broadcasting across many nodes, so we'd have the same problem if using DNS names, for what it's worth.
I talked to the other language leads, and we agreed we could just do the target filter approach. It'd need a small gRFC and someone to implement it in Java, but it is relatively straightforward. (There could be alternative approaches, but that works, is easy, and solves a pressing need. So don't complicate things.)
FYI: it may be possible to work around this a bit by configuring a View with setAttributeFilter() to discard grpc.target. That's not really a replacement for the target filter, as it is all-or-nothing, but it may provide a stop-gap.
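A minimal sketch of that stop-gap, assuming the OpenTelemetry Java SDK and the grpc-opentelemetry artifact are on the classpath. The class name and the `grpc.client.*` wildcard selector are illustrative choices, not something from this thread, and a real setup would also register a metric reader or exporter:

```java
import io.grpc.opentelemetry.GrpcOpenTelemetry;
import io.opentelemetry.sdk.OpenTelemetrySdk;
import io.opentelemetry.sdk.metrics.InstrumentSelector;
import io.opentelemetry.sdk.metrics.SdkMeterProvider;
import io.opentelemetry.sdk.metrics.View;

// Hypothetical class name, for illustration only.
public final class DropGrpcTargetView {
  public static void main(String[] args) {
    SdkMeterProvider meterProvider = SdkMeterProvider.builder()
        .registerView(
            // Assumption: select all gRPC client instruments by wildcard.
            InstrumentSelector.builder().setName("grpc.client.*").build(),
            // Keep every attribute key except grpc.target (all-or-nothing).
            View.builder()
                .setAttributeFilter(key -> !"grpc.target".equals(key))
                .build())
        // A metric reader/exporter would normally be registered here too.
        .build();

    OpenTelemetrySdk sdk =
        OpenTelemetrySdk.builder().setMeterProvider(meterProvider).build();

    // Hand the SDK to gRPC so its metrics flow through the view above.
    GrpcOpenTelemetry grpcOpenTelemetry =
        GrpcOpenTelemetry.newBuilder().sdk(sdk).build();
    grpcOpenTelemetry.registerGlobal();
  }
}
```

With this view in place, all targets are aggregated together for the matched instruments, so per-target breakdowns are lost entirely.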
@ejona86
Based on the discussion above, I’m thinking of taking this on.
The approach I have in mind is to align with the target attribute filtering model used in C++:
- add an optional `setTargetAttributeFilter(Predicate<String>)` to `GrpcOpenTelemetry.Builder`
- apply the filter in `OpenTelemetryMetricsModule`, mapping targets rejected by the filter to `"other"`
- keep the default behavior unchanged when no filter is provided
- add tests to cover the behavior
- write a small gRFC to document the change
Does this sound reasonable before I start implementing it?
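To make the intended semantics concrete, here is a rough standalone sketch of the mapping in plain Java. The class and the `recordedTarget` helper are hypothetical names used only for illustration. They are not part of gRPC-Java, and the eventual implementation may look different:

```java
import java.util.function.Predicate;

// Hypothetical sketch of the proposed semantics; not gRPC-Java code.
public class TargetFilterSketch {
    // Returns the value to record for the grpc.target attribute.
    // A null filter means "no filter configured", so the target is kept as-is.
    static String recordedTarget(String target, Predicate<String> filter) {
        if (filter == null || filter.test(target)) {
            return target;
        }
        // Targets rejected by the filter collapse into one low-cardinality bucket.
        return "other";
    }

    public static void main(String[] args) {
        // Example filter (an assumption): only keep DNS-style targets.
        Predicate<String> onlyDns = t -> t.startsWith("dns:///");
        System.out.println(recordedTarget("dns:///orders.example.com", onlyDns));
        System.out.println(recordedTarget("10.0.0.17:50051", onlyDns));
        System.out.println(recordedTarget("10.0.0.17:50051", null));
    }
}
```

A client broadcasting to thousands of IP-addressed servers would then contribute a single `"other"` series per instrument instead of one series per address.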
@becomeStar, that sounds fine.
Thanks for the confirmation. I’ll start working on it.