How is the cardinality limit applied when attributes are being filtered?
When a view is applied to a metric pipeline that contains an attribute filter, how should the cardinality limit be applied? Prior to filtering attributes, or post?
The answer we come up with will affect the output users see. For example, if measurements with the following attribute sets are made:
- {path: "/", code: 200, query: ""}
- {path: "/", code: 400, query: "user=bob"}
- {path: "/admin", code: 200, query: ""}
If an attribute filter is applied so that only the path attribute is retained, and a cardinality limit of 3 is set, then applying the filter prior to checking the cardinality limit keeps the following attributes on the output metric streams:
- {path: "/"}
- {path: "/admin"}
However, if the cardinality limit is applied prior to filtering the following attributes will be kept on the output metric streams:
- {path: "/"}
- {otel.metric.overflow: true}
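To make the difference concrete, here is a minimal Python sketch of the two orderings applied to the example measurements above. All names (`keep_path`, `record`, `LIMIT`) are invented for illustration; this is not any SDK's actual implementation, and it assumes one slot of the limit is reserved for the overflow series.

```python
# Minimal sketch of the two orderings; names are invented, not an SDK API.

LIMIT = 3
OVERFLOW = (("otel.metric.overflow", True),)

measurements = [
    {"path": "/", "code": 200, "query": ""},
    {"path": "/", "code": 400, "query": "user=bob"},
    {"path": "/admin", "code": 200, "query": ""},
]

def keep_path(attrs):
    """The view's attribute filter: retain only the path attribute."""
    return {"path": attrs["path"]}

def key(attrs):
    """Hashable identity for an attribute set."""
    return tuple(sorted(attrs.items()))

def record(streams, attrs):
    """Record a measurement, enforcing the cardinality limit.

    One slot is reserved for overflow, so at most LIMIT - 1 distinct
    attribute sets are stored directly; the rest collapse to OVERFLOW.
    """
    k = key(attrs)
    if k not in streams and len(streams) >= LIMIT - 1:
        k = OVERFLOW
    streams[k] = streams.get(k, 0) + 1

# Ordering 1: filter before limiting (filter runs per measurement).
filter_first = {}
for m in measurements:
    record(filter_first, keep_path(m))

# Ordering 2: limit before filtering (filter runs at collection time).
raw = {}
for m in measurements:
    record(raw, m)
limit_first = {}
for k, count in raw.items():
    out = k if k == OVERFLOW else key(keep_path(dict(k)))
    limit_first[out] = limit_first.get(out, 0) + count

print(sorted(filter_first))  # path "/" and path "/admin"
print(sorted(limit_first))   # path "/" and the overflow series
```

Running this reproduces both outputs above: filtering first yields the two path series, while limiting first collapses the third raw attribute set into overflow before the filter can merge it.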
### Filter before limiting
Given that the cardinality limit feature was added to limit the resources an SDK uses during measurement, if filtering is to be applied before the cardinality limit, it will need to be done in the "hot path" of making a measurement.
Pros:
- The correct number of attribute sets (<= the cardinality limit) is guaranteed on the output metric streams
Cons:
- Requires filtering for every measurement made
### Limit before filtering
Limiting without doing any filtering means the filtering can be deferred to the collection of metric streams. This is a substantial performance benefit, given that the filtering process then needs to run at most M times (where M is the number of distinct attribute sets recorded) rather than N times (where N is the number of measurements made).
Pros:
- Performance on the "hot path" is not impacted by filtering
Cons:
- It is possible for fewer distinct attribute sets than the cardinality limit to be exported on the output metric streams (possibly even just the otel.metric.overflow attribute in some cases)
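The M-versus-N cost difference can be sketched as follows. The helper names are invented for illustration, not taken from any SDK; the sketch just counts how often the attribute filter runs under each design.

```python
# Sketch contrasting how often the attribute filter runs in each design.
# All names are invented for illustration.

calls = {"hot_path": 0, "collection": 0}

def filter_attrs(attrs, where):
    """The view's attribute filter, instrumented to count invocations."""
    calls[where] += 1
    return {"path": attrs["path"]}

def key(attrs):
    return tuple(sorted(attrs.items()))

# N = 6 measurements over M = 2 distinct attribute sets.
measurements = [{"path": "/", "code": 200}] * 3 + \
               [{"path": "/admin", "code": 200}] * 3

# Design 1: filter on every measurement (in the hot path).
series = {}
for m in measurements:
    k = key(filter_attrs(m, "hot_path"))
    series[k] = series.get(k, 0) + 1

# Design 2: aggregate by raw attributes, then filter once per
# distinct attribute set at collection time.
raw = {}
for m in measurements:
    raw[key(m)] = raw.get(key(m), 0) + 1
collected = {}
for k, n in raw.items():
    out = key(filter_attrs(dict(k), "collection"))
    collected[out] = collected.get(out, 0) + n

print(calls)  # the filter ran N times in the hot path, M times at collection
```

Both designs produce identical output streams here; they differ only in where (and how often) the filter executes.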
The limit is intended to bound the amount of in-memory storage you have. Given that, I believe the filtering MUST be done on your hot-path. The intention is to avoid denial-of-service attacks via high-cardinality inputs that impact metrics (as many users will attach attributes that come from requests; even our semconv recommend this).
I don't think we have any rules about forcing filtering to be in the hot path or afterwards.
To me this means you have to limit before you hit any in-memory storage. Whether this means, in Go, you start doing filtering in the hot path, is up to you. I'd argue that the con of having less cardinality than your limit in some cases is better than the alternative of running out of memory :)
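The memory concern can be illustrated with a hypothetical sketch (the constants, names, and overflow handling here are invented for illustration): an attacker-controlled attribute value produces unbounded distinct series unless the hot path caps them before storage.

```python
# Hypothetical sketch: an attacker-controlled attribute (e.g. a query
# string) creates unbounded distinct series unless the hot path caps them.

LIMIT = 100
OVERFLOW = (("otel.metric.overflow", True),)

unbounded = {}
bounded = {}
for i in range(10_000):  # 10,000 distinct attacker-chosen values
    attrs = (("query", f"user={i}"),)

    # No limit: in-memory storage grows with input cardinality.
    unbounded[attrs] = unbounded.get(attrs, 0) + 1

    # Hot-path limit: storage is capped; extra series go to overflow.
    k = attrs if attrs in bounded or len(bounded) < LIMIT - 1 else OVERFLOW
    bounded[k] = bounded.get(k, 0) + 1

print(len(unbounded), len(bounded))  # 10000 vs 100
```

The unbounded map grows linearly with attacker input, while the limited map never exceeds the configured cap regardless of input cardinality.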
> Given that, I believe the filtering MUST be done on your hot-path.
@jsuereth did you mean "Given that, I believe the [limiting] MUST be done on your hot-path"? Not necessarily the filtering, right?
> To me this means you have to limit before you hit any in-memory storage. Whether this means, in Go, you start doing filtering in the hot path, is up to you. I'd argue that the con of having less cardinality than your limit in some cases is better than the alternative of running out of memory :)
:+1:
> @jsuereth did you mean "Given that, I believe the [limiting] MUST be done on your hot-path"? Not necessarily the filtering, right?
Yep! Limiting MUST be on hot path, you can pick how you filter for best UX + performance.
@jsuereth what are your thoughts on potential inconsistencies across languages here?
If one implementation filters prior to the limiting and another filters afterwards they could produce different outputs. Is consistency here going to be an issue?
> If one implementation filters prior to the limiting and another filters afterwards they could produce different outputs. Is consistency here going to be an issue?
I think regarding cardinality limits we're talking about error scenarios and worst-case behavior. We already have a lot of inconsistencies in how failures are handled due to runtime limitations. We try to be consistent, but when it comes to extraordinary/error scenarios, I think some inconsistency between SDKs is OK.
Another way to phrase it -> I think users, if given a choice, would prefer lower o11y overhead per-language over perfect consistency.
Java checks whether the cardinality exceeds the limit after attribute filtering has occurred. I think this is the right behavior because it's the least surprising to users, e.g. it's surprising if I set my cardinality limit to 100 but only see 20 series. Both the attribute filter and cardinality limit are mechanisms to manage cardinality. They should work together and not be at odds with each other.
> Java checks whether the cardinality exceeds the limit after attribute filtering has occurred. I think this is the right behavior because it's the least surprising to users, e.g. it's surprising if I set my cardinality limit to 100 but only see 20 series. Both the attribute filter and cardinality limit are mechanisms to manage cardinality. They should work together and not be at odds with each other.
Doesn't this mean that Java needs to filter every measurement the user makes though? And do so in the "hot path" of telemetry recording?
> Doesn't this mean that Java needs to filter every measurement the user makes though? And do so in the "hot path" of telemetry recording?
Yes.
I've clarified the SDK cardinality limit in #3856. The spec now says "For a given metric, the cardinality limit is a hard limit on the number of metric points that can be collected during a collection cycle". I also sent another editorial PR to clean up metric points #3906.
@MrAlias Do you think this issue can be marked as resolved? (I've provided more info regarding how I want to address a set of problems in #3866).
@MrAlias is this complete?