opentelemetry-specification

How is the cardinality limit applied when attributes are being filtered?

Open MrAlias opened this issue 2 years ago • 11 comments

When a view is applied to a metric pipeline that contains an attribute filter, how should the cardinality limit be applied? Prior to filtering attributes, or post?

The answer we come up with will affect the output of user data. For example, if measurements for the following attributes are made:

  1. {path: "/", code: 200, query: ""}
  2. {path: "/", code: 400, query: "user=bob"}
  3. {path: "/admin", code: 200, query: ""}

Suppose an attribute filter is applied so that only the path attribute is retained, and a cardinality limit of 3 is set. If the filtering is applied prior to checking the cardinality limit, the following attributes will be kept on the output metric streams:

  • {path: "/"}
  • {path: "/admin"}

However, if the cardinality limit is applied prior to filtering, the following attributes will be kept on the output metric streams:

  • {path: "/"}
  • {otel.metric.overflow: true}
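The two orderings can be sketched as runnable code (illustrative Python, not OpenTelemetry SDK code). The sketch assumes, as the example output implies, that the reserved otel.metric.overflow series counts toward the limit, so a limit of 3 leaves room for 2 regular series:

```python
# Illustrative sketch of filter-then-limit vs. limit-then-filter.
# Assumption: the overflow series counts toward the cardinality limit.

OVERFLOW = frozenset({("otel.metric.overflow", True)})

measurements = [
    {"path": "/", "code": 200, "query": ""},
    {"path": "/", "code": 400, "query": "user=bob"},
    {"path": "/admin", "code": 200, "query": ""},
]

def filter_attrs(attrs, keep=("path",)):
    """Apply the view's attribute filter: retain only the listed keys."""
    return frozenset((k, v) for k, v in attrs.items() if k in keep)

def record(streams, key, limit):
    """Admit a new series while under the limit; otherwise route to overflow."""
    if key in streams or len(streams) < limit - 1:
        streams.add(key)
    else:
        streams.add(OVERFLOW)

def filter_then_limit(limit=3):
    streams = set()
    for m in measurements:
        record(streams, filter_attrs(m), limit)  # filter in the hot path
    return streams

def limit_then_filter(limit=3):
    streams = set()
    for m in measurements:
        record(streams, frozenset(m.items()), limit)  # limit raw attributes
    # Filtering is deferred to collection time.
    return {s if s == OVERFLOW else filter_attrs(dict(s)) for s in streams}

# filter_then_limit() -> {path: "/"} and {path: "/admin"}
# limit_then_filter() -> {path: "/"} and the overflow series
```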

Filter before limiting

Given that the cardinality limit feature was added to bound the resources an SDK uses while recording measurements, applying the filter before the cardinality limit means the filtering will need to be done in the "hot path" of making a measurement.

Pros:

  • The output metric streams are guaranteed to contain the expected number of distinct attribute sets (<= cardinality limit)

Cons:

  • Requires filtering for every measurement made

Limit before filtering

Limiting without doing any filtering means the filtering process can be delayed until the collection of metric streams. This is a substantial performance benefit: the filtering process needs to run at most M times (for M being the number of distinct attribute sets recorded) rather than N times (for N being the number of measurements made), and M <= N.

Pros:

  • Performance on the "hot path" is not impacted by filtering

Cons:

  • The number of distinct attribute sets exported on the output metric streams may be lower than the cardinality limit (possibly even just the otel.metric.overflow attribute in some cases)

MrAlias avatar Dec 19 '23 21:12 MrAlias

The limit is intended to bound the amount of in-memory storage you use. Given that, I believe the filtering MUST be done on your hot-path. The intention is to avoid Denial-of-Service attacks via high-cardinality inputs that impact metrics (many users will attach attributes that come from requests; even our semconv recommends this).

I don't think we have any rules about forcing filtering to be in the hot path or afterwards.

To me this means you have to limit before you hit any in-memory storage. Whether this means, in Go, you start doing filtering in the hot path, is up to you. I'd argue that the con of having less cardinality than your limit in some cases is better than the alternative of running out of memory :)

jsuereth avatar Dec 20 '23 15:12 jsuereth
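That DoS point can be sketched concretely (illustrative Python, not SDK code; the BoundedCounter name is hypothetical): if the limit is enforced in the hot path, before any per-series storage is allocated, memory stays bounded even when an attacker varies a request-derived attribute:

```python
# Sketch: limiting in the hot path bounds memory under high-cardinality input.
# Assumption: the overflow series counts toward the limit.

OVERFLOW = frozenset({("otel.metric.overflow", True)})

class BoundedCounter:
    def __init__(self, limit):
        self.limit = limit
        self.points = {}  # attribute set -> count

    def add(self, attrs):
        key = frozenset(attrs.items())
        if key not in self.points and len(self.points) >= self.limit - 1:
            key = OVERFLOW  # reroute instead of allocating a new series
        self.points[key] = self.points.get(key, 0) + 1

c = BoundedCounter(limit=2000)
for i in range(1_000_000):  # e.g. an attacker varying a request attribute
    c.add({"query": f"user={i}"})
assert len(c.points) <= 2000  # memory stays bounded regardless of input
```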

Given that, I believe the filtering MUST be done on your hot-path.

@jsuereth did you mean "Given that, I believe the [limiting] MUST be done on your hot-path"? Not necessarily the filtering, right?

MrAlias avatar Dec 20 '23 15:12 MrAlias

To me this means you have to limit before you hit any in-memory storage. Whether this means, in Go, you start doing filtering in the hot path, is up to you. I'd argue that the con of having less cardinality than your limit in some cases is better than the alternative of running out of memory :)

:+1:

MrAlias avatar Dec 20 '23 15:12 MrAlias

@jsuereth did you mean "Given that, I believe the [limiting] MUST be done on your hot-path"? Not necessarily the filtering, right?

Yep! Limiting MUST be on hot path, you can pick how you filter for best UX + performance.

jsuereth avatar Dec 20 '23 15:12 jsuereth

@jsuereth what are your thoughts on potential inconsistencies across languages here?

If one implementation filters prior to the limiting and another filters afterwards they could produce different outputs. Is consistency here going to be an issue?

MrAlias avatar Dec 20 '23 16:12 MrAlias

If one implementation filters prior to the limiting and another filters afterwards they could produce different outputs. Is consistency here going to be an issue?

I think regarding cardinality limits we're talking about error scenarios and worst-case behavior. We already have a lot of inconsistencies in how failures are handled due to runtime limitations. We try to be consistent, but when it comes to extraordinary/error scenarios, I think some inconsistencies between SDKs are OK.

jsuereth avatar Dec 20 '23 16:12 jsuereth

Another way to phrase it -> I think users, if given a choice, would prefer lower o11y overhead per-language over perfect consistency.

jsuereth avatar Dec 20 '23 16:12 jsuereth

Java checks whether the cardinality exceeds the limit after attribute filtering has occurred. I think this is the right behavior because it's the least surprising to users, e.g. it's surprising if I set my cardinality limit to 100 but only see 20 series. Both the attribute filter and the cardinality limit are mechanisms to manage cardinality. They should work together and not be at odds with each other.

jack-berg avatar Dec 20 '23 18:12 jack-berg

Java checks whether the cardinality exceeds the limit after attribute filtering has occurred. I think this is the right behavior because it's the least surprising to users, e.g. it's surprising if I set my cardinality limit to 100 but only see 20 series. Both the attribute filter and the cardinality limit are mechanisms to manage cardinality. They should work together and not be at odds with each other.

Doesn't this mean that Java needs to filter every measurement the user makes though? And do so in the "hot path" of telemetry recording?

MrAlias avatar Dec 21 '23 00:12 MrAlias

Doesn't this mean that Java needs to filter every measurement the user makes though? And do so in the "hot path" of telemetry recording?

Yes.

jack-berg avatar Dec 21 '23 00:12 jack-berg
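The Java-style ordering described in this exchange can be sketched as follows (illustrative Python, not the actual Java SDK): the view's attribute filter runs per measurement in the hot path, so the limit is checked against filtered attribute sets and high-cardinality raw attributes collapse before they can trigger overflow:

```python
# Sketch of filter-in-hot-path, then limit (the Java-style ordering).
# Assumption: the overflow series counts toward the limit.

OVERFLOW = frozenset({("otel.metric.overflow", True)})

def record(streams, attrs, keep, limit):
    # Filter runs on every measurement, in the hot path.
    key = frozenset((k, v) for k, v in attrs.items() if k in keep)
    if key not in streams and len(streams) >= limit - 1:
        key = OVERFLOW
    streams.add(key)

streams = set()
for code in range(500):  # 500 distinct raw attribute sets
    record(streams, {"path": "/", "code": code}, keep={"path"}, limit=100)

# All 500 measurements collapse to a single series; no spurious overflow.
assert streams == {frozenset({("path", "/")})}
```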

I've clarified the SDK cardinality limit in #3856. The spec now says "For a given metric, the cardinality limit is a hard limit on the number of metric points that can be collected during a collection cycle". I also sent another editorial PR to clean up metric points #3906.

@MrAlias Do you think this issue can be marked as resolved? (I've provided more info regarding how I want to address a set of problems in #3866).

reyang avatar Feb 27 '24 00:02 reyang

@MrAlias is this complete?

austinlparker avatar Apr 30 '24 20:04 austinlparker