relay Respect project ID in cardinality limiter

Currently, the hash of a metrics bucket for cardinality limiting purposes is computed only from the MRI and the tags:

https://github.com/getsentry/relay/blob/c892dd9c22bcd5147c9135e80341be03fbc0e29e/relay-metrics/src/bucket.rs#L725-L730

But it makes a difference for both the number of Kafka buckets and the number of ClickHouse rows whether a tag combination for a metric exists in a single project or in 100 different ones. In other words, we could see the project ID as a special tag that happens to have been promoted to a separate field.

Solution 1: Do nothing. The cardinality limiter only tracks dimensions that can grow dynamically, and the number of project IDs is relatively static.

Solution 2: Include the project ID in the cardinality hash.
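A minimal sketch of what Solution 2 could look like, assuming a simplified bucket with only an MRI and tags. `Bucket`, `hash_without_project`, and `hash_with_project` are hypothetical names for illustration, not Relay's actual types or functions, and the standard library's `DefaultHasher` stands in for whatever hasher Relay really uses:

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for a metrics bucket (not Relay's real type).
struct Bucket {
    /// The metric resource identifier (MRI).
    name: String,
    /// Tags in a sorted map, so the hash does not depend on insertion order.
    tags: BTreeMap<String, String>,
}

/// Behavior described in the issue: only the MRI and the tags are hashed.
fn hash_without_project(bucket: &Bucket) -> u64 {
    let mut hasher = DefaultHasher::new();
    bucket.name.hash(&mut hasher);
    bucket.tags.hash(&mut hasher);
    hasher.finish()
}

/// Solution 2: treat the project ID like one more tag and feed it into the hash.
fn hash_with_project(project_id: u64, bucket: &Bucket) -> u64 {
    let mut hasher = DefaultHasher::new();
    project_id.hash(&mut hasher);
    bucket.name.hash(&mut hasher);
    bucket.tags.hash(&mut hasher);
    hasher.finish()
}
```

With this variant, the same MRI and tag combination in two projects yields two distinct hashes, so it counts twice against the limit instead of once.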

Note 1: If we ever have cross-org cardinality limits, we should include the org ID as well.

Note 2: The same could be said about the bucket timestamp (recurring buckets are worse than one-off buckets), but I'm not sure whether including it makes sense, since time is a special dimension in the cardinality limiter.

Jan 25 '24 12:01 jjbayer

> Note 1: If we ever have cross-org cardinality limits, we should include the org ID as well.

If we decide to include the project ID in the cardinality hash, we should also always include the org ID. Basically, include the entire relay_cardinality::Scoping in the hash.
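As a rough sketch of that idea, assuming a scoping type that carries just an organization ID and a project ID: the `Scoping` struct below is a hypothetical stand-in whose fields are assumptions rather than the actual `relay_cardinality::Scoping` definition, and `scoped_hash` is an illustrative helper, not an existing function:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for the scoping type; field names are assumptions.
#[derive(Clone, Copy, Debug, Hash, PartialEq, Eq)]
struct Scoping {
    organization_id: u64,
    project_id: u64,
}

/// Combines the whole scoping with an already computed bucket hash
/// (for example, one derived from the MRI and the tags).
fn scoped_hash(scoping: Scoping, bucket_hash: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    // Hashing the struct as a unit means any field added to it later
    // is picked up automatically through the derived `Hash` impl.
    scoping.hash(&mut hasher);
    bucket_hash.hash(&mut hasher);
    hasher.finish()
}
```

Hashing the scoping as a whole, rather than picking individual fields, keeps the org ID and project ID from drifting apart if the type changes.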

I think overall that makes sense, but we shouldn't make the change until we have some confidence in the Relay cardinality limiter, since this hurts the comparability with the Python cardinality limiter and we would have to redefine our limits for this change.

Jan 25 '24 12:01 Dav1dde

This is obsolete/implemented: the cardinality limiter can be scoped dynamically to the global, org, project, and per_metric levels.

Dec 10 '24 13:12 Dav1dde