Increase metrics bucketing efficiency
Since https://github.com/getsentry/relay/issues/2083, we skip metrics extraction for indexed transactions in PoPs, and extract those metrics in processing relays instead.
This deteriorated our bucketing efficiency: metrics extracted by PoPs are routed to processing relays by bucket key, but the transaction payloads themselves are routed round-robin, so the metrics extracted from them in processing relays end up in separate per-relay buckets instead of merging.
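For illustration, here is a minimal sketch of the two routing modes (the type and field names are hypothetical, not Relay's actual internals): buckets are sharded deterministically by hashing their key, while transaction payloads are spread round-robin regardless of their contents.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical subset of a metric bucket key that is relevant for routing.
#[derive(Hash)]
struct BucketKey<'a> {
    project_key: &'a str,
    metric_name: &'a str,
    tags: Vec<(&'a str, &'a str)>,
}

/// Buckets: the same key always maps to the same processing relay,
/// so identical buckets from different PoPs can be merged there.
fn shard_for_bucket(key: &BucketKey<'_>, num_relays: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    hasher.finish() % num_relays
}

/// Transaction payloads: round-robin, independent of their contents, so the
/// metrics extracted from them land on an arbitrary processing relay.
fn shard_for_transaction(counter: &mut u64, num_relays: u64) -> u64 {
    let shard = *counter % num_relays;
    *counter += 1;
    shard
}
```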
Possible solutions to improve bucketing efficiency again (non-exhaustive):
- Route transactions by a subset of their corresponding metrics' bucket key (see the sketch after this list).
- Make sure that metrics extracted by processing Relays pass through the same routing as buckets emitted by PoPs, by sending them over the network instead of inserting them into the local aggregator directly.
  - This could be accomplished by separating the concerns of processing and metrics aggregation into two different Relay pools (as we did for project configs). See https://github.com/getsentry/team-ingest/issues/139.
- ...
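A rough sketch of the first option, assuming the routing key is derived from fields that are both available on the transaction payload and part of the extracted metrics' bucket key (the names below are hypothetical; choosing the right subset is the actual design question):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical routing key: the subset of the metrics' bucket key that can
/// already be derived from the transaction payload before extraction.
#[derive(Hash)]
struct TransactionRoutingKey<'a> {
    org_id: u64,
    project_id: u64,
    transaction_name: &'a str,
}

/// Transactions that would produce identically keyed metrics are routed to
/// the same processing relay, so the metrics extracted there merge into one
/// bucket instead of one bucket per relay.
fn route_transaction(key: &TransactionRoutingKey<'_>, num_relays: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    hasher.finish() % num_relays
}
```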
The increase in the number of buckets on the transaction metrics topic was +70%.
- An additional routing + aggregation layer adds latency -> we don't want that.
- Routing transactions might help, but metrics extracted in PoPs would still go to a different processing relay, and metrics with fewer tags than the common set (e.g. usage) would not benefit from this.
> An additional routing + aggregation layer adds latency -> we don't want that.
We thought this through: it would not actually add latency compared to metrics that are extracted in PoPs. In both cases the time frame is roughly the same (PoP Aggregator -> Processing Aggregator vs. Processing Aggregator -> Processing Aggregator).
We should revisit this if it becomes a problem for either Kafka (+ consumers) or Relay itself.