Increase metrics bucketing efficiency
Since https://github.com/getsentry/relay/issues/2083, we skip metrics extraction for indexed transactions in PoPs, and extract those metrics in processing relays instead.
This deteriorated our bucketing efficiency: metrics extracted by PoPs are routed to processing relays by bucket key, but the transaction payloads themselves are routed round-robin, so the metrics extracted from them in processing relays end up in separate per-relay buckets instead of merging.
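For illustration, here is a minimal sketch of the two routing modes (the type and field names are hypothetical, not Relay's actual internals): buckets are sharded deterministically by hashing their key, while transaction payloads are spread round-robin regardless of their contents.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical subset of a metric bucket key that is relevant for routing.
#[derive(Hash)]
struct BucketKey<'a> {
    project_key: &'a str,
    metric_name: &'a str,
    tags: Vec<(&'a str, &'a str)>,
}

/// Buckets: the same key always maps to the same processing relay,
/// so identical buckets from different PoPs can be merged there.
fn shard_for_bucket(key: &BucketKey<'_>, num_relays: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    hasher.finish() % num_relays
}

/// Transaction payloads: round-robin, independent of their contents, so the
/// metrics extracted from them land on an arbitrary processing relay.
fn shard_for_transaction(counter: &mut u64, num_relays: u64) -> u64 {
    let shard = *counter % num_relays;
    *counter += 1;
    shard
}
```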
Possible solutions to improve bucketing efficiency again (non-exhaustive):
- Route transactions by a subset of their corresponding metrics' bucket key (see the sketch after this list).
- Make sure that metrics extracted by processing Relays pass through the same routing as buckets emitted by PoPs, by sending them over the network instead of inserting them into the local aggregator directly.
  - This could be accomplished by separating the concerns of processing and metrics aggregation into two different Relay pools (as we did for project configs). See https://github.com/getsentry/team-ingest/issues/139.
- ...
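A rough sketch of the first option, assuming the routing key is derived from fields that are both available on the transaction payload and part of the extracted metrics' bucket key (the names below are hypothetical; choosing the right subset is the actual design question):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical routing key: the subset of the metrics' bucket key that can
/// already be derived from the transaction payload before extraction.
#[derive(Hash)]
struct TransactionRoutingKey<'a> {
    org_id: u64,
    project_id: u64,
    transaction_name: &'a str,
}

/// Transactions that would produce identically keyed metrics are routed to
/// the same processing relay, so the metrics extracted there merge into one
/// bucket instead of one bucket per relay.
fn route_transaction(key: &TransactionRoutingKey<'_>, num_relays: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    hasher.finish() % num_relays
}
```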
The increase in the number of buckets on the transaction metrics topic was +70%.
- An additional routing + aggregation layer adds latency -> we don't want that.
- Routing transactions might help, but metrics extracted in PoPs would still go to a different processing relay, and metrics with fewer tags than the common set (e.g. usage) would not benefit from this.
> An additional routing + aggregation layer adds latency -> we don't want that.
We thought this through: it would not actually add latency compared to metrics that are extracted in PoPs. In both cases the time frame is roughly the same (PoP Aggregator -> Processing Aggregator vs. Processing Aggregator -> Processing Aggregator).
We should revisit this if it becomes a problem for either Kafka (+ consumers) or Relay itself.