feat: Add revised partition stats tables
Problem
We often want more information about the events in a Kafka topic for diagnostic purposes, particularly during incidents. There is currently no good way to answer questions like:
- Has a particular token recently sent a large volume of events?
- For a particular token, is one distinct ID more prevalent than others over some timestamp or offset range?
- In the case of an accumulating backlog, is one token, distinct ID, or event type overrepresented within a partition?
events_plugin_ingestion_partition_statistics has some of this data, but its use is inconsistent across clusters.
events data is not well-suited to answer these questions as it:
- only contains data that has made it through the entire ingestion pipeline (often we want to see what data is between the current ingestion consumer offset and the latest offset for a partition to identify impact of data drops, for example),
- does not contain any record of the originating topic or partition for the message (just the offset), so even retroactively identifying what messages were within a partition during an incident is difficult.
Changes
Adds a new table (events_plugin_ingestion_partition_statistics_v2) that tracks several of the more relevant event characteristics.
This allows us to run ad-hoc queries such as this example to find heavy hitter token and distinct ID pairs within an offset range:
SELECT
    token,
    distinct_id,
    count()
FROM events_plugin_ingestion_partition_statistics_v2
WHERE (topic = 'events_plugin_ingestion') AND (partition = 0) AND (offset >= 10)
GROUP BY ALL
ORDER BY count() DESC
LIMIT 5
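The same table should also help with the per-token question from the Problem section (whether one distinct ID dominates a token's traffic over an offset range). The query below is a sketch, not something taken from this PR: it only uses columns that appear in the example above, and the token value and offset bounds are placeholders.

```sql
-- Sketch: rank distinct IDs for a single token within an offset range.
-- 'example_token' and the offset bounds are placeholder values.
SELECT
    distinct_id,
    count() AS events,
    min(offset) AS first_offset,
    max(offset) AS last_offset
FROM events_plugin_ingestion_partition_statistics_v2
WHERE (topic = 'events_plugin_ingestion')
    AND (partition = 0)
    AND (offset >= 10) AND (offset < 1000)
    AND (token = 'example_token')
GROUP BY distinct_id
ORDER BY events DESC
LIMIT 5
```

Bounding the offset range on both sides keeps the scan limited to the window of interest, for example the gap between the current consumer offset and the latest offset for the partition.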
Does this work well for both Cloud and self-hosted?
It should.
How did you test this code?
See snapshots. I also ran some data through the functional tests to verify that things were generally working as expected.