Add ability to clean up dedup metadata in real time
This adds a new deduplication table config that allows near-real-time cleanup of metadata keys. When `realtimeTTLCleanupIntervalSeconds` > 0, we start a background thread per partition that periodically calls `removeExpiredPrimaryKeys` to remove every key that is outside of the `metadataTTL`.
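The per-partition scheduling can be sketched as follows, assuming one single-threaded `ScheduledExecutorService` per partition. The class name, the thread name, and the choice of `scheduleWithFixedDelay` are illustrative assumptions rather than the code in this PR, and `removeExpiredPrimaryKeys` is passed in as a plain `Runnable`.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical per-partition TTL cleanup scheduler (illustrative, not Pinot's class). */
public class PartitionTtlCleanupScheduler implements AutoCloseable {
  private final ScheduledExecutorService executor;

  public PartitionTtlCleanupScheduler(String tableName, int partitionId,
      int realtimeTTLCleanupIntervalSeconds, Runnable removeExpiredPrimaryKeys) {
    // The feature is off unless the interval is positive: no thread is created.
    this.executor = realtimeTTLCleanupIntervalSeconds > 0
        ? Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, tableName + "-dedup-ttl-cleanup-" + partitionId);
            t.setDaemon(true); // never blocks JVM shutdown
            return t;
          })
        : null;
    if (executor != null) {
      executor.scheduleWithFixedDelay(() -> {
        try {
          removeExpiredPrimaryKeys.run();
        } catch (RuntimeException e) {
          // A thrown exception would suppress all later runs, so log and keep going.
          System.err.println("dedup TTL cleanup failed: " + e);
        }
      }, realtimeTTLCleanupIntervalSeconds, realtimeTTLCleanupIntervalSeconds, TimeUnit.SECONDS);
    }
  }

  public boolean isRunning() {
    return executor != null && !executor.isShutdown();
  }

  @Override
  public void close() {
    if (executor != null) {
      executor.shutdownNow();
    }
  }
}
```

A fixed delay (rather than a fixed rate) means a slow cleanup pass never queues up back-to-back runs, and the try/catch matters because a `ScheduledExecutorService` suppresses all subsequent executions of a periodic task once it throws.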
The use case for this feature is when `metadataTTL` << segment flush threshold. With high-throughput topics, we may want only a few minutes of deduplication to handle transient upstream duplicates, without the memory overhead of holding hours of keys. It is safe to call `removeExpiredPrimaryKeys` repeatedly because the deduplication check in the ingestion path already respects the TTL, even when an expired key is still present in the dictionary.
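To illustrate why repeated cleanup is safe, here is a minimal sketch, assuming a `ConcurrentHashMap` from primary key to dedup time (ms) and a TTL measured against the largest dedup time seen so far. `TtlDedupMetadata` and `checkRecordPresentOrUpdate` are hypothetical names for this example; only `removeExpiredPrimaryKeys` comes from this description, and its body here is an assumption, not the actual implementation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical TTL-aware dedup metadata: primary key -> dedup time (ms) of the first record. */
public class TtlDedupMetadata {
  private final ConcurrentHashMap<String, Long> keyToTime = new ConcurrentHashMap<>();
  private final AtomicLong largestSeenTime = new AtomicLong(Long.MIN_VALUE);
  private final long metadataTtlMs;

  public TtlDedupMetadata(long metadataTtlMs) {
    this.metadataTtlMs = metadataTtlMs;
  }

  /** Ingestion path: returns true if the record is a duplicate and should be dropped. */
  public boolean checkRecordPresentOrUpdate(String primaryKey, long timeMs) {
    long cutoff = largestSeenTime.accumulateAndGet(timeMs, Math::max) - metadataTtlMs;
    boolean[] duplicate = {false};
    keyToTime.compute(primaryKey, (k, existing) -> {
      if (existing != null && existing >= cutoff) {
        duplicate[0] = true; // live key: keep the original entry and drop this record
        return existing;
      }
      return timeMs; // absent or expired: an expired entry is ignored even if not yet removed
    });
    return duplicate[0];
  }

  /** Removes keys outside the TTL; idempotent and safe to run concurrently with ingestion. */
  public int removeExpiredPrimaryKeys() {
    long watermark = largestSeenTime.get();
    if (watermark == Long.MIN_VALUE) {
      return 0; // nothing ingested yet
    }
    long cutoff = watermark - metadataTtlMs;
    int removed = 0;
    for (Map.Entry<String, Long> e : keyToTime.entrySet()) {
      // Conditional remove: skips entries that ingestion refreshed in the meantime.
      if (e.getValue() < cutoff && keyToTime.remove(e.getKey(), e.getValue())) {
        removed++;
      }
    }
    return removed;
  }

  public int size() {
    return keyToTime.size();
  }
}
```

Because `checkRecordPresentOrUpdate` already treats an entry older than the cutoff as absent, `removeExpiredPrimaryKeys` in this sketch only reclaims memory: running it twice in a row, or while ingestion is in progress, does not change which records are dropped.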
This feature is intended to be backwards compatible. It adds a new interface method for starting the manager after initialization. The only possible (and unlikely) edge case is a user who overrides `BasePartitionDedupMetadataManager.stop`, enables this feature, and does not correctly stop the async cleanup. Even then, the cleanup should be safe to keep running.
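One common way to add an interface method without breaking existing implementations is a Java `default` method. The sketch below assumes that approach; the method name `start()`, all class names, and the `super.stop()` convention are hypothetical, since this description does not name the new method. It also illustrates the edge case above: a subclass that overrides `stop()` without calling `super.stop()` would leave the daemon cleanup thread running.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical lifecycle sketch; names are illustrative, not Pinot's. */
public class DedupLifecycleSketch {

  /** Manager interface after adding a post-initialization start hook. */
  public interface PartitionDedupManager {
    /** New method with a no-op default, so implementations written before it still compile. */
    default void start() {
    }

    void stop();
  }

  /** Base class that owns the optional async TTL cleanup. */
  public static class BaseManager implements PartitionDedupManager {
    private final int cleanupIntervalSeconds;
    private ScheduledExecutorService cleanupExecutor;

    public BaseManager(int cleanupIntervalSeconds) {
      this.cleanupIntervalSeconds = cleanupIntervalSeconds;
    }

    @Override
    public synchronized void start() {
      // Start the background thread only when the feature is enabled, and only once.
      if (cleanupIntervalSeconds > 0 && cleanupExecutor == null) {
        cleanupExecutor = Executors.newSingleThreadScheduledExecutor(r -> {
          Thread t = new Thread(r, "dedup-ttl-cleanup");
          t.setDaemon(true);
          return t;
        });
        cleanupExecutor.scheduleWithFixedDelay(this::removeExpiredPrimaryKeys,
            cleanupIntervalSeconds, cleanupIntervalSeconds, TimeUnit.SECONDS);
      }
    }

    protected void removeExpiredPrimaryKeys() {
      // TTL-based key removal would go here.
    }

    public synchronized boolean isCleanupRunning() {
      return cleanupExecutor != null && !cleanupExecutor.isShutdown();
    }

    @Override
    public synchronized void stop() {
      // Subclasses overriding stop() should call super.stop(); otherwise the daemon thread keeps running.
      if (cleanupExecutor != null) {
        cleanupExecutor.shutdownNow();
      }
    }
  }

  /** An implementation that predates start(): it compiles unchanged and inherits the no-op. */
  public static class LegacyManager implements PartitionDedupManager {
    private boolean stopped;

    @Override
    public void stop() {
      stopped = true;
    }

    public boolean isStopped() {
      return stopped;
    }
  }
}
```

Marking the cleanup thread as a daemon means a leaked thread at least never blocks JVM shutdown; whether its continued runs are harmless then rests on the TTL-respecting ingestion check described above.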
We tested this for 12 hours, publishing ~250 random events per second with a 10-minute `metadataTTL` and cleanup every minute. We then republished the same events 3-5 minutes later, and ran `count(*)` repeatedly to confirm that it stayed less than 0.01% behind the number of unique events published.
Monitoring shows a few things:
- The primary key count is capped.
- The number of rows dropped increases a few minutes after republishing starts.
- Expired key removal starts after ~10-11 minutes, when keys first hit the TTL.
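As a rough sanity check on the capped key count (a back-of-envelope estimate, not a measurement), assume every published event has a distinct primary key, dedup time tracks publish time, and republished duplicates do not refresh a key's timestamp. With publish rate $r = 250$ events/s, $\mathrm{TTL} = 600$ s, and cleanup interval $\Delta = 60$ s, each key lives between $\mathrm{TTL}$ and $\mathrm{TTL} + \Delta$ before removal, so the steady-state key count should oscillate within:

$$
r \cdot \mathrm{TTL} \;\le\; N_{\text{keys}} \;\le\; r \cdot (\mathrm{TTL} + \Delta)
\quad\Longrightarrow\quad
250 \cdot 600 = 150{,}000 \;\le\; N_{\text{keys}} \;\le\; 250 \cdot 660 = 165{,}000
$$

The same $\mathrm{TTL} + \Delta$ window is consistent with expired key removal first appearing about 10-11 minutes into the run.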