feat(propdefs): implement batch APIs on update cache and manage synchronization directly
Problem
We'd like to explore ways to insert and remove batches of Update entries from the pod-local cache without causing lock contention across producer and (many) consumer worker threads.
If this were unblocked, we could refactor the property-defs-rs cache strategy to insert Updates only once they are successfully persisted to Postgres, which would improve cache accuracy and allow similar events to be reprocessed when batch writes fail.
Changes
Since we always have a full batch to insert into or delete from the cache, we can experiment with inserting N Updates per lock acquisition rather than taking the lock once per Update. If N is tuned properly, this could speed up bulk insertion/removal and reduce lock contention.
This PR implements the experiment but lands as a no-op; the new batch APIs are not yet wired into the event code path.
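To make the idea concrete, here is a minimal sketch of what a batch-oriented surface could look like, using a simplified cache guarded by a single `std::sync::Mutex` rather than the quick_cache-backed cache the service actually wraps. `UpdateKey`, `BatchUpdateCache`, `insert_batch`, and `remove_batch` are illustrative names, not the APIs added in this PR.

```rust
use std::collections::HashSet;
use std::sync::Mutex;

// Placeholder for the service's Update identity (illustrative only).
type UpdateKey = String;

/// Simplified pod-local cache: one mutex over a set of seen update keys.
/// Only meant to show the batch API shape, not the real quick_cache-backed impl.
pub struct BatchUpdateCache {
    seen: Mutex<HashSet<UpdateKey>>,
}

impl BatchUpdateCache {
    pub fn new() -> Self {
        Self { seen: Mutex::new(HashSet::new()) }
    }

    /// Insert N updates under a single lock acquisition, e.g. after a
    /// batch has been successfully persisted to Postgres.
    pub fn insert_batch(&self, updates: &[UpdateKey]) {
        let mut guard = self.seen.lock().unwrap();
        for u in updates {
            guard.insert(u.clone());
        }
    }

    /// Remove N updates under a single lock acquisition, e.g. so the
    /// corresponding events can be reprocessed after a failed batch write.
    pub fn remove_batch(&self, updates: &[UpdateKey]) {
        let mut guard = self.seen.lock().unwrap();
        for u in updates {
            guard.remove(u);
        }
    }
}
```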
Did you write or update any docs for this change?
- [ ] I've added or updated the docs
- [ ] I've reached out for help from the docs team
- [x] No docs needed for this change
How did you test this code?
Locally and in CI
👋 After a bit more digging, I'm going to do a round of refactoring and/or pause on this experiment.
I think the safest way to test the batch inserts/removals is to break this out into its own cache impl and introduce it conditionally, so we can swap it in and out quickly in deploy testing while falling back on the current implementation for single-entry inserts/removes.
The reason I'd like to retain the current cache impl for the single-update path is that the underlying quick_cache::sync::Cache instances we wrap today spread entries across a set of internal shards, each with its own lock, to reduce contention.
The new batch implementation of the cache in this PR is forced to use one "global" lock to gate access to all internal shards of each cache, whether making a single or a batch update. I suspect this will become a bottleneck for the single-update APIs, which are on the hot path in the service right now.
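A rough sketch of what the conditional swap could look like, under the assumption that both caches can sit behind one trait and be selected by a config flag at startup. The names here (`UpdateCacheLike`, `ShardedCache`, `GlobalLockCache`, `build_cache`, `use_batch_cache`) are hypothetical, and the toy `Mutex<HashSet>` shards only stand in for quick_cache's internal sharding.

```rust
use std::collections::HashSet;
use std::sync::Mutex;

/// Hypothetical trait so the single-entry and batch-oriented caches can be
/// swapped behind one interface during deploy testing.
pub trait UpdateCacheLike: Send + Sync {
    fn insert(&self, key: String);
    fn remove(&self, key: &str);
    fn insert_batch(&self, keys: &[String]) {
        // Default: fall back to per-entry inserts (one lock per Update),
        // which is roughly what the current single-update path does.
        for k in keys {
            self.insert(k.clone());
        }
    }
}

/// Stand-in for the current quick_cache-backed cache: entries are spread
/// across shards, each with its own lock, so single-entry ops rarely contend.
struct ShardedCache {
    shards: Vec<Mutex<HashSet<String>>>,
}

impl UpdateCacheLike for ShardedCache {
    fn insert(&self, key: String) {
        let idx = key.len() % self.shards.len(); // toy shard selection
        self.shards[idx].lock().unwrap().insert(key);
    }
    fn remove(&self, key: &str) {
        let idx = key.len() % self.shards.len();
        self.shards[idx].lock().unwrap().remove(key);
    }
}

/// Stand-in for the batch-oriented cache: one global lock, so a batch is a
/// single acquisition but every single-entry op contends on the same mutex.
struct GlobalLockCache {
    entries: Mutex<HashSet<String>>,
}

impl UpdateCacheLike for GlobalLockCache {
    fn insert(&self, key: String) {
        self.entries.lock().unwrap().insert(key);
    }
    fn remove(&self, key: &str) {
        self.entries.lock().unwrap().remove(key);
    }
    fn insert_batch(&self, keys: &[String]) {
        let mut guard = self.entries.lock().unwrap(); // one lock per batch
        for k in keys {
            guard.insert(k.clone());
        }
    }
}

/// Pick the implementation from config so it can be flipped per deploy
/// without touching the hot single-update code path.
pub fn build_cache(use_batch_cache: bool) -> Box<dyn UpdateCacheLike> {
    if use_batch_cache {
        Box::new(GlobalLockCache { entries: Mutex::new(HashSet::new()) })
    } else {
        Box::new(ShardedCache {
            shards: (0..16).map(|_| Mutex::new(HashSet::new())).collect(),
        })
    }
}
```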
I'll update the PR today and ping for re-review when I'm done 👍