redpanda
redpanda copied to clipboard
[transaction] Clear old pids from rm_stm state in overflow
Cover letter
Problem
User can use transaction/idempotency in different way. Some of them can be cause of problem. For example: user can create producer per request. Idempotency in kafka protocol works per session, the session is scoped by a producer so if user create a producer per request we will store all info for each producer in Idempotency case. It does not provide any benefits for user, but will create big maps in internal state for rm_stm. Also the same things for transaction.
We need to prevent this type overflow. So idea is add new settings (max_concurrent_producer_ids). And when rm_stm will reach this limits we will spawn clear process for log and mame state.
Fix
For Idempotency we use _log_state.seq_table, where we store session info. The simplest way to track order of last apply_data for pid. We will store seq_entry inside intrusive list to store replication order.
For transaction is a little bit harder. We can not delete any info for ongoing transaction. But we can understand do user finish transaction or not, For this we just need to check _log_state.tx_seqs. Because on abort or commit we clear this map. So for all commited/aborted txs we just can delete all info from internal maps
Fixes #5321
Backport Required
- [x] not a bug fix
- [ ] issue does not exist in previous branches
- [ ] papercut/not impactful enough to backport
- [ ] v22.2.x
- [ ] v22.1.x
- [ ] v21.11.x
UX changes
- none
Release notes
- Adding new redpanda setting (
max_concurrent_producer_ids) to control how much sessions for transaction/idempotency will be saved
I am wondering if we want to remove the oldest PIDs from all of them ?) If so we may not need to keep another index but instead use an intrusive list to keep track of an insertion order and add a timestamp to existing object this way we would iterate (according to insertion order) up to the point where PIDs are more recent than the the requested timestamp and remove all previous (or if it is just number of elements then we wouldn't even have to track timestamp).
Similar approach is used in fetch_session.h
Can you please update the release notes section to have a better 1-liner explanation?
Can you please update the release notes section to have a better 1-liner explanation?
done
@VadimPlh : is this a user configurable setting (max_concurrent_producer_ids) -- will it need user-facing documentation?
@VadimPlh : is this a user configurable setting (max_concurrent_producer_ids) -- will it need user-facing documentation?
Yes. Which person can I ping for it?
@VadimPlh : is this a user configurable setting (max_concurrent_producer_ids) -- will it need user-facing documentation?
Yes. Which person can I ping for it?
cc: @micheleRP and @Feediver1
I think we need @dotnwat's feedback on the name of the property but otherwise it looks good!
@rystsov I chimed in on that comment thread. but i wasn't thinking it was about the name, but rather the default value.
Failures
- https://github.com/redpanda-data/redpanda/issues/7548
- hang: https://github.com/redpanda-data/redpanda/issues/7577
/backport v22.3.x