redpanda icon indicating copy to clipboard operation
redpanda copied to clipboard

[transaction] Clear old pids from rm_stm state in overflow

Open VadimPlh opened this issue 3 years ago • 6 comments
trafficstars

Cover letter

Problem

User can use transaction/idempotency in different way. Some of them can be cause of problem. For example: user can create producer per request. Idempotency in kafka protocol works per session, the session is scoped by a producer so if user create a producer per request we will store all info for each producer in Idempotency case. It does not provide any benefits for user, but will create big maps in internal state for rm_stm. Also the same things for transaction.

We need to prevent this type overflow. So idea is add new settings (max_concurrent_producer_ids). And when rm_stm will reach this limits we will spawn clear process for log and mame state.

Fix

For Idempotency we use _log_state.seq_table, where we store session info. The simplest way to track order of last apply_data for pid. We will store seq_entry inside intrusive list to store replication order.

For transaction is a little bit harder. We can not delete any info for ongoing transaction. But we can understand do user finish transaction or not, For this we just need to check _log_state.tx_seqs. Because on abort or commit we clear this map. So for all commited/aborted txs we just can delete all info from internal maps

Fixes #5321

Backport Required

  • [x] not a bug fix
  • [ ] issue does not exist in previous branches
  • [ ] papercut/not impactful enough to backport
  • [ ] v22.2.x
  • [ ] v22.1.x
  • [ ] v21.11.x

UX changes

  • none

Release notes

  • Adding new redpanda setting (max_concurrent_producer_ids) to control how much sessions for transaction/idempotency will be saved

VadimPlh avatar Nov 02 '22 14:11 VadimPlh

I am wondering if we want to remove the oldest PIDs from all of them ?) If so we may not need to keep another index but instead use an intrusive list to keep track of an insertion order and add a timestamp to existing object this way we would iterate (according to insertion order) up to the point where PIDs are more recent than the the requested timestamp and remove all previous (or if it is just number of elements then we wouldn't even have to track timestamp).

Similar approach is used in fetch_session.h

mmaslankaprv avatar Nov 04 '22 13:11 mmaslankaprv

Can you please update the release notes section to have a better 1-liner explanation?

piyushredpanda avatar Nov 08 '22 15:11 piyushredpanda

Can you please update the release notes section to have a better 1-liner explanation?

done

VadimPlh avatar Nov 08 '22 15:11 VadimPlh

@VadimPlh : is this a user configurable setting (max_concurrent_producer_ids) -- will it need user-facing documentation?

piyushredpanda avatar Nov 09 '22 04:11 piyushredpanda

@VadimPlh : is this a user configurable setting (max_concurrent_producer_ids) -- will it need user-facing documentation?

Yes. Which person can I ping for it?

VadimPlh avatar Nov 09 '22 14:11 VadimPlh

@VadimPlh : is this a user configurable setting (max_concurrent_producer_ids) -- will it need user-facing documentation?

Yes. Which person can I ping for it?

cc: @micheleRP and @Feediver1

piyushredpanda avatar Nov 09 '22 14:11 piyushredpanda

I think we need @dotnwat's feedback on the name of the property but otherwise it looks good!

@rystsov I chimed in on that comment thread. but i wasn't thinking it was about the name, but rather the default value.

dotnwat avatar Dec 01 '22 21:12 dotnwat

Failures

  • https://github.com/redpanda-data/redpanda/issues/7548
  • hang: https://github.com/redpanda-data/redpanda/issues/7577

dotnwat avatar Dec 02 '22 19:12 dotnwat

/backport v22.3.x

VadimPlh avatar Dec 02 '22 23:12 VadimPlh