redpanda
redpanda copied to clipboard
Fix cross-shard allocator manipulation
This patch series prevents cross-shard manipulation of the allocator state which may cause segfaults and assertion failures.
The main problem is that the destructor for allocation_units maintains a pointer to the state from which it was created, which when moved onto another shard will do a call back to the original shard on destruction: a race condition. Now we use foreign pointer to avoid this: the destructor will be called on the original shard.
Fixes #5558.
Backports Required
- [ ] none - not a bug fix
- [ ] none - issue does not exist in previous branches
- [ ] none - papercut/not impactful enough to backport
- [x] v22.3.x
- [x] v22.2.x
- [ ] v22.1.x
UX Changes
Release Notes
Nice work I added myself as a reviewer just to be kept in the loop here
This is relatively clean. Any reason is draft.
ci-failure is https://github.com/redpanda-data/redpanda/issues/6870
@emaxerrno wrote:
This is relatively clean. Any reason is draft.
I just didn't have time to fill out the cover letter yet but I wanted to get a CI run in in the meantime and maybe get an eye on it.
I've marked it as ready for review now.
nit: please can you use the component: foo bar style for commit message first lines, makes it much easier to browse history later
Do we like this as a cause for https://github.com/redpanda-data/redpanda/issues/7343 ? It seems odd that the issue would manifest on upgrade if this is a long-standing bug in Redpanda.
/backport v22.3.x
/backport v22.2.x
Failed to run cherry-pick command. I executed the below command:
git cherry-pick -x c64fdbbc8d7423050d2783c0b698af17c1f03b00 0565bf67dc24a0f30fd28da770891ec64e1aab09 e5dad45b37b752ba240af4c7f06af099d56e21aa 413cbe30dd8a2c94c4025a43bc43a820789de697 44c0dc71305676eed3e66fc51f59330968bf3e2f
The attempt to backport to v22.2.x branch failed because this PR modifies src/v/bytes/oncore.h which is a file that does not currently exist on v22.2.x branch. May require backport of PR #6986 which first introduces that file. I'm not entirely sure that's the right thing to do. Plus attempt to backport failed with simple cherry pick: https://github.com/redpanda-data/redpanda/pull/6986#issuecomment-1322689361
@travisdowns @dotnwat or anybody else who knows the innards of this codebase...need advice.
/backport v22.3.x
notes from conversation with @dlex regarding backport to v22.2.x branch:
to backport 7351, the first commit in it should be dropped / or the other four should be cherry-picked. That is (apparently) an unrelated change that is causing the backport failure.
@andrewhsu i don't think you need to backport 6986. this PR makes a one line change to that oncore file, but in 22.2.x that file exists in a different location. you could even drop the change to that file entirely, just document it in the PR as a conflict.
fyi i added release notes section to the pr body; it was empty
Thanks all for the help here.
Just to clarify, the Remove const from shard_id in oncore change was related and required for this series of changes in dev, because there the shard ID in oncore object is const. It is not needed at all in 22.2.x because that member is not const there (the same series that added this header added the constness).