redpanda
redpanda copied to clipboard
Fix cross-shard allocator manipulation
This patch series prevents cross-shard manipulation of the allocator state which may cause segfaults and assertion failures.
The main problem is that the destructor for allocation_units maintains a pointer to the state from which it was created, which when moved onto another shard will do a call back to the original shard on destruction: a race condition. Now we use foreign pointer to avoid this: the destructor will be called on the original shard.
Fixes #5558.
Backports Required
- [ ] none - not a bug fix
- [ ] none - issue does not exist in previous branches
- [ ] none - papercut/not impactful enough to backport
- [x] v22.3.x
- [x] v22.2.x
- [ ] v22.1.x
UX Changes
Release Notes
Nice work I added myself as a reviewer just to be kept in the loop here
This is relatively clean. Any reason is draft.
ci-failure is https://github.com/redpanda-data/redpanda/issues/6870
@emaxerrno wrote:
This is relatively clean. Any reason is draft.
I just didn't have time to fill out the cover letter yet but I wanted to get a CI run in in the meantime and maybe get an eye on it.
I've marked it as ready for review now.
nit: please can you use the component: foo bar
style for commit message first lines, makes it much easier to browse history later
Do we like this as a cause for https://github.com/redpanda-data/redpanda/issues/7343 ? It seems odd that the issue would manifest on upgrade if this is a long-standing bug in Redpanda.
/backport v22.3.x
/backport v22.2.x
Failed to run cherry-pick command. I executed the below command:
git cherry-pick -x c64fdbbc8d7423050d2783c0b698af17c1f03b00 0565bf67dc24a0f30fd28da770891ec64e1aab09 e5dad45b37b752ba240af4c7f06af099d56e21aa 413cbe30dd8a2c94c4025a43bc43a820789de697 44c0dc71305676eed3e66fc51f59330968bf3e2f
The attempt to backport to v22.2.x branch failed because this PR modifies src/v/bytes/oncore.h
which is a file that does not currently exist on v22.2.x branch. May require backport of PR #6986 which first introduces that file. I'm not entirely sure that's the right thing to do. Plus attempt to backport failed with simple cherry pick: https://github.com/redpanda-data/redpanda/pull/6986#issuecomment-1322689361
@travisdowns @dotnwat or anybody else who knows the innards of this codebase...need advice.
/backport v22.3.x
notes from conversation with @dlex regarding backport to v22.2.x branch:
to backport 7351, the first commit in it should be dropped / or the other four should be cherry-picked. That is (apparently) an unrelated change that is causing the backport failure.
@andrewhsu i don't think you need to backport 6986. this PR makes a one line change to that oncore file, but in 22.2.x that file exists in a different location. you could even drop the change to that file entirely, just document it in the PR as a conflict.
fyi i added release notes section to the pr body; it was empty
Thanks all for the help here.
Just to clarify, the Remove const from shard_id in oncore change was related and required for this series of changes in dev
, because there the shard ID in oncore
object is const
. It is not needed at all in 22.2.x because that member is not const
there (the same series that added this header added the constness).