redpanda icon indicating copy to clipboard operation
redpanda copied to clipboard

Fix cross-shard allocator manipulation

Open travisdowns opened this issue 2 years ago • 4 comments

This patch series prevents cross-shard manipulation of the allocator state which may cause segfaults and assertion failures.

The main problem is that the destructor for allocation_units maintains a pointer to the state from which it was created, which when moved onto another shard will do a call back to the original shard on destruction: a race condition. Now we use foreign pointer to avoid this: the destructor will be called on the original shard.

Fixes #5558.

Backports Required

  • [ ] none - not a bug fix
  • [ ] none - issue does not exist in previous branches
  • [ ] none - papercut/not impactful enough to backport
  • [x] v22.3.x
  • [x] v22.2.x
  • [ ] v22.1.x

UX Changes

Release Notes

travisdowns avatar Nov 17 '22 09:11 travisdowns

Nice work I added myself as a reviewer just to be kept in the loop here

graphcareful avatar Nov 17 '22 14:11 graphcareful

This is relatively clean. Any reason is draft.

emaxerrno avatar Nov 17 '22 16:11 emaxerrno

ci-failure is https://github.com/redpanda-data/redpanda/issues/6870

travisdowns avatar Nov 17 '22 22:11 travisdowns

@emaxerrno wrote:

This is relatively clean. Any reason is draft.

I just didn't have time to fill out the cover letter yet but I wanted to get a CI run in in the meantime and maybe get an eye on it.

I've marked it as ready for review now.

travisdowns avatar Nov 17 '22 22:11 travisdowns

nit: please can you use the component: foo bar style for commit message first lines, makes it much easier to browse history later

jcsp avatar Nov 21 '22 10:11 jcsp

Do we like this as a cause for https://github.com/redpanda-data/redpanda/issues/7343 ? It seems odd that the issue would manifest on upgrade if this is a long-standing bug in Redpanda.

jcsp avatar Nov 21 '22 10:11 jcsp

/backport v22.3.x

piyushredpanda avatar Nov 21 '22 17:11 piyushredpanda

/backport v22.2.x

piyushredpanda avatar Nov 21 '22 17:11 piyushredpanda

Oops! Something went wrong.

Workflow run logs.

vbotbuildovich avatar Nov 21 '22 17:11 vbotbuildovich

Failed to run cherry-pick command. I executed the below command:

git cherry-pick -x c64fdbbc8d7423050d2783c0b698af17c1f03b00 0565bf67dc24a0f30fd28da770891ec64e1aab09 e5dad45b37b752ba240af4c7f06af099d56e21aa 413cbe30dd8a2c94c4025a43bc43a820789de697 44c0dc71305676eed3e66fc51f59330968bf3e2f

Workflow run logs.

vbotbuildovich avatar Nov 21 '22 17:11 vbotbuildovich

The attempt to backport to v22.2.x branch failed because this PR modifies src/v/bytes/oncore.h which is a file that does not currently exist on v22.2.x branch. May require backport of PR #6986 which first introduces that file. I'm not entirely sure that's the right thing to do. Plus attempt to backport failed with simple cherry pick: https://github.com/redpanda-data/redpanda/pull/6986#issuecomment-1322689361

@travisdowns @dotnwat or anybody else who knows the innards of this codebase...need advice.

andrewhsu avatar Nov 21 '22 21:11 andrewhsu

/backport v22.3.x

graphcareful avatar Nov 21 '22 22:11 graphcareful

notes from conversation with @dlex regarding backport to v22.2.x branch:

to backport 7351, the first commit in it should be dropped / or the other four should be cherry-picked. That is (apparently) an unrelated change that is causing the backport failure.

andrewhsu avatar Nov 21 '22 22:11 andrewhsu

@andrewhsu i don't think you need to backport 6986. this PR makes a one line change to that oncore file, but in 22.2.x that file exists in a different location. you could even drop the change to that file entirely, just document it in the PR as a conflict.

dotnwat avatar Nov 23 '22 05:11 dotnwat

fyi i added release notes section to the pr body; it was empty

andrewhsu avatar Nov 23 '22 14:11 andrewhsu

Thanks all for the help here.

Just to clarify, the Remove const from shard_id in oncore change was related and required for this series of changes in dev, because there the shard ID in oncore object is const. It is not needed at all in 22.2.x because that member is not const there (the same series that added this header added the constness).

travisdowns avatar Nov 23 '22 21:11 travisdowns