criu icon indicating copy to clipboard operation
criu copied to clipboard

compel: Fix ppc64le memory corruption near stack

Open ymanton opened this issue 3 years ago • 1 comments

I've been seeing intermittent crashes on ppc64le, tracked back to incorrect parasite stack setup. The ppc64le ABI allows functions to write to their caller's stack frame, so we have to allocate a dummy frame before running parasite code on the stack, otherwise function prologues may write past the end of stack memory. The s390x ABI has the same behaviour, but the code there does the right thing. Fixed the ppc64le code in the same way.

I've added a simple test case (copied & modified an existing compel test) that checks that memory after the stack buffer is not modified after the parasite is set up. (It only detects modification, not cases where the same value is written.)

Looked around existing issues; turns out #1946 is the same problem, crashes when the stack ends on a page boundary and the next page is not writable, otherwise memory is just silently corrupted.

Fixes #1946

ymanton avatar Sep 20 '22 17:09 ymanton

@ymanton thanks for the PR. Thanks for including a test.

@mihalicyn can you test this on the Jenkins systems to see if it fixes the errors you have seen in #1946

@rst0git We talked about ppc64le problems just today, does this help?

@ldu4 In case you have some input/comments?

adrianreber avatar Sep 20 '22 17:09 adrianreber

Codecov Report

Base: 71.04% // Head: 70.84% // Decreases project coverage by -0.20% :warning:

Coverage data is based on head (33e8308) compared to base (58257cb). Patch coverage: 41.66% of modified lines in pull request are covered.

:exclamation: Current head 33e8308 differs from pull request most recent head 1c1b2b9. Consider uploading reports for the commit 1c1b2b9 to get more accurate results

Additional details and impacted files
@@             Coverage Diff              @@
##           criu-dev    #1974      +/-   ##
============================================
- Coverage     71.04%   70.84%   -0.21%     
============================================
  Files           128      127       -1     
  Lines         32187    32765     +578     
============================================
+ Hits          22868    23213     +345     
- Misses         9319     9552     +233     
Impacted Files Coverage Δ
compel/src/lib/infect.c 54.49% <41.66%> (ø)
include/common/lock.h 85.71% <0.00%> (-3.48%) :arrow_down:
include/common/list.h 97.61% <0.00%> (-2.39%) :arrow_down:
criu/include/vma.h 100.00% <0.00%> (ø)
criu/include/parasite.h 100.00% <0.00%> (ø)
compel/arch/x86/src/lib/infect.c
criu/arch/x86/cpu.c
criu/include/pid.h
compel/arch/x86/src/lib/cpu.c 78.21% <0.00%> (ø)
... and 1 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

codecov-commenter avatar Sep 27 '22 15:09 codecov-commenter

We talked about ppc64le problems just today, does this help?

The problem we have seen with ppc64le looks unrelated to this PR.

rst0git avatar Sep 27 '22 18:09 rst0git