CI breakage: ppc64le-linux cross-emulation environment no longer works

Open nwf-msr opened this issue 3 years ago • 3 comments

Presumably because the ubuntu-latest runners have stepped forward (as with #575), the powerpc64el cross-build and -run test is failing an apparently random subset of the tests. I am unable to reproduce these crashes on my Power machine, so I am inclined to think it's an artifact of emulation.

With a little elbow grease and prodding, I can reproduce it on WSL2. It looks like we aren't making it very far into program startup. With qemu tracing its heart out (and some judicious editing of the resulting 60MB log), we see that the signal is

--- SIGSEGV {si_signo=SIGSEGV, si_code=1, si_addr=0x0000004001be5000} ---

and the program counter is presumably near the last TB we entered, which was

exec_tb tb:[...] pc=0x4001e2f724

si_code=1 is SEGV_MAPERR ("address not mapped to object").

The fault address 0x4001be5000 is within the dynamic linker's load of libstdc++:

openat(AT_FDCWD,"/usr/powerpc64le-linux-gnu/lib/libstdc++.so.6",O_RDONLY|O_CLOEXEC) = 4
[...]
mmap(0x0000004001bd0000,131072,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_DENYWRITE|MAP_FIXED,4,0x2f0000) = 0x4001bd0000

and is not in the range of any subsequent mprotect call. The PC 0x4001e2f724 is within libc:

openat(AT_FDCWD,"/usr/powerpc64le-linux-gnu/lib/libc.so.6",O_RDONLY|O_CLOEXEC) = 4
[...]
mmap(NULL,2417600,PROT_EXEC|PROT_READ,MAP_PRIVATE|MAP_DENYWRITE,4,0) = 0x4001d60000

That trace certainly suggests that there should be memory at 0x4001be5000, I think.
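
For anyone wanting to double-check the address arithmetic, here is a minimal sketch that tests the fault address and the PC against the two mmap ranges quoted above. The `Mapping` class and `locate` helper are names made up for this illustration; only the addresses, lengths, and file offset come from the trace. Since the trace is elided in places, this says nothing about any later munmap or remap of those ranges.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Mapping:
    """One mmap result from the trace (hypothetical helper type)."""
    name: str
    start: int        # address returned by mmap
    length: int       # length argument, in bytes
    file_offset: int  # offset argument

    def contains(self, addr: int) -> bool:
        # mmap ranges are half-open: [start, start + length)
        return self.start <= addr < self.start + self.length

    def file_offset_of(self, addr: int) -> int:
        return self.file_offset + (addr - self.start)


# Values copied from the two mmap lines quoted in this issue.
MAPPINGS = [
    Mapping("libstdc++.so.6", 0x4001bd0000, 131072, 0x2f0000),
    Mapping("libc.so.6", 0x4001d60000, 2417600, 0),
]


def locate(addr: int) -> Optional[Mapping]:
    """Return the first mapping containing addr, or None."""
    for m in MAPPINGS:
        if m.contains(addr):
            return m
    return None


if __name__ == "__main__":
    for label, addr in [("si_addr", 0x4001be5000), ("pc", 0x4001e2f724)]:
        m = locate(addr)
        if m is None:
            print(f"{label} {addr:#x}: not in any listed mapping")
        else:
            print(f"{label} {addr:#x}: {m.name} + {addr - m.start:#x} "
                  f"(file offset {m.file_offset_of(addr):#x})")
```

Under these assumptions the fault address lands 0x15000 bytes into the PROT_READ|PROT_WRITE libstdc++ mapping and the PC lands inside the libc text mapping, which matches the reading of the trace above.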

qemu v4.2.1, approximately what shipped in Ubuntu Focal, lets the test pass. I'll bisect qemu and report back.

nwf-msr, Dec 11 '22 06:12

Bisection points at this being the fault of https://github.com/qemu/qemu/commit/4dcf078f094d436866ef793aa25c96fba85ac8d0 . The first release to contain that commit was v5.0.0, putting it after Ubuntu Focal and before Impish (and so Jammy). I don't understand why that change would trigger this behavior, but so it goes.

For history, I used this somewhat awkward command to build, since qemu has changed their build system and output layout a few times in the large span between v4.2.1 and today:

(rm -rf _build; mkdir _build; cd _build; ../configure --target-list=ppc64le-linux-user --disable-werror --disable-docs; ninja || make -j5; ln -s ppc64le-linux-user/qemu-ppc64le .)

nwf-msr, Dec 11 '22 06:12

Reported to qemu at https://gitlab.com/qemu-project/qemu/-/issues/1361

nwf-msr, Dec 11 '22 07:12