Sparc: VM miscompiled with `-O2`
The reldbg build defaults to `-O2`, and on NetBSD/sparc with gcc (nb2 20230710) 10.5.0 the VM gets a SIGBUS when trying to load the world:
#0 Conversion::convertVFrames (this=0x6ebfc8) at vm/src/any/runtime/conversion.cpp:176
#1 0x000c75f8 in Conversion::convert (this=0x6ebfc8) at vm/src/any/runtime/conversion.cpp:66
#2 0x000c8578 in Conversion::doit (this=0x6ebfc8) at vm/src/any/runtime/conversion.cpp:14
#3 0x000e6208 in switchToVMStack (continuation=0xc9380 <ConvertFrame_cont()>) at vm/src/any/runtime/process.cpp:1800
#4 0x000cbd14 in ConvertFrame (isInterp=false, nlrHomeID=14, nlrHome=0xe7ffe030, nlr=true, sp=0xe7ffe030, result=0x40e5785) at vm/src/any/runtime/frame.cpp:518
#5 HandleReturnTrap (result=0x40e5785, sp_of_patched_frame=0xe7ffe030, nlr=<optimized out>, nlrHome=0xe7ffe030, nlrHomeID=14) at vm/src/any/runtime/frame.cpp:588
#6 0x00175954 in ReturnTrapNLR_returnPC ()
Telling reldbg to use the more conservative `-Og` results in a VM that seems to work OK and passes the tests.
Unfortunately, I currently don't have time to debug this further or to binary-search for the `-f` optimization that is in `-O2` but not in `-Og` and that triggers this.
To narrow it down a bit: `-O1` is OK, `-Os` fails.
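For anyone who wants to pick up the binary search, here is a minimal sketch of how it could be set up. It relies on `gcc -Q -O<level> --help=optimizers`, which reports each optimizer option as `[enabled]` or `[disabled]`. The helper names and the `build_is_ok` callback are hypothetical, and the bisection assumes a single culprit flag whose `-fno-` form fixes the build. The `-O` levels also differ in ways that no individual `-f` flag captures, so the resulting list is only a starting point.

```python
import re
import subprocess

# Matches report lines such as "  -fcrossjumping    \t[enabled]".
_ENABLED = re.compile(r"^\s*(-f[\w-]+)\s+\[enabled\]\s*$")


def enabled_flags(report):
    """Return the -f options marked [enabled] in a -Q --help=optimizers report."""
    flags = set()
    for line in report.splitlines():
        match = _ENABLED.match(line)
        if match:
            flags.add(match.group(1))
    return flags


def query_gcc(level, gcc="gcc"):
    """Ask gcc which optimizer options it enables at the given -O level."""
    result = subprocess.run(
        [gcc, "-Q", level, "--help=optimizers"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def candidates(bad_report, good_report):
    """Flags enabled in the failing configuration but not in the working one."""
    return sorted(enabled_flags(bad_report) - enabled_flags(good_report))


def bisect_flags(flags, build_is_ok):
    """Narrow the candidate list down to one flag.

    build_is_ok(disabled) is a caller-supplied (hypothetical) callback that
    rebuilds at the failing -O level with the -fno- form of every flag in
    `disabled` and returns True if the resulting VM works.
    Assumes exactly one culprit among `flags`.
    """
    flags = list(flags)
    while len(flags) > 1:
        half = flags[: len(flags) // 2]
        # If disabling this half fixes the build, the culprit is in it.
        flags = half if build_is_ok(half) else flags[len(flags) // 2:]
    return flags[0] if flags else None
```

A first list of suspects would then come from `candidates(query_gcc("-O2"), query_gcc("-Og"))`. Each bisection step costs a full rebuild plus a world load, so the number of steps grows only with the logarithm of the list length.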
The failure being related to frames was a broad hint and, indeed, `-fno-optimize-sibling-calls` helps. But the time to load the world and to run `--runAutomaticTests` is significantly worse for `-Os -fno-optimize-sibling-calls` than for `-O1`.
I wonder how much the higher -O levels above -O1 are actually buying us.
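One way to put numbers on that question is to time the same workload against VM binaries built at different levels. The following is only a sketch: the function names are made up, and the command lines passed in (path to each VM build, how the world is loaded) are left to the caller because they depend on the local setup.

```python
import statistics
import subprocess
import time


def time_command(argv, runs=3):
    """Return the median wall-clock seconds over `runs` executions of argv."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(argv, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def compare(builds, runs=3):
    """Time each build and report (seconds, slowdown relative to the fastest).

    `builds` maps a label such as "-O1" to the argv that runs that VM build
    on the workload of interest (for example with --runAutomaticTests).
    """
    timings = {label: time_command(argv, runs) for label, argv in builds.items()}
    fastest = min(timings.values())
    return {label: (secs, secs / fastest) for label, secs in timings.items()}
```

The median is used rather than the mean so that a single slow run (a cold cache, say) does not dominate the result.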
I'm having issues getting NetBSD Sparc to run on QEMU (I don't have Sparc hardware anymore), but I'll have a look at this as soon as I get it running.
Beware that sparc needs a few tweaks, which I think I mentioned in the PR:
- a local copy of `.mul` in `vm/src/sparc/prims/asmPrims_sparc.S`, b/c the v8 stub on NetBSD doesn't follow the `.mul` ABI (#152) that nothing in the gcc-generated code relies on, but Self does. I should probably do a PR that provides one for the very unlikely case that someone runs it on v7 and does a v8 multiplication otherwise
- a workaround for #149, for which I currently use a version of libc compiled with phk malloc instead of jemalloc (`USE_JEMALLOC=no`), with an additional implementation of `__je_sallocx` used by `true_size_of_malloced_obj` in `vm/src/any/runtime/allocation.cpp` (it pokes in malloc internals, so no easy way to provide it out of band)