when bootstrapping, wasm2c generates C code with problematic stack space requirements
Context: I work on a C compiler for x86-64 Linux, slimcc. Despite very basic, barely -O0 quality codegen, I've gotten several languages including Python, Perl, PHP to reliably build and pass tests. In attempting to do the same with Zig through the bootstrap.c process, I found the zig1 binary crashed in the following step:
./zig1 lib build-exe -ofmt=c -lc -OReleaseSmall --name zig2 -femit-bin=zig2.c -target x86_64-linux --dep build_options --dep aro -Mroot=src/main.zig -Mbuild_options=config.zig -Maro=lib/compiler/aro/aro.zig
With backtrace under gdb,
Program received signal SIGSEGV, Segmentation fault.
0x0000000000cf9d3d in f1151 ()
(gdb) bt
#0 0x0000000000cf9d3d in f1151 ()
#1 0x0000000000a02577 in f1677 ()
#2 0x0000000000d39448 in f1151 ()
#3 0x00000000009ff2da in f1684 ()
#4 0x0000000000d42545 in f1151 ()
#5 0x0000000000a02577 in f1677 ()
#414 0x0000000000d41926 in f1151 ()
#415 0x0000000000a02577 in f1677 ()
#416 0x0000000000d39448 in f1151 ()
#417 0x0000000000d45467 in f1151 ()
#418 0x00000000009ff2da in f1684 ()
#419 0x00000000004df1ed in f3123 ()
#420 0x0000000000677732 in f2552 ()
#421 0x00000000009d43fb in f1711 ()
#422 0x0000000000d9bb35 in f1151 ()
#423 0x00000000010c4369 in f743 ()
#424 0x000000000089e696 in f2037 ()
#425 0x0000000000ceeb5c in f1162 ()
#426 0x00000000010c6b57 in f743 ()
#427 0x00000000012810e1 in f395 ()
#428 0x0000000001437ad1 in f0 ()
#429 0x0000000000401384 in wasm.start ()
#430 0x000000000146456b in main ()
and the stack depth had used up 8MB, the typical default stack size on Linux,
(gdb) up 500
#430 0x000000000146456b in main ()
(gdb) print $sp
$3 = (void *) 0x7fffffffe1f0
(gdb) down 500
#0 0x0000000000cf9d3d in f1151 ()
(gdb) print $sp
$4 = (void *) 0x7fffff7f9f20
we can tell it's a stack overflow caused by three functions, f1684(), f1151() and f1677(), recursively calling each other.
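To make the 8MB figure concrete, the two $sp values printed above can be subtracted directly. The helper below is only an illustrative sketch (the name stack_bytes_used is made up here) and assumes the x86-64 convention that the stack grows toward lower addresses:

```c
#include <stdint.h>

/* Hypothetical helper: bytes of stack in use between the outermost frame
   (higher address) and the innermost frame (lower address).
   Assumes a downward-growing stack, as on x86-64 Linux. */
static uint64_t stack_bytes_used(uint64_t outer_sp, uint64_t inner_sp) {
    return outer_sp - inner_sp;
}
```

Calling stack_bytes_used(0x7fffffffe1f0, 0x7fffff7f9f20) gives 0x8042d0, i.e. 8405712 bytes, just past 8 MiB (8388608 bytes), which matches a crash at the default 8MB limit.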
Naturally, I'm curious how well other C compilers do in the situation, so I tried what I have installed:
- pcc from Debian package: hangs over 10 minutes, had to shut it down.
- cparser built from source: again over 10 minutes.
- kefir from Arch AUR: crashed while building zig1.c.
- cproc built from source: internal error, probably not zig1.c's problem.
- tcc: builds but segfaults the same way, and I found a closed PR mentioning similar experience https://github.com/ziglang/zig/pull/19831.
- gcc and clang at -O0: segfaults the same way.
It shows that currently the only C compilers (that I'm aware of) actually able to build Zig with bootstrap.c on Linux x86-64 in a well-behaved manner, are compilers based on industrial-grade GCC and Clang, and only with their well-tuned optimization passes enabled (the default was -Os).
Now I know this reads like a "please help my compiler look more capable than it is" begging post; I absolutely should try to improve it instead of ranting here. However, bootstrap.c is out there, people inevitably will try it with whatever C compiler at hand, like with https://github.com/ziglang/zig/pull/19831.
I hesitated to file this issue, but I read a quote in that PR,
Go ahead and use TCC if you want. Zig's bootstrap process doesn't care. You bring your own c compiler to the table. That's the whole point! Originally posted by @andrewrk in https://github.com/ziglang/zig/issues/19831#issuecomment-2088856512
that does sound like what I attempted should work, feel free to close this if I'm mistaken.
Slightly related: over 80 percent of the 200MB zig1.c file consists of indenting white spaces, I think it's worth trimming down a bit.
This isn't surprising; zig1.c is created by wasm2c and the code we generate there is less than ideal, just like the code we generate in the C backend (which is used for zig2.c). Yet for some reason, we only build zig2 with a larger stack size, not zig1. In #22054, I couldn't get zig1 running correctly when built by cl.exe on Windows either, for exactly this reason. That's why that PR also changes the stack size for zig1, and I think this issue is just more evidence that this is the correct thing to do.
We should still try to improve the C backend and wasm2c, but in the meantime, we do also need to be able to actually bootstrap.
Perhaps you could try with a 16 MB stack size and see if that works with your C compiler?
Slightly related: over 80 percent of the 200MB zig1.c file consists of indenting white spaces, I think it's worth trimming down a bit.
It's nice to have the indentation for the benefit of people working on ~~the C backend~~ wasm2c. That said, we could probably ~~omit it when building with -O ReleaseSmall~~ emit it only when given a --debug flag or similar.
Slightly related: over 80 percent of the 200MB zig1.c file consists of indenting white spaces, I think it's worth trimming down a bit.
It's nice to have the indentation for the benefit of people working on the C backend. That said, we could probably omit it when building with -O ReleaseSmall.
I agree that having the option is nice for debuggability, but since the zig1.c file is supposed to be a temporary generated file that isn't looked at by humans during the build procedure, it should make sense to disable the whitespace for it by default. (Maybe that's what you're already suggesting, not sure which target/phase -O ReleaseSmall applies to here, maybe that's already used when generating zig1.c in status-quo?)
I'd written that paragraph while confusing zig1.c and zig2.c. What we would need for zig1.c is a wasm2c option; the C backend has nothing to do with zig1.c.
(However, I suppose what I said is still true of the C backend in regards to zig2.c.)
A little survey, done by simply setting ulimit -s unlimited and monitoring ./zig1 lib build-exe ... with memusage.
I'm not familiar with the tool so can't tell how accurate the numbers are, but this should suffice as a relative clue.
stack peak: 9920 // gcc -Os
stack peak: 151424 // gcc -O0
stack peak: 17536 // clang -Os
stack peak: 154640 // clang -O0
stack peak: 81200 // slimcc
stack peak: 151872 // TinyCC
It seems like you just need to pass a stack size flag, right? What exactly is taxing on the C compiler? I'm confused why you're mentioning whitespace, since the only thing it could possibly affect (and even then it would be dubious) would be the compiler.
Thank you for taking care of the situation.
I tried the stack-size=16MB version of bootstrap.c from #22054 with the system gcc and "-Os" changed to "-O0" to emulate less-optimizing compilers (as the little survey above shows, tcc is close to gcc -O0); the result is another segfault.
Moreover, under gdb the stack depth is still 8MB,
0x0000000000ad3427 in f1199 ()
(gdb) up 500
#233 0x0000000001319d09 in main ()
(gdb) print $sp
$1 = (void *) 0x7fffffffd8d0
(gdb) down 500
#0 0x0000000000ad3427 in f1199 ()
(gdb) print $sp
$2 = (void *) 0x7fffff7feeb0
even though objdump -p ./zig1 confirms the flag did set the memsz field of the STACK program header to 16M.
STACK off 0x0000000000000000 vaddr 0x0000000000000000 paddr 0x0000000000000000 align 2**4
filesz 0x0000000000000000 memsz 0x0000000001000000 flags rw-
My object format knowledge is as good as noob, no idea why the flag isn't effective.
Also note that tcc uses its own linker by default, which doesn't know the -z,stack-size flag.
My object format knowledge is as good as noob, no idea why the flag isn't effective.
I wonder if there's some security hardening nonsense going on where the kernel refuses to give the program a larger stack because of ulimit -s?
https://github.com/ziglang/zig/blob/aa7d138462602e086aacf738e4b92bfa3372bebe/lib/std/start.zig#L511-L515
~What does this have to do with the C compiler?~ I see now, the stack space is more than 8 MiB, and only GCC/Clang's optimizations are powerful enough to reduce the stack space requirements to be lower than that threshold. I changed the issue title to reflect this.
I wonder if there's some security hardening nonsense going on where the kernel refuses to give the program a larger stack because of ulimit -s?
That result was obtained without touching ulimit -s. Here is a little Dockerfile to check major Linux distributions against https://github.com/ziglang/zig/pull/22054 + "-O0",
#FROM debian:12-slim
#FROM ubuntu:latest
#RUN apt-get update && apt-get install -y gcc git
#FROM fedora:latest
#RUN dnf -y install gcc git
#FROM opensuse/tumbleweed:latest
#RUN zypper install -y gcc git
WORKDIR /build/
RUN git clone https://github.com/alexrp/zig --branch bootstrap-windows --depth 1 /build/zig
WORKDIR /build/zig/
RUN sed -i 's/"-Os"/"-O0"/g' bootstrap.c
RUN gcc bootstrap.c -o _bootstrap
RUN ./_bootstrap
on my x86-64 Linux desktop they all fail with:
0.226 cc stage1/wasm2c.c -std=c99 -O2 -o zig-wasm2c
0.990 ./zig-wasm2c stage1/zig1.wasm zig1.c
1.926 cc zig1.c stage1/wasi.c -std=c99 -O0 -lm -Wl,-z,stack-size=0x1000000 -o zig1
37.27 ./zig1 lib build-exe -OReleaseSmall -target x86_64-linux -lc -ofmt=c -femit-bin=zig2.c --name zig2 --dep build_options --dep aro -Mroot=src/main.zig -Mbuild_options=config.zig -Maro=lib/compiler/aro/aro.zig
50.63 fatal: child process crashed
Ok, so as far as I can tell, neither the Linux kernel nor glibc nor musl respect PT_GNU_STACK or GNU_PROPERTY_STACK_SIZE for the main thread.
For zig2, we're fine because we compile it for the build platform, so the expandStackSize() code gets included and saves the day (for Linux, anyway).
For zig1, I think the only thing we can do is translate the expandStackSize() code to C and put it in stage1/wasi.c where main() is defined. After that, we'll have to maintain the expandStackSize() code in both Zig and C going forward, which only really matters if that code gets ported to more platforms than just Linux.
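As a rough idea of what such a C translation might look like, here is a minimal sketch. It is not the actual expandStackSize() port: the function name raise_stack_limit is hypothetical, and instead of reading p_memsz from the PT_GNU_STACK program header it takes the wanted size as a parameter. It only uses the POSIX getrlimit/setrlimit calls and assumes that on Linux the main thread's stack can grow up to the soft RLIMIT_STACK limit once that limit is raised.

```c
#include <sys/resource.h>

/* Hypothetical helper (not from the Zig sources): raise the soft stack
   limit to `wanted` bytes, clamped to the hard limit.
   Returns 0 on success or if nothing needed to change, -1 on failure. */
static int raise_stack_limit(rlim_t wanted) {
    struct rlimit lim;

    /* If the limits cannot be read, report failure and change nothing. */
    if (getrlimit(RLIMIT_STACK, &lim) != 0)
        return -1;

    /* An unprivileged process cannot go above the hard limit. */
    if (lim.rlim_max != RLIM_INFINITY && wanted > lim.rlim_max)
        wanted = lim.rlim_max;

    /* Already large enough: leave the limit alone. */
    if (lim.rlim_cur == RLIM_INFINITY || lim.rlim_cur >= wanted)
        return 0;

    lim.rlim_cur = wanted;
    return setrlimit(RLIMIT_STACK, &lim);
}
```

Under these assumptions, main() in stage1/wasi.c would call something like raise_stack_limit(16 * 1024 * 1024) before entering the generated code, and could ignore a failure in the same spirit as a best-effort limit bump.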