Segfault in binary created by zig cc related to pthread_cond_wait (musl x86_64)
I'm currently trying to use libsoundio via zig and am having problems running libsoundio's soundio_flush_events function (via the ALSA driver). The test code I'm using is sio_sine.c from libsoundio's examples:
https://github.com/andrewrk/libsoundio/blob/master/example/sio_sine.c
For example, when compiling with gcc, things work normally:
gcc -o sio_sine sio_sine.c -lsoundio
./sio_sine
However compiling with zig cc, I get a segmentation fault:
zig cc -lc -lsoundio sio_sine.c
./sio_sine
zsh: segmentation fault ./sio_sine
The root cause / the line that trips it up is soundio_flush_events and under the hood, the line that trips that up has to do with pthread_cond_wait: https://github.com/andrewrk/libsoundio/blob/master/src/os.c#L548
I'm on Alpine Linux / musl and have successfully used the same code with glibc, so I think this is a musl-specific issue. Zig code similar to the C example produces similar results: it works with glibc zig, but not with musl zig.
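To check whether pthread_cond_wait itself misbehaves in a zig cc build on musl, independent of libsoundio and ALSA, a standalone round trip like the one below might help narrow things down. This is only a sketch: the struct and function names (waiter, signal_ready, condvar_round_trip) are made up for illustration and are not taken from libsoundio's os.c.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical names for illustration; none of this comes from libsoundio. */
struct waiter {
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    bool ready;
};

/* Signalling thread: set the flag under the lock, then wake the waiter. */
static void *signal_ready(void *arg)
{
    struct waiter *w = arg;
    pthread_mutex_lock(&w->mutex);
    w->ready = true;
    pthread_cond_signal(&w->cond);
    pthread_mutex_unlock(&w->mutex);
    return NULL;
}

/* Returns 0 if the wait/signal round trip completed, nonzero otherwise. */
int condvar_round_trip(void)
{
    struct waiter w = { .ready = false };
    pthread_t thread;

    if (pthread_mutex_init(&w.mutex, NULL) != 0) return 1;
    if (pthread_cond_init(&w.cond, NULL) != 0) return 2;
    if (pthread_create(&thread, NULL, signal_ready, &w) != 0) return 3;

    /* Loop on the predicate to tolerate spurious wakeups. */
    pthread_mutex_lock(&w.mutex);
    while (!w.ready)
        pthread_cond_wait(&w.cond, &w.mutex);
    pthread_mutex_unlock(&w.mutex);

    pthread_join(thread, NULL);
    pthread_cond_destroy(&w.cond);
    pthread_mutex_destroy(&w.mutex);
    return 0;
}
```

Calling condvar_round_trip() from a small main and building it once with gcc and once with zig cc would show whether the condition variable is really at fault. If it passes with both, the crash more likely lies elsewhere in the call chain.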
The most common problem with musl is that the default thread stack size is too small. I don't have a musl toolchain to try compiling it myself; if you have a statically-linked binary (with debug info), upload it somewhere and I'll have a look.
I've been struggling to produce a static binary - the thing is, alsa & libsoundio are dynamically linked by default on Alpine.
Another piece of debugging info: a binary produced with clang also works fine:
clang -lsoundio sio_sine.c -o sio_sine
So really I think this comes down to some flag that's being passed to clang with zig cc.
Also, based on your suggestion about stack size @LemonBoy, I tried going into the libsoundio code and raising the stack size via pthread_attr_setstacksize, but it was no use / the segfault still happened. I need to figure out a better way to debug..
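For reference, raising a thread's stack size with pthread_attr_setstacksize generally looks like the sketch below. It is not the actual change made to libsoundio; worker and run_with_stack_size are hypothetical names, and the size to pass is up to the caller.

```c
#include <pthread.h>
#include <stddef.h>

/* Trivial thread body used only for this sketch. */
static void *worker(void *arg)
{
    (void)arg;
    return NULL;
}

/*
 * Start and join a thread with an explicit stack size.
 * Returns 0 on success, or the first nonzero pthread error code.
 */
int run_with_stack_size(size_t stack_bytes)
{
    pthread_attr_t attr;
    pthread_t thread;
    int err;

    err = pthread_attr_init(&attr);
    if (err != 0) return err;

    /* The attribute must be set before pthread_create is called. */
    err = pthread_attr_setstacksize(&attr, stack_bytes);
    if (err == 0) err = pthread_create(&thread, &attr, worker, NULL);
    if (err == 0) err = pthread_join(thread, NULL);

    pthread_attr_destroy(&attr);
    return err;
}
```

A call such as run_with_stack_size(1024 * 1024) requests a 1 MiB stack. Note that this only affects threads created with that attribute object; threads started elsewhere (for example inside a shared library) keep whatever size their creator chose.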
I need to figure out a better way to debug..
Try running the program under gdb: gdb ./sio_sine and get a backtrace with bt, that should shed some light on why it's crashing.
Thanks for the suggestion - I've installed gdb & debug symbols, and I think this has something to do with the way zig is interacting with alsa's shared library. I would think it's a bug in alsa, but the fact that things work in gcc/clang with the same code and the same shared library convinces me that this is a zig bug.
GDB trace for sio_sine.c from libsoundio:
~/libsoundio/example> zig cc sio_sine.c -lsoundio -g -O0
~/libsoundio/example> gdb ./sio_sine
GNU gdb (GDB) 9.2
Reading symbols from ./sio_sine...
(gdb) run
Starting program: /home/m/libsoundio/example/sio_sine
[New LWP 13644]
Backend: ALSA
Thread 2 "sio_sine" received signal SIGSEGV, Segmentation fault.
[Switching to LWP 13644]
0x00007ffff7eafe6d in snd_lib_error_set_local (func=func@entry=0x7ffff7eb165f <zero_handler>) at error.c:80
80 error.c: No such file or directory.
(gdb) bt
#0 0x00007ffff7eafe6d in snd_lib_error_set_local (func=func@entry=0x7ffff7eb165f <zero_handler>) at error.c:80
#1 0x00007ffff7eb1b45 in try_config (config=config@entry=0x23af40, list=list@entry=0x7ffff7e70ca8, base=<optimized out>,
name=<optimized out>) at namehint.c:243
#2 0x00007ffff7eb2908 in add_software_devices (list=0x7ffff7e70ca8, rw_config=0x23af40, config=<optimized out>)
at namehint.c:522
#3 snd_device_name_hint (card=<optimized out>, iface=<optimized out>, hints=0x7ffff7e70da8) at namehint.c:604
#4 0x00007ffff7f5cdbb in ?? () from /usr/lib/libsoundio.so.2
#5 0x00007ffff7f5dac5 in ?? () from /usr/lib/libsoundio.so.2
#6 0x00007ffff7f59061 in ?? () from /usr/lib/libsoundio.so.2
#7 0x000000000020b5a7 in start (p=0x7ffff7e71ee8) at /usr/lib/zig/libc/musl/src/thread/pthread_create.c:192
#8 0x000000000020cf6b in __clone () at /usr/lib/zig/libc/musl/src/thread/x86_64/clone.s:22
#9 0x0000000000000000 in ?? ()
(gdb)
I have also discovered a segfault with alsa's PCM test program (https://www.alsa-project.org/alsa-doc/alsa-lib/_2test_2pcm_8c-example.html). Same results: it works with gcc & clang, but doesn't work with zig cc. Here's the backtrace:
~/foo> zig cc pcm.c -lasound -g -O0
~/foo> gdb ./pcm
GNU gdb (GDB) 9.2
Reading symbols from ./pcm...
(gdb) run
Starting program: /home/m/foo/pcm
Playback device is plughw:0,0
Stream parameters are 44100Hz, S16_LE, 1 channels
Sine wave rate is 440.0000Hz
Using transfer method: write
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7ec2c76 in snd_lib_error_default (file=0x7ffff7f25016 "conf.c", line=3683,
function=0x7ffff7f25ad0 <__func__.10270> "snd_config_hooks_call", err=0,
fmt=0x7ffff7f2567b "Cannot open shared library %s (%s)") at error.c:102
102 error.c: No such file or directory.
(gdb) bt
#0 0x00007ffff7ec2c76 in snd_lib_error_default (file=0x7ffff7f25016 "conf.c", line=3683,
function=0x7ffff7f25ad0 <__func__.10270> "snd_config_hooks_call", err=0,
fmt=0x7ffff7f2567b "Cannot open shared library %s (%s)") at error.c:102
#1 0x00007ffff7ebf238 in snd_config_hooks_call (root=root@entry=0x22e120, config=config@entry=0x22e820, private_data=0x0)
at conf.c:3683
#2 0x00007ffff7ebf35a in snd_config_hooks (config=0x22e120, private_data=0x0) at conf.c:3731
#3 0x00007ffff7ebf80b in snd_config_update_r (_top=_top@entry=0x7ffff7f64160 <snd_config>,
_update=_update@entry=0x7ffff7f64170 <snd_config_global_update>, cfgs=cfgs@entry=0x0) at conf.c:4149
#4 0x00007ffff7ebf9b4 in snd_config_update_ref (top=top@entry=0x7fffffffe450) at conf.c:4205
#5 0x00007ffff7ed7685 in snd_pcm_open (pcmp=0x7fffffffe918, name=0x20391b "plughw:0,0", stream=SND_PCM_STREAM_PLAYBACK,
mode=0) at pcm.c:2671
#6 0x000000000020d5a1 in main (argc=1, argv=0x7fffffffeb28) at pcm.c:836
(gdb)
Commenting out snd_lib_error_default within alsa-lib and rebuilding the shared library fixes the problem. E.g. I just commented out the entire body of this function:
https://github.com/alsa-project/alsa-lib/blob/master/src/error.c#L100
And then the binary produced with zig cc runs fine. So there's something going on with how errors are being handled in musl zig that's off in this case.
Ok, I've managed to reproduce the problem in an Alpine chroot. The process crashes because the TLS DTV is empty. I believe it has to do with Zig's obsession with statically linking everything, while on Alpine the libc is dynamically linked.
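To illustrate the kind of access that depends on thread-local storage being set up correctly, here is a minimal sketch of a per-thread handler slot. It assumes, without having confirmed it against the alsa-lib source, that the crashing error functions read or write a thread-local variable; all names here (local_handler, counting_handler, tls_smoke_test) are hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

typedef void (*handler_fn)(const char *msg);

/* One slot per thread, similar in spirit to a per-thread error handler. */
static _Thread_local handler_fn local_handler = NULL;
static _Thread_local int handled_count = 0;

static void counting_handler(const char *msg)
{
    (void)msg;
    handled_count++;
}

/* Install a handler in this thread's slot, invoke it, report the count. */
static int install_and_call(void)
{
    local_handler = counting_handler;   /* write to TLS */
    if (local_handler != NULL)          /* read from TLS */
        local_handler("test");
    return handled_count;
}

static void *thread_main(void *arg)
{
    *(int *)arg = install_and_call();
    return NULL;
}

/* Returns 0 if both the calling thread and a new thread can use their TLS. */
int tls_smoke_test(void)
{
    pthread_t thread;
    int from_thread = 0;
    int before = handled_count;

    if (install_and_call() != before + 1) return 1;
    if (pthread_create(&thread, NULL, thread_main, &from_thread) != 0) return 2;
    if (pthread_join(thread, NULL) != 0) return 3;
    return from_thread == 1 ? 0 : 4;
}
```

Compiled into a single executable, this uses a static TLS model that does not go through the DTV, so it is expected to pass and does not reproduce the crash. To exercise the dynamic path described above, the thread-local variables would need to live in a separately built shared library that the zig cc binary loads.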
#5364