nx-libs
nx-libs copied to clipboard
malloc in signal handler causes deadlock
The 'new' logging code in nxcomp implicitly calls malloc() which causes the nxagent to hang forever. See this backtrace:
#0 0x00007f6388d9fd1c in __lll_lock_wait_private () from /lib64/libc.so.6
#1 0x00007f6388d1c29d in _L_lock_15512 () from /lib64/libc.so.6
#2 0x00007f6388d19383 in malloc () from /lib64/libc.so.6
#3 0x00007f63895d7ecd in operator new(unsigned long) () from /lib64/libstdc++.so.6
#4 0x00007f6389636a19 in std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&) () from /lib64/libstdc++.so.6
#5 0x00007f63896382a1 in char* std::string::_S_construct<char const*>(char const*, char const*, std::allocator<char> const&, std::forward_iterator_tag) () from /lib64/libstdc++.so.6
#6 0x00007f63896386d8 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(char const*, std::allocator<char> const&) () from /lib64/libstdc++.so.6
#7 0x00007f638b83ffd2 in NXLogStamp::NXLogStamp (this=0x7ffc6429a930, level=NXINFO, file=<optimized out>, function=0x7f638b89a0a6 <NXTransSignal::__func__> "NXTransSignal", line=1634) at Log.h:108
#8 0x00007f638b822891 in NXTransSignal (signal=signal@entry=17, action=action@entry=3) at Loop.cpp:1634
#9 0x000000000048b6d7 in nxagentSigchldHandler (signal=<optimized out>) at Display.c:561
#10 <signal handler called>
#11 0x00007f6388d169e3 in _int_malloc () from /lib64/libc.so.6
#12 0x00007f6388d1932c in malloc () from /lib64/libc.so.6
#13 0x000000000045fb0e in DeletePassiveGrabFromList (pMinuendGrab=pMinuendGrab@entry=0x7ffc6429b120) at grabs.c:325
#14 0x000000000041be9c in ProcUngrabKey (client=0x1a23af0) at ../../dix/events.c:4068
#15 0x0000000000431aee in Dispatch () at NXdispatch.c:476
#16 0x000000000040f29a in main (argc=18, argv=0x7ffc6429b388, envp=<optimized out>) at main.c:353
Malloc() is not allowed inside signal handlers, see https://stackoverflow.com/questions/40049751/malloc-inside-linux-signal-handler-cause-deadlock
So either we remove the logging from the signal handler code or find some way to prevent malloc()s from happening.
Yeah, the signal handler is way too complicated. :/
The only safe operations in a signal handler are to call async-signal-safe functions and set variables, preferably only the latter.
malloc
is not async-signal-safe so what I've done is not conforming to standards like POSIX/SUS, but it was the easiest way to get it done. I'd have to think of a really good way to reimplement this, but I fear that's quite difficult. We don't have a logging loop or something like that that could react to signals in a proper way...
On Wed, Oct 9, 2019 at 3:05 PM Mihai Moldovan [email protected] wrote:
Yeah, the signal handler is way too complicated. :/
The only safe operation in a signal handler is to call async-signal-safe functions and set variables, preferably only the latter.
malloc is not async-signal-safe so what I've done is not conforming to standards like POSIX/SUS, but it was the easiest way to get it done. I'd have to think of a really good way to reimplement this, but I fear that's quite difficult. We don't have a logging loop or something like that that could react to signals in a proper way...
We have seen a least one deadlock in the wild, so this needs to be fixed somehow. I have not looked through the code thoroughly yet. But can't we simply drop all the logging code within signal handlers?
It happened again after a network interruption:
#0 0x00007f002dac357c in __lll_lock_wait_private () from /lib64/libc.so.6
#1 0x00007f002da3faf5 in _L_lock_17166 () from /lib64/libc.so.6
#2 0x00007f002da3cb33 in malloc () from /lib64/libc.so.6
#3 0x00007f002e2fbecd in operator new(unsigned long) () from /lib64/libstdc++.so.6
#4 0x00007f002e35aa19 in std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&) () from /lib64/libstdc++.so.6
#5 0x00007f002e35c2a1 in char* std::string::_S_construct<char const*>(char const*, char const*, std::allocator<char> const&, std::forward_iterator_tag) () from /lib64/libstdc++.so.6
#6 0x00007f002e35c6d8 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(char const*, std::allocator<char> const&) () from /lib64/libstdc++.so.6
#7 0x00007f0030563fa2 in NXLogStamp::NXLogStamp (this=0x7fff11278670, level=NXINFO, file=<optimized out>, function=0x7f00305be186 <NXTransSignal::__func__> "NXTransSignal", line=1638)
at Log.h:108
#8 0x00007f0030546311 in NXTransSignal (signal=signal@entry=17, action=action@entry=3) at Loop.cpp:1638
#9 0x0000000000482217 in nxagentSigchldHandler (signal=<optimized out>) at Display.c:557
#10 <signal handler called>
#11 0x00007f002da3787a in malloc_consolidate () from /lib64/libc.so.6
#12 0x00007f002da39485 in _int_malloc () from /lib64/libc.so.6
#13 0x00007f002da3cadc in malloc () from /lib64/libc.so.6
#14 0x00007f002d9efeb9 in qsort_r () from /lib64/libc.so.6
#15 0x00000000004b3d38 in nxagentUniquePixels (image=image@entry=0x1de4250) at Pixels.c:208