nx-libs icon indicating copy to clipboard operation
nx-libs copied to clipboard

malloc in signal handler causes deadlock

Open uli42 opened this issue 5 years ago • 3 comments

The 'new' logging code in nxcomp implicitly calls malloc() which causes the nxagent to hang forever. See this backtrace:

#0  0x00007f6388d9fd1c in __lll_lock_wait_private () from /lib64/libc.so.6
#1  0x00007f6388d1c29d in _L_lock_15512 () from /lib64/libc.so.6
#2  0x00007f6388d19383 in malloc () from /lib64/libc.so.6
#3  0x00007f63895d7ecd in operator new(unsigned long) () from /lib64/libstdc++.so.6
#4  0x00007f6389636a19 in std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&) () from /lib64/libstdc++.so.6
#5  0x00007f63896382a1 in char* std::string::_S_construct<char const*>(char const*, char const*, std::allocator<char> const&, std::forward_iterator_tag) () from /lib64/libstdc++.so.6
#6  0x00007f63896386d8 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(char const*, std::allocator<char> const&) () from /lib64/libstdc++.so.6
#7  0x00007f638b83ffd2 in NXLogStamp::NXLogStamp (this=0x7ffc6429a930, level=NXINFO, file=<optimized out>, function=0x7f638b89a0a6 <NXTransSignal::__func__> "NXTransSignal", line=1634) at Log.h:108
#8  0x00007f638b822891 in NXTransSignal (signal=signal@entry=17, action=action@entry=3) at Loop.cpp:1634
#9  0x000000000048b6d7 in nxagentSigchldHandler (signal=<optimized out>) at Display.c:561
#10 <signal handler called>
#11 0x00007f6388d169e3 in _int_malloc () from /lib64/libc.so.6
#12 0x00007f6388d1932c in malloc () from /lib64/libc.so.6
#13 0x000000000045fb0e in DeletePassiveGrabFromList (pMinuendGrab=pMinuendGrab@entry=0x7ffc6429b120) at grabs.c:325
#14 0x000000000041be9c in ProcUngrabKey (client=0x1a23af0) at ../../dix/events.c:4068
#15 0x0000000000431aee in Dispatch () at NXdispatch.c:476
#16 0x000000000040f29a in main (argc=18, argv=0x7ffc6429b388, envp=<optimized out>) at main.c:353

Malloc() is not allowed inside signal handlers, see https://stackoverflow.com/questions/40049751/malloc-inside-linux-signal-handler-cause-deadlock

So either we remove the logging from the signal handler code or find some way to prevent malloc()s from happening.

uli42 avatar Oct 09 '19 08:10 uli42

Yeah, the signal handler is way too complicated. :/

The only safe operations in a signal handler are to call async-signal-safe functions and set variables, preferably only the latter.

malloc is not async-signal-safe so what I've done is not conforming to standards like POSIX/SUS, but it was the easiest way to get it done. I'd have to think of a really good way to reimplement this, but I fear that's quite difficult. We don't have a logging loop or something like that that could react to signals in a proper way...

Ionic avatar Oct 09 '19 13:10 Ionic

On Wed, Oct 9, 2019 at 3:05 PM Mihai Moldovan [email protected] wrote:

Yeah, the signal handler is way too complicated. :/

The only safe operation in a signal handler is to call async-signal-safe functions and set variables, preferably only the latter.

malloc is not async-signal-safe so what I've done is not conforming to standards like POSIX/SUS, but it was the easiest way to get it done. I'd have to think of a really good way to reimplement this, but I fear that's quite difficult. We don't have a logging loop or something like that that could react to signals in a proper way...

We have seen a least one deadlock in the wild, so this needs to be fixed somehow. I have not looked through the code thoroughly yet. But can't we simply drop all the logging code within signal handlers?

uli42 avatar Oct 09 '19 13:10 uli42

It happened again after a network interruption:

#0  0x00007f002dac357c in __lll_lock_wait_private () from /lib64/libc.so.6
#1  0x00007f002da3faf5 in _L_lock_17166 () from /lib64/libc.so.6
#2  0x00007f002da3cb33 in malloc () from /lib64/libc.so.6
#3  0x00007f002e2fbecd in operator new(unsigned long) () from /lib64/libstdc++.so.6
#4  0x00007f002e35aa19 in std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&) () from /lib64/libstdc++.so.6
#5  0x00007f002e35c2a1 in char* std::string::_S_construct<char const*>(char const*, char const*, std::allocator<char> const&, std::forward_iterator_tag) () from /lib64/libstdc++.so.6
#6  0x00007f002e35c6d8 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(char const*, std::allocator<char> const&) () from /lib64/libstdc++.so.6
#7  0x00007f0030563fa2 in NXLogStamp::NXLogStamp (this=0x7fff11278670, level=NXINFO, file=<optimized out>, function=0x7f00305be186 <NXTransSignal::__func__> "NXTransSignal", line=1638)
    at Log.h:108
#8  0x00007f0030546311 in NXTransSignal (signal=signal@entry=17, action=action@entry=3) at Loop.cpp:1638
#9  0x0000000000482217 in nxagentSigchldHandler (signal=<optimized out>) at Display.c:557
#10 <signal handler called>
#11 0x00007f002da3787a in malloc_consolidate () from /lib64/libc.so.6
#12 0x00007f002da39485 in _int_malloc () from /lib64/libc.so.6
#13 0x00007f002da3cadc in malloc () from /lib64/libc.so.6
#14 0x00007f002d9efeb9 in qsort_r () from /lib64/libc.so.6
#15 0x00000000004b3d38 in nxagentUniquePixels (image=image@entry=0x1de4250) at Pixels.c:208

uli42 avatar Apr 28 '20 09:04 uli42