libwebsockets

lws_context_destroy() closes the same FD twice, causing the process to crash

Open yetyongjin opened this issue 2 years ago • 1 comments

I have a WSS gateway service using 'libwebsockets-4.3-stable'. I recently received a crash report from a production environment. The abort is raised by glibc, which detects an error when another thread calls getifaddrs. The error message is: "Unexpected error 9 on netlink descriptor 19.\n"

#0 0x00007f64a3651aff in raise () from /lib64/libc.so.6
#1 0x00007f64a3624ea5 in abort () from /lib64/libc.so.6
#2 0x00007f64a3694097 in __libc_message () from /lib64/libc.so.6
#3 0x00007f64a369415a in __libc_fatal () from /lib64/libc.so.6
#4 0x00007f64a374fc44 in __netlink_assert_response () from /lib64/libc.so.6
#5 0x00007f64a374c762 in __netlink_request () from /lib64/libc.so.6
#6 0x00007f64a374c901 in getifaddrs_internal () from /lib64/libc.so.6
#7 0x00007f64a374d608 in getifaddrs () from /lib64/libc.so.6
#8 0x00007f64a47ecdd0 in bsd_localinfo (return_result=0x7f649d12a6b8, hints=0x7f649d12a6f0) at su_localinfo.c:1167
#9 su_getlocalinfo (hints=hints@entry=0x7f649d12a7d0, return_localinfo=return_localinfo@entry=0x7f649d12a7c8) at su_localinfo.c:242
#10 0x00007f64a47ca9ea in soa_init_sdp_connection_with_session (ss=ss@entry=0x7f64880603a0, c=0x7f649d12a940, buffer=buffer@entry=0x7f649d12a9a0 "10.10.50.52"

Further analysis shows that this error is triggered by the following scenario. Thread A closes a file descriptor. Thread B calls getifaddrs (as in the backtrace above), which opens a Netlink socket that happens to receive the same descriptor number. Due to a bug, thread A then closes the same file descriptor a second time. Normally a double close would be benign, but because of the concurrent execution it closes the Netlink socket created by glibc. Thread B then attempts to use the Netlink socket descriptor and receives EBADF (errno 9, the "error 9" in the message above).
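The descriptor-reuse hazard in this scenario can be reproduced deterministically in a single thread, because POSIX hands out the lowest free descriptor number on each open. The following is a minimal sketch, not code from libwebsockets or glibc; the function name `demo_stale_double_close` and the use of `/dev/null` as a stand-in for the Netlink socket are assumptions made for illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical demo of a stale double close.
 * Returns the errno seen by the "victim" when it uses its descriptor,
 * 0 if the write unexpectedly succeeds, or -1 if setup fails.
 */
static int demo_stale_double_close(void)
{
    /* Role of "thread A": open a descriptor and close it once (legitimate). */
    int a = open("/dev/null", O_WRONLY);
    if (a < 0)
        return -1;
    close(a);

    /* Role of "thread B": a new open gets the lowest free number,
     * which is the number that was just released. */
    int b = open("/dev/null", O_WRONLY);
    if (b < 0)
        return -1;

    /* The bug: "thread A" closes its old number again.
     * That number now belongs to b, so b is closed behind B's back. */
    close(a);

    /* "Thread B" uses its descriptor and fails. */
    if (write(b, "x", 1) < 0)
        return errno;

    close(b);
    return 0;
}
```

On Linux this is expected to return EBADF (9), the same error number that appears in the glibc message. In the real crash the two roles run on different threads, so the reuse only happens when the timing lines up, which is why the failure is intermittent.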

I then found that the lws_context_destroy() call closes the same FD twice. The following call stacks show FD 19 being closed twice (lines 1856 and 1936 of context.c):

#0 close (fd=19) at co_hook_sys_call.cpp:336
#1 0x00007ff4290bf320 in __lws_close_free_wsi_final (wsi=0x7ff41010e2d0) at /GIT/unimrcp/3rd-libs/libwebsockets-4.3-stable/lib/core-net/close.c:884
#2 0x00007ff4290bf275 in __lws_close_free_wsi (wsi=0x7ff41010e2d0, reason=LWS_CLOSE_STATUS_NOSTATUS_CONTEXT_DESTROY, caller=0x7ff4290ff42f "ctx destroy") at /GIT/unimrcp/3rd-libs/libwebsockets-4.3-stable/lib/core-net/close.c:870
#3 0x00007ff4290bf6fc in lws_close_free_wsi (wsi=0x7ff41010e2d0, reason=LWS_CLOSE_STATUS_NOSTATUS_CONTEXT_DESTROY, caller=0x7ff4290ff42f "ctx destroy") at /GIT/unimrcp/3rd-libs/libwebsockets-4.3-stable/lib/core-net/close.c:1005
#4 0x00007ff4290acfc5 in lws_context_destroy (context=0x7ff410068250) at /GIT/unimrcp/3rd-libs/libwebsockets-4.3-stable/lib/core/context.c:1856

#0 close (fd=19) at co_hook_sys_call.cpp:336
#1 0x00007ff4290a0e3c in lws_plat_pipe_close (wsi=0x7ff421ffa7f0) at /GIT/unimrcp/3rd-libs/libwebsockets-4.3-stable/lib/plat/unix/unix-pipe.c:88
#2 0x00007ff4290acc82 in lws_pt_destroy (pt=0x7ff4100684d0) at /GIT/unimrcp/3rd-libs/libwebsockets-4.3-stable/lib/core/context.c:1689
#3 0x00007ff4290ad1fb in lws_context_destroy (context=0x7ff410068250) at /GIT/unimrcp/3rd-libs/libwebsockets-4.3-stable/lib/core/context.c:1936
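One common way to make two teardown paths like these safe is to route both through a single descriptor slot that is reset to -1 after the first close. The sketch below illustrates that idea only. `close_once` and `teardown_twice_keeps_other_fd` are hypothetical names, none of this is taken from libwebsockets, and it assumes both paths run on the same thread and share one slot (it adds no locking).

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: close the descriptor stored in *fd_slot at most once.
 * After the first call the slot holds -1, so later calls are no-ops.
 */
static int close_once(int *fd_slot)
{
    int rc = 0;

    if (*fd_slot >= 0) {
        rc = close(*fd_slot);
        *fd_slot = -1;  /* mark as closed so a second path cannot close it again */
    }
    return rc;
}

/*
 * Returns 1 if a descriptor opened between two teardown calls is still
 * usable afterwards, 0 otherwise.
 */
static int teardown_twice_keeps_other_fd(void)
{
    int fd = open("/dev/null", O_WRONLY);
    int other, ok;

    if (fd < 0)
        return 0;

    close_once(&fd);                       /* first teardown path */

    other = open("/dev/null", O_WRONLY);   /* reuses the released number */
    if (other < 0)
        return 0;

    close_once(&fd);                       /* second teardown path: no-op */

    ok = (write(other, "x", 1) == 1);      /* the unrelated descriptor survives */
    close(other);
    return ok;
}
```

The design choice is that ownership of the descriptor lives in exactly one place. If two objects each keep their own copy of the number, resetting one copy does not protect the other, so the copies would need to be replaced by a reference to the shared slot.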

yetyongjin avatar Apr 25 '23 01:04 yetyongjin

Has this problem since been fixed?

chanwu1100 avatar Jul 29 '24 02:07 chanwu1100

I pushed a patch on main + v4.3-stable that should help with this.

lws-team avatar Sep 25 '24 07:09 lws-team

There is a memory leak issue with https://github.com/warmcat/libwebsockets/commit/b486c2b545665b3174f7a466b4072b2a60916ed2

vsbc2010 avatar Sep 26 '24 14:09 vsbc2010

How can I reproduce that?

lws-team avatar Sep 26 '24 14:09 lws-team

Write a test program with libwebsockets:
1/ Create 2000 threads with 2000 lws clients
2/ Connect to minimal-ws-server-threads
3/ Keep each connection open for 10 seconds, then disconnect and reconnect
4/ Run for 10 minutes in total; by then too much memory is in use.

This test exposes two types of memory leak.

vsbc2010 avatar Sep 26 '24 17:09 vsbc2010

lws simply isn't threadsafe, so you will have all kinds of problems if you try to do that.

lws-team avatar Sep 26 '24 18:09 lws-team