IXWebSocket
Segmentation fault when using static binaries on unix.
There appears to be a threading issue in IXDNSLookup.cpp when IXWebSocket is linked as a static library.
To recreate the problem, use the main.cpp example code, and change the server URL to 127.0.0.1:8000 or any other address that does not have a running websocket server. The target platform is Ubuntu 20.04.
Using the standard dynamically linked binary, you will see something like:
Connecting to ws://127.0.0.1:8000...
Connection error: Unable to connect to 127.0.0.1 on port 8000, error: Connect error: Connection refused
This message will repeat over and over again as expected.
Now create a statically linked version of the same main.cpp and run it:
Connecting to ws://127.0.0.1:8000...
Connection error: Unable to connect to 127.0.0.1 on port 8000, error: Connect error: Connection refused
Segmentation fault (core dumped)
Upon closer examination, the culprit appears to be DNS related. Given what has been reported in ticket 362, there might be something going on with resource allocation that requires more investigation.
This change seems to work:
struct addrinfo* DNSLookup::resolve(std::string& errMsg,
                                    const CancellationRequest& isCancellationRequested,
                                    bool cancellable)
{
    // THIS does not work
    // return cancellable ? resolveCancellable(errMsg, isCancellationRequested)
    //                    : resolveUnCancellable(errMsg, isCancellationRequested);
    // THIS does work
    return resolveUnCancellable(errMsg, isCancellationRequested);
}
I can't reproduce that on my Mac with clang, or on Ubuntu Linux 21.10 with gcc-11.
I used the ws example. I also just found and fixed a bug in the retry logic.
Can you get a backtrace from the core dump, or run this in gdb?
This bug only shows up when the binary is statically linked. I can probably test on other platforms and see if I get the same results.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
ixwebsocket/11.3.2 linux ssl/mbedtls 2.16.5 zlib 1.2.11
Connecting to ws://127.0.0.1:8000...
[New Thread 0x7ffff7ffa700 (LWP 26793)]
Connection error: Unable to connect to 127.0.0.1 on port 8000, error: Connect error: Connection refused
Thread 2 "ws://127.0.0.1:" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff7ffa700 (LWP 26793)]
0x0000000000000000 in ?? ()
(gdb) where
#0 0x0000000000000000 in ?? ()
#1 0x000000000040e5b1 in ix::WebSocket::checkConnection(bool) ()
#2 0x000000000040f608 in ix::WebSocket::run() ()
#3 0x0000000000537dc4 in execute_native_thread_routine ()
#4 0x000000000056b139 in start_thread (arg=
This is also confirmed on CentOS 7.9.
(gdb) run
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
ixwebsocket/11.4.0 linux nossl
Connecting to ws://echo.websocket.org...
[New Thread 0x7ffff7ffb700 (LWP 3080)]
[New Thread 0x7ffff77f9700 (LWP 3081)]
[Thread 0x7ffff77f9700 (LWP 3081) exited]
Connection error: Expecting status 101 (Switching Protocol), got 200 status connecting to ws://echo.websocket.org, HTTP Status line: HTTP/1.1 200 OK
[New Thread 0x7ffff77f9700 (LWP 3082)]
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff77f9700 (LWP 3082)]
0x00007ffff6de15ed in internal_getent () from /lib64/libnss_files.so.2
Missing separate debuginfos, use: debuginfo-install glibc-2.17-325.el7_9.x86_64
(gdb)
I checked out the latest build from GitHub and made the following changes:
diff --git a/CMakeLists.txt b/CMakeLists.txt
index e7341bc..7e09a50 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -12,7 +12,7 @@ set (CMAKE_CXX_STANDARD 11)
set (CXX_STANDARD_REQUIRED ON)
set (CMAKE_CXX_EXTENSIONS OFF)
-option (BUILD_DEMO OFF)
+set (BUILD_DEMO ON)
if (${CMAKE_SYSTEM_NAME} MATCHES "Linux")
set(CMAKE_POSITION_INDEPENDENT_CODE ON)
@@ -233,6 +233,7 @@ endif()
option(USE_ZLIB "Enable zlib support" TRUE)
+set (USE_ZLIB false)
if (USE_ZLIB)
# This ZLIB_FOUND check is to help find a cmake manually configured zlib
if (NOT ZLIB_FOUND)
@@ -315,5 +316,5 @@ endif()
if (BUILD_DEMO)
add_executable(demo main.cpp)
- target_link_libraries(demo ixwebsocket)
+ target_link_libraries(demo ixwebsocket -static)
endif()
diff --git a/main.cpp b/main.cpp
index 8512537..4278c97 100644
--- a/main.cpp
+++ b/main.cpp
@@ -28,7 +28,8 @@ int main()
// Connect to a server with encryption
// See https://machinezone.github.io/IXWebSocket/usage/#tls-support-and-con
- std::string url("wss://echo.websocket.org");
+ //std::string url("wss://echo.websocket.org");
+ std::string url("ws://echo.websocket.org");
webSocket.setUrl(url);
std::cout << ix::userAgent() << std::endl;
Is the executable a static executable?
What does ldd my_exe say?
The binary is statically linked.
demo: ELF 64-bit LSB executable, x86-64, version 1 (GNU/Linux), statically linked, BuildID[sha1]=5f9873590e43f36045261afb005075298df00dbe, for GNU/Linux 3.2.0, with debug_info, not stripped
Can you try to replicate the problem using the git diff output provided in this thread? This issue can be consistently reproduced on at least Ubuntu and CentOS.
While a workaround has been found, it might significantly impact DNS performance. The current design appears to allow more efficient lookups by cancelling duplicated calls. Is that what the code is doing?
I see. I never tested fully statically linked binaries. I've seen odd threading (pthread) behavior with static binaries, so this isn't too surprising.
The code makes it possible to 'cancel' a hung DNS query; running dns/getaddrinfo on a background thread isn't really for performance reasons. On mobile platforms, blocking the main thread is forbidden, which is why we're doing this.
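To illustrate the idea of a cancellable lookup described above, here is a minimal sketch, not the IXWebSocket implementation. It assumes POSIX getaddrinfo and C++11 threads; the names LookupState and resolveWithCancel are hypothetical. The blocking call runs on a detached thread, and the caller polls a cancellation callback so it can stop waiting on a hung query.

```cpp
#include <netdb.h>
#include <sys/socket.h>
#include <sys/types.h>

#include <chrono>
#include <functional>
#include <memory>
#include <mutex>
#include <string>
#include <thread>

// Hypothetical names; this is not the IXWebSocket implementation.
struct LookupState
{
    std::mutex mutex;
    bool done = false;      // worker finished and published its result
    bool abandoned = false; // caller gave up; worker must free its own result
    addrinfo* result = nullptr;
};

// Resolve host:port on a background thread. Returns nullptr if the lookup
// failed or if isCancellationRequested() returned true while waiting.
// A non-null result is owned by the caller and released with freeaddrinfo().
addrinfo* resolveWithCancel(const std::string& host,
                            const std::string& port,
                            const std::function<bool()>& isCancellationRequested)
{
    // Shared ownership keeps the state alive even if the caller returns first.
    auto state = std::make_shared<LookupState>();

    std::thread([state, host, port]() {
        addrinfo hints {};
        hints.ai_family = AF_UNSPEC;
        hints.ai_socktype = SOCK_STREAM;

        addrinfo* res = nullptr;
        if (getaddrinfo(host.c_str(), port.c_str(), &hints, &res) != 0)
        {
            res = nullptr;
        }

        std::lock_guard<std::mutex> lock(state->mutex);
        if (state->abandoned)
        {
            // Nobody is waiting any more, so the worker cleans up.
            if (res != nullptr) freeaddrinfo(res);
        }
        else
        {
            state->result = res;
            state->done = true;
        }
    }).detach();

    for (;;)
    {
        const bool cancel = isCancellationRequested();
        {
            std::lock_guard<std::mutex> lock(state->mutex);
            if (cancel)
            {
                // Cancellation wins: drop any result that already arrived.
                if (state->done && state->result != nullptr) freeaddrinfo(state->result);
                state->result = nullptr;
                state->abandoned = true;
                return nullptr;
            }
            if (state->done) return state->result;
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
}
```

A caller would pass something like a lambda that checks a stop flag or a deadline. The mutex decides who owns the addrinfo list when cancellation and completion race, so the list is freed exactly once.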
Maybe a CMake option would be the best way to support this, so that you don't have to use a fork of the library. Or we can try to see if there's a simple modification we can make to make this work.
Sometimes the ordering of the libraries is important; not sure that's the case here. Are you building glibc yourself?
glibc is the default one from the OS. If resolveUnCancellable can be used safely on all systems other than Android, then maybe an #ifdef (ANDROID) #else block is the easiest way to handle this condition.
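A compile-time switch along these lines might look like the sketch below. The macro IXWS_DISABLE_CANCELLABLE_DNS is hypothetical (it is not an existing IXWebSocket option), and the two resolve functions are simplified stand-ins that return strings rather than the library's real signatures. A CMake option or a platform check could define the macro for fully static builds.

```cpp
#include <string>

// Hypothetical macro, not an existing IXWebSocket option. A static build
// could pass -DIXWS_DISABLE_CANCELLABLE_DNS=1 on the compiler command line.
#ifndef IXWS_DISABLE_CANCELLABLE_DNS
#define IXWS_DISABLE_CANCELLABLE_DNS 0
#endif

// Stand-in for the background-thread code path (simplified signature).
std::string resolveCancellable(const std::string& host)
{
    return "cancellable:" + host;
}

// Stand-in for the blocking code path (simplified signature).
std::string resolveUnCancellable(const std::string& host)
{
    return "blocking:" + host;
}

std::string resolve(const std::string& host, bool cancellable)
{
#if IXWS_DISABLE_CANCELLABLE_DNS
    // Always use the blocking path when the switch is on.
    (void) cancellable;
    return resolveUnCancellable(host);
#else
    return cancellable ? resolveCancellable(host) : resolveUnCancellable(host);
#endif
}
```

With the macro left at its default of 0, behavior is unchanged, so only builds that opt in lose the ability to cancel a hung query.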
This code has been running fine on Android, so it's really a problem when making static binaries.
On Android, native code is typically compiled as a shared library, which is why this problem is avoided.