oneTBB
libtbb.so cannot be used with dlmopen()
Hi TBB team,
libtbb.so appears to be incompatible with dlmopen(). It's admittedly a niche use case, but a library or plug-in may want to link against TBB in isolation from the rest of a larger program.
There appear to be a few factors at play:
(1) As part of its static initialization, libtbb.so attempts to dlopen() various other libraries (iomp5, tcm). At least on Linux, glibc's dynamic loader reports errors with setjmp()/longjmp() (the _dl_catch_error frames in the trace below), and a failing inner dlopen() returns control to the outer dlmopen() call. Thus if the inner dlopen("libtcm.so") or dlopen("libiomp5.so") fails, control is transferred directly to the outer dlmopen() instead of dlopen() simply returning NULL as expected. Curiously, this does not happen with nested dlopen() calls, so it appears to be a bug in dlmopen(); it is tracked at https://sourceware.org/bugzilla/show_bug.cgi?id=31164
Below is the stack trace immediately before a call to dlopen("libtcm.so"). A single step shows control going directly back to dlmopen() instead of unwinding the stack as expected.
Before:
#0 0x00007ffff6b00050 in dlopen@@GLIBC_2.2.5 () from /lib64/libdl.so.2
#1 0x00007ffff6d16a78 in global_symbols_link (required=11, descriptors=0x7ffff6f36440 <tbb::detail::r1::(anonymous namespace)::tcm_link_table>, library=0x7ffff6d2c425 "libtcm.so.1") at oneTBB/src/tbb/dynamic_link.cpp:391
#2 _ZN3tbb6detail2r112dynamic_linkEPKcPKNS1_23dynamic_link_descriptorEmPPvi (library=library@entry=0x7ffff6d2c425 "libtcm.so.1", descriptors=descriptors@entry=0x7ffff6f36440 <tbb::detail::r1::(anonymous namespace)::tcm_link_table>,
required=required@entry=11, handle=handle@entry=0x0, flags=flags@entry=7) at oneTBB/src/tbb/dynamic_link.cpp:467
#3 0x00007ffff6d210d3 in tbb::detail::r1::tcm_adaptor::initialize () at oneTBB/src/tbb/tcm_adaptor.cpp:241
#4 tbb::detail::r1::__TBB_InitOnce::add_ref () at oneTBB/src/tbb/main.cpp:97
#5 0x00007ffff6d0fe57 in _GLOBAL__sub_I_main.cpp.lto_priv.98 () at oneTBB/src/tbb/main.h:73
#6 0x00007ffff6d0fe8e in global constructors keyed to 65535_0_address_waiter.cpp.o.30392 () from oneTBB/install/lib64/libtbb.so
#7 0x00007ffff7de7ef2 in call_init.part () from /lib64/ld-linux-x86-64.so.2
#8 0x00007ffff7de7fe6 in _dl_init () from /lib64/ld-linux-x86-64.so.2
#9 0x00007ffff7dec16d in dl_open_worker () from /lib64/ld-linux-x86-64.so.2
#10 0x00007ffff7078374 in _dl_catch_error () from /lib64/libc.so.6
#11 0x00007ffff7deb9a9 in _dl_open () from /lib64/ld-linux-x86-64.so.2
#12 0x00007ffff7bd6960 in dlmopen_doit () from /lib64/libdl.so.2
#13 0x00007ffff7078374 in _dl_catch_error () from /lib64/libc.so.6
#14 0x00007ffff7bd6675 in _dlerror_run () from /lib64/libdl.so.2
#15 0x00007ffff7bd6a36 in dlmopen () from /lib64/libdl.so.2
#16 0x000000000040086c in main (argc=1, argv=0x7fffffffe3b8) at wtf.cpp:7
After a single step:
#0 0x00007ffff7078361 in _dl_catch_error () from /lib64/libc.so.6
#1 0x00007ffff7deb9a9 in _dl_open () from /lib64/ld-linux-x86-64.so.2
#2 0x00007ffff7bd6960 in dlmopen_doit () from /lib64/libdl.so.2
#3 0x00007ffff7078374 in _dl_catch_error () from /lib64/libc.so.6
#4 0x00007ffff7bd6675 in _dlerror_run () from /lib64/libdl.so.2
#5 0x00007ffff7bd6a36 in dlmopen () from /lib64/libdl.so.2
#6 0x000000000040086c in main (argc=1, argv=0x7fffffffe3b8) at wtf.cpp:7
(2) If libiomp5.so is present, libtbb opens it with RTLD_GLOBAL:
https://github.com/oneapi-src/oneTBB/blob/3b9f9baef9c27e4d22ebeff59118aaddaa40e9f2/src/tbb/misc_ex.cpp#L165
This triggers the segfault (SIGSEGV) described in the BUGS section of the dlmopen(3) man page:
https://manpages.debian.org/testing/manpages-dev/dlmopen.3.en.html#BUGS
As at glibc 2.24, specifying the RTLD_GLOBAL flag when calling dlmopen() generates an error. Furthermore, specifying RTLD_GLOBAL when calling dlopen() results in a program crash (SIGSEGV) if the call is made from any object loaded in a namespace other than the initial namespace.
(3) (Into the weeds here) If you patch libdl to work around the above bugs so that libtbb.so can be successfully dlmopen()'d AND your initial program used pthread_key_create(), it still ultimately segfaults. The issue appears to originate in the following code in governor.h:
thread_data* td = theTLS.get();
if (td) {
    return td;
}
init_external_thread();
https://github.com/oneapi-src/oneTBB/blob/3b9f9baef9c27e4d22ebeff59118aaddaa40e9f2/src/tbb/governor.h#L100C8-L100C8
On the first call, theTLS.get() is expected to return null, so that init_external_thread() runs and initializes the thread's state. However, due to an erroneous interaction between POSIX thread-local storage and dlmopen(), the call instead returns a garbage pointer to a thread-local value from the outer link-map namespace. As a result, TBB's arena is never initialized and the program segfaults here:
https://github.com/oneapi-src/oneTBB/blob/3b9f9baef9c27e4d22ebeff59118aaddaa40e9f2/src/tbb/task_group_context.cpp#L182
This bug is documented here:
https://sourceware.org/bugzilla/show_bug.cgi?id=24776
All of the factors above are ultimately due to deficiencies in other libraries. Nevertheless, there are a few things TBB can do to sidestep the bugs:
- Avoid dlopen() during static initialization, or at least provide an opt-out mechanism. This is arguably good hygiene for a library anyway; it was quite surprising that simply using a tbb::concurrent_unordered_set triggered multiple dlopen() calls and initialized a thread pool.
- Use newer thread-local constructs (e.g. __thread) instead of the pthread ones where available.
Repro:
#include <iostream>
#include <dlfcn.h>
#include <sysexits.h>

int main(int argc, char * argv[]){
    // Load libtbb.so into a fresh link-map namespace.
    void * handle = dlmopen(LM_ID_NEWLM, "libtbb.so", RTLD_LAZY);
    if(!handle){
        std::cerr << dlerror() << std::endl;
        return EX_SOFTWARE;
    }
    return EX_OK;
}
Thank you for submitting this and providing the reproducer and the details.
@aws-taylor is this issue still relevant?
Hi @arunparkugan,
Low priority, but yes, I believe this issue still impacts tbb.