MINGW-packages
MINGW-packages copied to clipboard
gcc thread_local destructor cause use after free
Hello,
I posted this issue on the gcc bug tracker, but I am not entirely sure if this is supposed to be a GCC bug or a mingw bug.
I see it has also been reported here, however at the moment I can not create a sourceforge account to comment on that. I apologize if this is not the proper place.
The issue here, to my best understanding, is the following.
-
At the very first access of a thread-local object in a new program,
emutls_init
is called via__emutls_get_address
. This function allocates some memory for the storage usingmalloc
(seeemutls_alloc
), and registersemutls_destroy
to be executed at thread exit, using__gthread_key_create
. -
Immediately after, the
TLS init function
(generated by GCC) is called. This calls the C++ constructor on the storage provided at the previous point. It also uses__gthread_key_create
to register the C++ destructor for execution at thread exit. -
When a thread terminates,
_pthread_cleanup_dest
in winpthread is called. This function is responsible for calling all the destructors registered withpthread_key_create
. Unfortunately, the execution order of such destructors is unspecified. In this case, it happens thatemutls_destroy
is called before the C++ destructor, which then executes on deallocated memory.
A quick workaround would be to reverse the loop in winpthread/src/thread.c:842. This will cause the destructors to be called in reverse order, and it will probably work correctly in most of the cases. The only other solution would be to merge emutls+TLS init on GCC side, in such a way to have a single destructor callback to both invoke the C++ destructor and cleanup the memory.
CC @lhmouse
Yes I knew this bug a couple of years ago... but who the heck has been caring about this problem?
The emutls thing is utterly broken. I happened to report a similar issue a few hours ago (if you LoadLibrary()
a DLL and the DLL creates some thread_local
objects, then FreeLibrary()
'ing that DLL is very likely to lead to a crash), not to mention that __emutls_get_address()
aborts the process once it runs out of memory.
So here comes the rule of thumb: Don't use thread_local
on GCC for MinGW targets.
Well.. pretty sad. I might post a patch to winpthread to workaround this specific issue.
My understanding is slightly different:
- On
thread_local
variable creation GCC calls__emutls_get_address
, which calls__gthread_key_create
, which presumably callspthread_key_create
fromwinpthreads
by MinGW. It adds corresponding "destructor" (emutls_destroy
which deallocates memory) into an array of destructors. At the moment, logic is as follows: try inserting at_pthread_key_sch
if available, go towards the end of the array, then wrap, then reallocate if the array is filled. However, it looks_pthread_key_sch
is the smallest available place in the array. - After completing the constructor successfully GCC calls
__cxa_thread_atexit
to register destructor. It's implemented inmingw-w64-crt
(notwinpthreads
), it calls__mingw_cxa_thread_atexit
, which adds destructor to a separate TLS no.tls_dtors_slot
. - When
std::thread
finishes, it returns control topthread_create_wrapper
fromwinpthreads
, it's the routine which actually runs the thread and is passed_beginthreadex
. This routine calls_pthread_cleanup_dest
in the thread before terminating with_endthreadex
. This in turn callsemutls_destroy
and memory is deallocated. - Now that
_endthreadex
is called and the system knows for sure the thread is terminating,tls_callback
frommingw-w64-crt
is called withDLL_THREAD_DETACH
and runs destructors from bullet 2 above.
So, the core issue looks like there are two separate mechanisms for calling something at thread exit: pthread_key_create
(which calls destructors before pthread-emulated thread is terminated) and __cxa_thread_atexit
(which calls destructors when thread is actually terminating). Memory is allocated through the former, destructors are called through the latter, hence the use-after-free.
Moreover, it looks like on Linux (at the very least) pthread_key_create
destructors are called strictly after __cxa_thread_atexit
, and on Windows it happens the other way around. Here is a sample code to test it:
#include <stdio.h>
#include <pthread.h>
#include <cxxabi.h>
int a, b, c, d, e;
extern int __dso_handle;
void mydestr_atexit(void *arg) {
printf("mydestr_atexit(%p)\n", arg);
}
void mydestr_pthread_key(void *arg) {
printf("mydestr_pthread_key(%p)\n", arg);
}
void* thread(void *arg) {
printf("thread %p\n", arg);
abi::__cxa_thread_atexit(mydestr_atexit, &b, &__dso_handle);
{
pthread_key_t key;
pthread_key_create(&key, mydestr_pthread_key);
pthread_setspecific(key, &c);
}
abi::__cxa_thread_atexit(mydestr_atexit, &d, &__dso_handle);
{
pthread_key_t key;
pthread_key_create(&key, mydestr_pthread_key);
pthread_setspecific(key, &e);
}
return nullptr;
}
int main() {
printf("a=%p\nb=%p\nc=%p\nd=%p\ne=%p\n", &a, &b, &c, &d, &e);
pthread_t th;
pthread_create(&th, nullptr, thread, &a);
pthread_join(th, nullptr);
}
Moreover, it looks like destructors by pthread_key
can easily be called in a wrong order: from oldest keys to newest. So it feels kinda wrong to use it for TLS emulation.
Relevant stuff: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58142 https://sourceforge.net/p/mingw-w64/bugs/727/ https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80816
I "discovered" this and tried to make some progress, but there's still a case that results in use-after-free: https://sourceforge.net/p/mingw-w64/mailman/message/37186085/
Will it help if _pthread_cleanup_dest
in winpthreads
calls destructors registered in crt by __mingw_cxa_thread_atexit
(i.e. run_dtor_list
from mingw-w64-crt\crt\tls_atexit.c
)?
I believe something similar was suggested in this message, possibly for another issue: https://sourceforge.net/p/mingw-w64/mailman/message/37140300/
- Make crt expose a function like
__mingw_cxa_thread_finalize
which calls callbacks registered by__mingw_cxa_thread_atexit
and removes them from the list. That way we don't have to modify existing logic nearDLL_THREAD_DETACH
.void __mingw_cxa_thread_finalize(void) { assert(inited); // Not sure why it's needed, but sounds logical. dtor_obj * p = (dtor_obj *)TlsGetValue(tls_dtors_slot); run_dtor_list(&p); TlsSetValue(tls_dtors_slot, p); }
- Make winpthreads call
__mingw_cxa_thread_finalize
in_pthread_cleanup_dest
._pthread_cleanup_dest
is called either bypthread_exit
, or a terminating thread created viapthread_create
, or similarDLL_THREAD_DETACH
event (in which case we care less about order).
Such behavior would be a little more consistent with what happens on Linux: "user-defined" thread_atexit
callbacks run first, "low-level" pthread cleanup runs last. It is assumed by at least some consumers, e.g. emutls, maybe others.
Should help with this specific issue. Not sure how it plays with dynamic linking, configuring, etc. The patch would be quite small, though.
Also, I assume it's impossible for crt to be detached from a thread before winpthreads, as the latter depends(?) on the former.
This particular mingw-w64 "crt" code is present in each module, ie always statically linked. So each module has its own copy, and winpthreads calling it would call its copy only. Unless I'm missing something
Would probably be better to discuss such details on mingw-w64-public list though.
Can we turn the mingw's implementation of _Thread_local to be consistence with msvc? It's seems msvc'c __declspec(thread) doesn't depends on winpthreads at all.
Can we turn the mingw's implementation of _Thread_local to be consistence with msvc?
It's seems msvc'c __declspec(thread) doesn't depends on winpthreads at all.
Please read this series of articles first: http://www.nynaeve.net/?p=183
Can we turn the mingw's implementation of _Thread_local to be consistence with msvc? It's seems msvc'c __declspec(thread) doesn't depends on winpthreads at all.
Please read this series of articles first: http://www.nynaeve.net/?p=183
Yeap, I've read it before, so I post it here
Can we turn the mingw's implementation of _Thread_local to be consistence with msvc? It's seems msvc'c __declspec(thread) doesn't depends on winpthreads at all.
Please read this series of articles first: http://www.nynaeve.net/?p=183
Yeap, I've read it before, so I post it here
Then you should have known that there is no such support in GCC. It is consequently impossible at least for now.
I think it's time that we give the posix thread model up. See https://github.com/msys2/MINGW-packages/discussions/13259.