MINGW-packages icon indicating copy to clipboard operation
MINGW-packages copied to clipboard

gcc thread_local destructor cause use after free

Open sbabbi opened this issue 7 years ago • 26 comments

Hello,

I posted this issue on the gcc bug tracker, but I am not entirely sure if this is supposed to be a GCC bug or a mingw bug.

I see it has also been reported here, however at the moment I can not create a sourceforge account to comment on that. I apologize if this is not the proper place.


The issue here, to my best understanding, is the following.

  • At the very first access of a thread-local object in a new program, emutls_init is called via __emutls_get_address. This function allocates some memory for the storage using malloc (see emutls_alloc), and registers emutls_destroy to be executed at thread exit, using __gthread_key_create.

  • Immediately after, the TLS init function (generated by GCC) is called. This calls the C++ constructor on the storage provided at the previous point. It also uses __gthread_key_create to register the C++ destructor for execution at thread exit.

  • When a thread terminates, _pthread_cleanup_dest in winpthread is called. This function is responsible for calling all the destructors registered with pthread_key_create. Unfortunately, the execution order of such destructors is unspecified. In this case, it happens that emutls_destroy is called before the C++ destructor, which then executes on deallocated memory.


A quick workaround would be to reverse the loop in winpthread/src/thread.c:842. This will cause the destructors to be called in reverse order, and it will probably work correctly in most of the cases. The only other solution would be to merge emutls+TLS init on GCC side, in such a way to have a single destructor callback to both invoke the C++ destructor and cleanup the memory.

sbabbi avatar May 25 '17 21:05 sbabbi

CC @lhmouse

mati865 avatar May 25 '17 23:05 mati865

Yes I knew this bug a couple of years ago... but who the heck has been caring about this problem?

The emutls thing is utterly broken. I happened to report a similar issue a few hours ago (if you LoadLibrary() a DLL and the DLL creates some thread_local objects, then FreeLibrary()'ing that DLL is very likely to lead to a crash), not to mention that __emutls_get_address() aborts the process once it runs out of memory.

So here comes the rule of thumb: Don't use thread_local on GCC for MinGW targets.

lhmouse avatar May 25 '17 23:05 lhmouse

Well.. pretty sad. I might post a patch to winpthread to workaround this specific issue.

sbabbi avatar May 26 '17 19:05 sbabbi

My understanding is slightly different:

  1. On thread_local variable creation GCC calls __emutls_get_address, which calls __gthread_key_create, which presumably calls pthread_key_create from winpthreads by MinGW. It adds corresponding "destructor" (emutls_destroy which deallocates memory) into an array of destructors. At the moment, logic is as follows: try inserting at _pthread_key_sch if available, go towards the end of the array, then wrap, then reallocate if the array is filled. However, it looks _pthread_key_sch is the smallest available place in the array.
  2. After completing the constructor successfully GCC calls __cxa_thread_atexit to register destructor. It's implemented in mingw-w64-crt (not winpthreads), it calls __mingw_cxa_thread_atexit, which adds destructor to a separate TLS no. tls_dtors_slot.
  3. When std::thread finishes, it returns control to pthread_create_wrapper from winpthreads, it's the routine which actually runs the thread and is passed _beginthreadex. This routine calls _pthread_cleanup_dest in the thread before terminating with _endthreadex. This in turn calls emutls_destroy and memory is deallocated.
  4. Now that _endthreadex is called and the system knows for sure the thread is terminating, tls_callback from mingw-w64-crt is called with DLL_THREAD_DETACH and runs destructors from bullet 2 above.

So, the core issue looks like there are two separate mechanisms for calling something at thread exit: pthread_key_create (which calls destructors before pthread-emulated thread is terminated) and __cxa_thread_atexit (which calls destructors when thread is actually terminating). Memory is allocated through the former, destructors are called through the latter, hence the use-after-free.

yeputons avatar Apr 28 '21 12:04 yeputons

Moreover, it looks like on Linux (at the very least) pthread_key_create destructors are called strictly after __cxa_thread_atexit, and on Windows it happens the other way around. Here is a sample code to test it:

#include <stdio.h>
#include <pthread.h>
#include <cxxabi.h>

int a, b, c, d, e;
extern int __dso_handle;

void mydestr_atexit(void *arg) {
    printf("mydestr_atexit(%p)\n", arg);
}

void mydestr_pthread_key(void *arg) {
    printf("mydestr_pthread_key(%p)\n", arg);
}

void* thread(void *arg) {
    printf("thread %p\n", arg);
    abi::__cxa_thread_atexit(mydestr_atexit, &b, &__dso_handle);

    {
        pthread_key_t key;
        pthread_key_create(&key, mydestr_pthread_key);
        pthread_setspecific(key, &c);
    }

    abi::__cxa_thread_atexit(mydestr_atexit, &d, &__dso_handle);

    {
        pthread_key_t key;
        pthread_key_create(&key, mydestr_pthread_key);
        pthread_setspecific(key, &e);
    }
    return nullptr;
}

int main() {
    printf("a=%p\nb=%p\nc=%p\nd=%p\ne=%p\n", &a, &b, &c, &d, &e);
    pthread_t th;
    pthread_create(&th, nullptr, thread, &a);
    pthread_join(th, nullptr);
}

Moreover, it looks like destructors by pthread_key can easily be called in a wrong order: from oldest keys to newest. So it feels kinda wrong to use it for TLS emulation.

yeputons avatar Apr 28 '21 12:04 yeputons

Relevant stuff: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58142 https://sourceforge.net/p/mingw-w64/bugs/727/ https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80816

yeputons avatar Apr 28 '21 12:04 yeputons

I "discovered" this and tried to make some progress, but there's still a case that results in use-after-free: https://sourceforge.net/p/mingw-w64/mailman/message/37186085/

jeremyd2019 avatar Apr 28 '21 17:04 jeremyd2019

Will it help if _pthread_cleanup_dest in winpthreads calls destructors registered in crt by __mingw_cxa_thread_atexit (i.e. run_dtor_list from mingw-w64-crt\crt\tls_atexit.c)?

I believe something similar was suggested in this message, possibly for another issue: https://sourceforge.net/p/mingw-w64/mailman/message/37140300/

  1. Make crt expose a function like __mingw_cxa_thread_finalize which calls callbacks registered by __mingw_cxa_thread_atexit and removes them from the list. That way we don't have to modify existing logic near DLL_THREAD_DETACH.
    void __mingw_cxa_thread_finalize(void) {
        assert(inited);  // Not sure why it's needed, but sounds logical.
        dtor_obj * p = (dtor_obj *)TlsGetValue(tls_dtors_slot);
        run_dtor_list(&p);
        TlsSetValue(tls_dtors_slot, p);
    }
    
  2. Make winpthreads call __mingw_cxa_thread_finalize in _pthread_cleanup_dest. _pthread_cleanup_dest is called either by pthread_exit, or a terminating thread created via pthread_create, or similar DLL_THREAD_DETACH event (in which case we care less about order).

Such behavior would be a little more consistent with what happens on Linux: "user-defined" thread_atexit callbacks run first, "low-level" pthread cleanup runs last. It is assumed by at least some consumers, e.g. emutls, maybe others.

Should help with this specific issue. Not sure how it plays with dynamic linking, configuring, etc. The patch would be quite small, though.

Also, I assume it's impossible for crt to be detached from a thread before winpthreads, as the latter depends(?) on the former.

yeputons avatar Apr 28 '21 18:04 yeputons

This particular mingw-w64 "crt" code is present in each module, ie always statically linked. So each module has its own copy, and winpthreads calling it would call its copy only. Unless I'm missing something

jeremyd2019 avatar Apr 28 '21 18:04 jeremyd2019

Would probably be better to discuss such details on mingw-w64-public list though.

jeremyd2019 avatar Apr 28 '21 18:04 jeremyd2019

Can we turn the mingw's implementation of _Thread_local to be consistence with msvc? It's seems msvc'c __declspec(thread) doesn't depends on winpthreads at all.

lygstate avatar Sep 15 '22 16:09 lygstate

Can we turn the mingw's implementation of _Thread_local to be consistence with msvc?

It's seems msvc'c __declspec(thread) doesn't depends on winpthreads at all.

Please read this series of articles first: http://www.nynaeve.net/?p=183

lhmouse avatar Sep 15 '22 16:09 lhmouse

Can we turn the mingw's implementation of _Thread_local to be consistence with msvc? It's seems msvc'c __declspec(thread) doesn't depends on winpthreads at all.

Please read this series of articles first: http://www.nynaeve.net/?p=183

Yeap, I've read it before, so I post it here

lygstate avatar Sep 15 '22 16:09 lygstate

Can we turn the mingw's implementation of _Thread_local to be consistence with msvc? It's seems msvc'c __declspec(thread) doesn't depends on winpthreads at all.

Please read this series of articles first: http://www.nynaeve.net/?p=183

Yeap, I've read it before, so I post it here

Then you should have known that there is no such support in GCC. It is consequently impossible at least for now.

lhmouse avatar Sep 16 '22 06:09 lhmouse

I think it's time that we give the posix thread model up. See https://github.com/msys2/MINGW-packages/discussions/13259.

lhmouse avatar Sep 26 '22 09:09 lhmouse