Thread local storage and QNX
I am working on porting mimalloc to the QNX RTOS (7.x). After some trivial changes, I have encountered an issue related to thread local storage (TLS) on QNX that I believe is described here: https://github.com/microsoft/mimalloc/blob/master/include/mimalloc-internal.h#L280 Basically, the TLS layer calls malloc, causing my program to fall into infinite recursion.
mimalloc has workarounds for MacOSX and OpenBSD. I was wondering if you could tell me more about what is being done here, so that I might develop my own workaround for QNX.
Reading the OpenBSD workaround, it seems that mimalloc is re-purposing (hijacking) an address within the pthread struct to store a pointer to the thread-local heap. Am I reading this right? Do you do this because pthread_setspecific()/pthread_getspecific() would be too slow, or are there other reasons?
Also, can you please tell me what is insufficient with the current DragonFly "workaround?"
Here is some basic information on TLS on QNX. However, in light of the OpenBSD workaround, I doubt that this will be enough information to go on.
Interesting and thanks for porting mimalloc to QNX :-) Most special cases are often to improve performance so I would first try to find a way to make it work and then consider how to improve performance. In particular:
- Try just to #define MI_TLS_RECURSE_GUARD at the start of mimalloc-internal.h
- Try to also #define MI_TLS_PTHREAD (if QNX supports pthread?)
Otherwise, we may have to do something more specific. I can talk/explain more tomorrow but I hope one of these things may help?
I made the suggested changes. I think we're moving forward, but I've run into a new issue. My program crashes on the _mi_thread_id() -> mi_tls_slot() path. Does _mi_thread_id() simply need to return a unique identifier?
On QNX, every thread has a _thread_local_storage struct at the top of the thread's stack. We can get a pointer to it with the function __tls(void). I tried using (uintptr_t)__tls() for _mi_thread_id(), but the program now crashes for other reasons that are not clear. I haven't been able to get a stack trace from the resulting core dump.
The TLS array _thread_local_storage::__keydata is also directly indexable, but it appears to be null during mimalloc init. Perhaps memory is only allocated when the first pthread key is set.
Ah, that should not happen; my suggestions were for either a regular thread local variable, or pthreads, but not the _SLOT variants.
Is QNX defining __mach__ or something?
Can you try with a regular thread local (e.g. where extern mi_decl_thread mi_heap_t* _mi_heap_default; is declared), together with MI_TLS_RECURSE_GUARD defined?
Just to expand a bit on the mechanics: when overriding malloc it becomes available even at load time before even the C runtime is initialized. So, mimalloc is very careful to not use any C runtime primitives while it is still being loaded.
On most systems (Linux, GNU) the loader pre-reserves space for some thread locals and things are fine, but on macOSX (and other BSDs) it turns out that the loader will malloc even thread-local variables. So, the first read of _mi_heap_default actually invokes a "read thread local" function in the loader, which calls malloc to allocate thread local space, which reads _mi_heap_default again, and so on.
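The idea behind a guard such as MI_TLS_RECURSE_GUARD can be sketched roughly as follows: until the process is known to be initialized, the allocator avoids reading the thread local at all and falls back to a statically allocated heap. This is a minimal illustration, not mimalloc's actual code; heap_t, main_heap, process_initialized and get_default_heap are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the allocator's heap structure. */
typedef struct heap_s { int id; } heap_t;

/* Statically allocated, so it can be used without touching TLS or malloc. */
static heap_t main_heap = { 1 };

/* Set to true once loading has finished and thread locals are safe to read. */
static bool process_initialized = false;

/* The regular thread local. On some loaders the first access may call malloc. */
static _Thread_local heap_t* heap_default = NULL;

static heap_t* get_default_heap(void) {
  /* Guard: while loading, never touch the thread local. */
  if (!process_initialized) return &main_heap;
  return (heap_default != NULL) ? heap_default : &main_heap;
}
```

The guard only helps if every early path goes through it, and it costs one extra branch on each lookup.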
On macOSX we work around this by not using a __thread local declaration but instead using one of the unused thread local "slots" (in our case 89, which used to belong to the "old" GC). On OpenBSD we find another unused location as an offset from the TLB, etc. Sometimes, we can work around it by using pthread_getspecific as well.
Now, a more robust way would be for mimalloc to pre-reserve a bit of memory (in the BSS segment) at load time, so it can serve malloc requests in a special way early on when thread local memory needs to be allocated. That would certainly fix the problem on BSD-like systems. If needed I can do this soonish and we can try it out on QNX as well, but let's first see if a simpler solution works.
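A pre-reserved early arena of this kind could look roughly like the sketch below: a bump allocator over a zero-initialized static array. It is not part of mimalloc. The names early_arena, early_alloc and early_owns and the sizes are assumptions for illustration, and the code assumes it runs only while the process is still single-threaded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sizes, chosen only for this sketch. */
#define EARLY_ARENA_SIZE 4096u
#define EARLY_ALIGN      16u

/* Zero-initialized static storage lives in BSS, so it is usable before
   the C runtime or any thread-local machinery has been set up. */
static _Alignas(16) unsigned char early_arena[EARLY_ARENA_SIZE];
static size_t early_used = 0;

/* Bump-allocate from the static arena; returns NULL when it is exhausted.
   Not thread-safe: intended only for the single-threaded load phase. */
static void* early_alloc(size_t size) {
  if (size == 0) size = 1;                   /* give every request its own block */
  if (size > EARLY_ARENA_SIZE) return NULL;  /* also keeps the rounding below from overflowing */
  size_t rounded = (size + (EARLY_ALIGN - 1)) & ~(size_t)(EARLY_ALIGN - 1);
  if (rounded > EARLY_ARENA_SIZE - early_used) return NULL;
  void* p = &early_arena[early_used];
  early_used += rounded;
  return p;
}

/* A free() override has to recognize early blocks and leave them alone. */
static int early_owns(const void* p) {
  uintptr_t a  = (uintptr_t)p;
  uintptr_t lo = (uintptr_t)early_arena;
  return a >= lo && a < lo + EARLY_ARENA_SIZE;
}
```

Blocks handed out this way are never reclaimed, which is usually acceptable because only a handful of small allocations happen during loading.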
Thank you for the advice. I'll continue to poke around to see what I can do. For reference, my understanding is that QNX's user space has a BSD lineage. What holds for BSD has a good chance of holding for QNX too. QNX is as POSIX compliant as most any other platform claiming to be. I find that the definition of compliance is up to interpretation. I am relatively new to the OS, so I am learning about how it implements TLS as I go along.
Coming back to _mi_thread_id() for a moment. Can you confirm that _mi_thread_id() merely needs to return a unique value? I did not see any code where the uintptr_t is cast to a pointer to a complex data structure. If this is the case, why not simply use an incrementing atomic variable to assign IDs? (Of course, this would probably be a less useful ID when debugging.)
Did you make any progress? Is there perhaps an easy way for me to access a QNX system somewhere to debug on? I think these kinds of recursive threading issues are very difficult to get right, and I may need to look at it myself to understand what is happening.
With regard to _mi_thread_id(), I think it indeed just has to be unique, as it is only used as a quick check to see whether a free is local or not. So, a global atomic counter may be used to generate the IDs, but each one still needs to be assigned to a specific thread, so it has to be stored somewhere in thread local storage. Mimalloc has three strategies for thread local storage: 1) use a regular C __thread declaration. But this does not always work due to recursive calls while loading (where the loader tries to allocate thread local storage for our thread local heap pointer). So, we also support 2) pthread local storage, and 3) on BSD systems (and macOS) we use a "slot", a piece of memory at a constant offset from the thread local block (TLB). The "slot" approach is fragile and dependent on the specific OS and architecture, so I am still looking for a more robust approach. Some allocators reserve a fixed amount of memory at startup (as a static array) to serve allocations while loading; maybe that is what we need to do for QNX (and other BSDs?) as well.
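Combining a global atomic counter with strategy 2 (pthread local storage) might look like the sketch below. It is an illustration under assumptions, not how mimalloc implements _mi_thread_id(); my_thread_id, next_id, id_key and record_id are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

/* Global counter; 0 is reserved to mean "no ID assigned yet". */
static atomic_uintptr_t next_id = 1;

static pthread_key_t  id_key;
static pthread_once_t id_once = PTHREAD_ONCE_INIT;

static void make_key(void) {
  int rc = pthread_key_create(&id_key, NULL);
  assert(rc == 0);
  (void)rc;
}

/* Returns a value that is unique per thread and stable for its lifetime. */
static uintptr_t my_thread_id(void) {
  pthread_once(&id_once, make_key);
  uintptr_t id = (uintptr_t)pthread_getspecific(id_key);
  if (id == 0) {
    id = atomic_fetch_add(&next_id, 1);
    pthread_setspecific(id_key, (void*)id);  /* the ID itself is stored as the key value */
  }
  return id;
}

/* Thread entry point that stores the calling thread's ID into *out. */
static void* record_id(void* out) {
  *(uintptr_t*)out = my_thread_id();
  return NULL;
}
```

Inside an allocator this is only safe if pthread_setspecific() and pthread_key_create() do not call malloc themselves on the target platform, which would need to be checked on QNX.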
I've had a chance to pick this work back up. It seems that _mi_heap_set_default_direct() can recurse into mi_thread_init() through pthread_setspecific(). This would be an unexpected behavior, right?
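To make the suspected cycle concrete, here is a small simulation of an initialization routine that is re-entered through a key-setting call that allocates, together with a re-entrancy flag that breaks the loop. Everything here is hypothetical: os_setspecific stands in for a pthread_setspecific() that allocates on first use, and my_malloc and thread_init are not mimalloc functions. The flag is process-wide for simplicity; a real per-thread flag would itself need storage that does not allocate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static int  init_calls     = 0;      /* how often thread_init() did real work */
static bool in_thread_init = false;  /* true while initialization is running */
static bool thread_ready   = false;  /* true once initialization has finished */
static unsigned char fallback[64];   /* stand-in for a safe early allocation path */

static void* my_malloc(size_t size);

/* Hypothetical stand-in for a key-setting call that allocates on first use. */
static void os_setspecific(void) {
  (void)my_malloc(16);
}

static void thread_init(void) {
  /* Guard: a nested call returns immediately instead of recursing forever. */
  if (thread_ready || in_thread_init) return;
  in_thread_init = true;
  init_calls++;
  os_setspecific();                  /* re-enters my_malloc() and thread_init() */
  thread_ready = true;
  in_thread_init = false;
}

static void* my_malloc(size_t size) {
  thread_init();
  if (size > sizeof fallback) return NULL;
  return fallback;                   /* simplified: always hands out the same buffer */
}
```

Without the in_thread_init check, the first my_malloc() call would recurse until the stack overflows.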
Hi, I am also trying to get mimalloc to run under QNX. It seems that MI_TLS_RECURSE_GUARD does the trick for debug builds but not for release builds. @daanx there is the option to download the trial version of QNX, and you can use Momentics, the IDE, to create a VM.
Small things to mention to get mimalloc compiled:
- QNX defines cfree with an int return value (instead of void), so I had to change that (see the QNX documentation).
- Also, you need to remove the -lpthread linker flag. QNX has pthread integrated and doesn't link to it.
Did you make any progress?
Did you succeed in porting mimalloc to QNX? If so, which version are you using? I managed to compile it for QNX, but when I run it I get a core dump, which I think is related to TLS.
Process 71921762 (cdc_performance_test) terminated SIGSEGV code=1 fltno=11 ip=00000000780167b8(/root/libmimalloc.so@mi_thread_init+0x0000000000000268) mapaddr=00000000000167b8. ref=0000000010047fd0
Has anyone solved the TLS infinite recursion problem yet?