Loading debug symbols is not thread-safe on MSVC
On Windows, when debug symbols are loaded concurrently from multiple threads, such as when raising an exception or calling caller, most of those threads will fail. A snippet like:
4.times do
Thread.new do
puts "#{caller.inspect}\n"
end
end
puts "#{caller.inspect}\n"
sleep 0.1
could produce results like:
["???", "???", "???"]
["???", "???", "???"]
["???", "???", "???"]
["???", "???", "???"]
["usr\\test.cr:27 in '__crystal_main'", "src\\crystal\\main.cr:118 in 'main_user_code'", "src\\crystal\\main.cr:104 in 'main'", "src\\crystal\\main.cr:130 in 'main'", "src\\crystal\\system\\win32\\wmain.cr:37 in 'wmain'", "D:\\a\\_work\\1\\s\\src\\vctools\\crt\\vcstartup\\src\\startup\\exe_common.inl:288 in '__scrt_common_main_seh'", "C:\\WINDOWS\\System32\\KERNEL32.DLL +75133 in 'BaseThreadInitThunk'", "C:\\WINDOWS\\SYSTEM32\\ntdll.dll +372520 in 'RtlUserThreadStart'"]
["???", "???", "???"]
Unable to load debug information["???", "???", "???"]
Invalid memory access (C0000005) at address : SymInitializeW: The parameter is incorrect. (RuntimeError)
0x["???", "???", "???"]
1f565411000
from ???
from ???[0x7ff73a44789c]
from ???
from ???
???
from ???
from [0x1fffffff0000000d] ???
[0x7ff73a446163] ???
???
[["???"]
0xd00000000] ???
[0x52933d3fd4da2e04] ???
[0xe9fe19dd54ddbdb6] ???
[0x7ff73a4475e6] ???
[0x1f5653d0d00] ???
[0x6258e96d0000000d] ???
[0x1a36ad96b7e88ea] ???
[0x1f5653dcd80] ???
[0xab200000001] ???
[0x100000001] ???
[0x1] ???
[0x1f5653dcd80] ???
[0x97980500979805] ???
[0x1f5653d0d00] ???
This is also reproducible with fibers under -Dpreview_mt:
4.times do
spawn do
puts "#{caller.inspect}\n"
end
end
puts "#{caller.inspect}\n"
sleep 0.1
All related Win32 functions are single-threaded; replacing Exception::CallStack.@@sym_loaded with an Atomic::Flag is not enough, because those alternate threads still end up calling LibC.StackWalk64 to unwind their stacks before LibC.SymInitializeW returns. Therefore, all threads must await the completion of Exception::CallStack.load_debug_info_impl on whatever thread it is called.
To achieve this, #14905 or something like InitOnceExecuteOnce should be used instead.
What about a mutex in the meantime?
Loading symbols is also not thread-safe even after they have been initialized:
caller
Array.new(4) do
Thread.new do
100.times { caller }
end
end.each(&.join)
# `-Dpreview_mt`
require "wait_group"
caller
WaitGroup.wait do |wg|
4.times do
wg.spawn do
100.times { caller }
end
end
end
Eventually Exception::CallStack.decode_line_number will return bogus strings that include null characters at the end. So much of Exception::CallStack#decode_backtrace probably needs to be in a mutex as well.
On MinGW-w64 the snippets may still produce ???, but never break with an access violation, probably due to the DWARF parser being written in Crystal
I added a workflow to test sync on Windows, and the error indeed happens as soon as I specify -Dpreview_mt:
Unable to load debug information: SymInitializeW: The parameter is incorrect. (RuntimeError)
See for example https://github.com/ysbaddaden/sync/actions/runs/18980770977/job/54212434606
I'll have a look.
There are multiple potential solutions:
-
The immediate fix is indeed to put a mutex around init and decode. It creates a contention points when we print backtraces. That should usually be fine, but it might become an issue when somethings starts failing, and the application prints lots of exceptions?
-
Reading the DbgHelp documentation, I see the example dups the
GetCurrentProcess()handle, maybe we would dup a "debug handle" per thread, with a finalizer to cleanup symbols, then each thread would use their local debug handle for searching symbols? That might take some memory if it must load and duplicate the symbol tables for each thread. -
~~Reading this snippet in Rust, it initializes symbols once then loads all the symbol table into Rust memory. I didn't dig further, but I guess that once we have the symbols, we can search ourselves, and do so safely without any synchronization anymore?~~ nope, it loads the list of individual modules, then uses the DbgHelp API to resolve symbols, and they use a mutex to decode the backtrace.
That bit is interesting, though:
Note however that std specifically bypasses this lock, and uses the
*_unsynchronizedbacktrace APIs. This is "fine" because it wraps its own calls to backtrace in a non-reentrant Mutex that prevents two backtraces from getting interleaved during printing.https://github.com/rust-lang/backtrace-rs/blob/f2bcf65c7830b4e2cf8495553b20afd305986a51/src/lib.rs#L213-L216
It seems that DbgHelp is always single threaded, so even unwinding the stack with StackWalk64 needs to grab the lock. See for example https://jpassing.com/2008/03/12/walking-the-stack-of-the-current-thread/
Edit: ☝️ I can't reach a crash in practice though.
Note: I see that Rust's backtrace-rs uses RtlVirtualUnwind on Win64. I found a blog post series that talks about it and the higher level RtlCaptureStackBackTrace, then decodes the backtrace with DbgHelp. The Rtl API is apparently much faster, but I can't tell if it's thread safe.