crystal icon indicating copy to clipboard operation
crystal copied to clipboard

Loading debug symbols is not thread-safe on MSVC

Open HertzDevil opened this issue 1 year ago • 5 comments

On Windows, when debug symbols are loaded concurrently from multiple threads, such as when raising an exception or calling caller, most of those threads will fail. A snippet like:

4.times do
  Thread.new do
    puts "#{caller.inspect}\n"
  end
end
puts "#{caller.inspect}\n"
sleep 0.1

could produce results like:

["???", "???", "???"]
["???", "???", "???"]
["???", "???", "???"]
["???", "???", "???"]
["usr\\test.cr:27 in '__crystal_main'", "src\\crystal\\main.cr:118 in 'main_user_code'", "src\\crystal\\main.cr:104 in 'main'", "src\\crystal\\main.cr:130 in 'main'", "src\\crystal\\system\\win32\\wmain.cr:37 in 'wmain'", "D:\\a\\_work\\1\\s\\src\\vctools\\crt\\vcstartup\\src\\startup\\exe_common.inl:288 in '__scrt_common_main_seh'", "C:\\WINDOWS\\System32\\KERNEL32.DLL +75133 in 'BaseThreadInitThunk'", "C:\\WINDOWS\\SYSTEM32\\ntdll.dll +372520 in 'RtlUserThreadStart'"]
["???", "???", "???"]
Unable to load debug information["???", "???", "???"]
Invalid memory access (C0000005) at address : SymInitializeW: The parameter is incorrect. (RuntimeError)
0x["???", "???", "???"]
1f565411000
  from ???
  from ???[0x7ff73a44789c] 
  from ???
  from ???
???
  from ???
  from [0x1fffffff0000000d] ???
[0x7ff73a446163] ???
???
[["???"]
0xd00000000] ???
[0x52933d3fd4da2e04] ???
[0xe9fe19dd54ddbdb6] ???
[0x7ff73a4475e6] ???
[0x1f5653d0d00] ???
[0x6258e96d0000000d] ???
[0x1a36ad96b7e88ea] ???
[0x1f5653dcd80] ???
[0xab200000001] ???
[0x100000001] ???
[0x1] ???
[0x1f5653dcd80] ???
[0x97980500979805] ???
[0x1f5653d0d00] ???

This is also reproducible with fibers under -Dpreview_mt:

4.times do
  spawn do
    puts "#{caller.inspect}\n"
  end
end
puts "#{caller.inspect}\n"
sleep 0.1

All related Win32 functions are single-threaded; replacing Exception::CallStack.@@sym_loaded with an Atomic::Flag is not enough, because those alternate threads still end up calling LibC.StackWalk64 to unwind their stacks before LibC.SymInitializeW returns. Therefore, all threads must await the completion of Exception::CallStack.load_debug_info_impl on whatever thread it is called.

To achieve this, #14905 or something like InitOnceExecuteOnce should be used instead.

HertzDevil avatar Aug 30 '24 08:08 HertzDevil

What about a mutex in the meantime?

ysbaddaden avatar Aug 30 '24 10:08 ysbaddaden

Loading symbols is also not thread-safe even after they have been initialized:

caller
Array.new(4) do
  Thread.new do
    100.times { caller }
  end
end.each(&.join)
# `-Dpreview_mt`
require "wait_group"

caller
WaitGroup.wait do |wg|
  4.times do
    wg.spawn do
      100.times { caller }
    end
  end
end

Eventually Exception::CallStack.decode_line_number will return bogus strings that include null characters at the end. So much of Exception::CallStack#decode_backtrace probably needs to be in a mutex as well.

HertzDevil avatar Aug 31 '24 00:08 HertzDevil

On MinGW-w64 the snippets may still produce ???, but never break with an access violation, probably due to the DWARF parser being written in Crystal

HertzDevil avatar Mar 24 '25 16:03 HertzDevil

I added a workflow to test sync on Windows, and the error indeed happens as soon as I specify -Dpreview_mt:

Unable to load debug information: SymInitializeW: The parameter is incorrect. (RuntimeError)

See for example https://github.com/ysbaddaden/sync/actions/runs/18980770977/job/54212434606

I'll have a look.

ysbaddaden avatar Nov 01 '25 10:11 ysbaddaden

There are multiple potential solutions:

  1. The immediate fix is indeed to put a mutex around init and decode. It creates a contention points when we print backtraces. That should usually be fine, but it might become an issue when somethings starts failing, and the application prints lots of exceptions?

  2. Reading the DbgHelp documentation, I see the example dups the GetCurrentProcess() handle, maybe we would dup a "debug handle" per thread, with a finalizer to cleanup symbols, then each thread would use their local debug handle for searching symbols? That might take some memory if it must load and duplicate the symbol tables for each thread.

  3. ~~Reading this snippet in Rust, it initializes symbols once then loads all the symbol table into Rust memory. I didn't dig further, but I guess that once we have the symbols, we can search ourselves, and do so safely without any synchronization anymore?~~ nope, it loads the list of individual modules, then uses the DbgHelp API to resolve symbols, and they use a mutex to decode the backtrace.

That bit is interesting, though:

Note however that std specifically bypasses this lock, and uses the *_unsynchronized backtrace APIs. This is "fine" because it wraps its own calls to backtrace in a non-reentrant Mutex that prevents two backtraces from getting interleaved during printing.

https://github.com/rust-lang/backtrace-rs/blob/f2bcf65c7830b4e2cf8495553b20afd305986a51/src/lib.rs#L213-L216

ysbaddaden avatar Nov 01 '25 10:11 ysbaddaden

It seems that DbgHelp is always single threaded, so even unwinding the stack with StackWalk64 needs to grab the lock. See for example https://jpassing.com/2008/03/12/walking-the-stack-of-the-current-thread/

Edit: ☝️ I can't reach a crash in practice though.

Note: I see that Rust's backtrace-rs uses RtlVirtualUnwind on Win64. I found a blog post series that talks about it and the higher level RtlCaptureStackBackTrace, then decodes the backtrace with DbgHelp. The Rtl API is apparently much faster, but I can't tell if it's thread safe.

ysbaddaden avatar Dec 15 '25 17:12 ysbaddaden