siglongjmp crashes in .NET managed/unmanaged transitions on Windows x64
Hey. I've encountered a rather weird behaviour fixed in an even weirder way and I would really love to understand what's going on.
I'm using Unicorn in a .NET 6 project (using a custom wrapper, but that's not really important here). It worked flawlessly on both Windows and Linux until one day when I pulled the latest version of Unicorn on my Windows machine and rebuilt it. It started crashing in cpu_loop_exit() in cpu-exec-common.c, on the siglongjmp call (an attempt was made to assign an invalid register context). I didn't have time to keep digging back then so what I tried was to build it as an x86 library and use that. This worked.
By now, I got to the point where I really needed it to run on x64 Windows. I discovered that it stopped working with 2912cd1e. That commit changes CMakeLists.txt, thus the building process, in some way that breaks the long jumps in my very specific scenario. When using CMakeLists.txt from 88f4eba (even in the latest commit in dev branch), everything works just fine.
I'm not that familiar with CMake and I'm quite confused by the Unicorn's configuration so that's what I'm asking: how/why did that build configuration change cause this?
For the record, with Unicorn built using msys2 (mingw-w64-x86_64-toolchain/Ninja), it exhibits the same faulty behaviour (even with the old CMakeLists). Both the VS generator (in x64 mode: cmake .. -G "Visual Studio 16 2019" -A x64 -DCMAKE_BUILD_TYPE=Release + msbuild unicorn.sln -p:Platform=x64 -p:Configuration=Release) and the NMake generator (as presented in COMPILE) work (with the old CMakeLists). The C compiler is MSVC 19.29.30137.0; the version of MASM is 14.29.30137.0, Windows 11 (10.0.22000.739). As far as I can tell, setjmp-wrapper-win32.asm has always been included in the build, even in the ones producing the faulty behaviour.
Also, what makes it even weirder, it doesn't always crash and I haven't been able to come up with a specific set of cases in which it does and does not: for example, it seems to always crash in the native->managed transition in an interrupt hook; however, code hooks (using exactly the same marshalling technique) seem to work fine.
I am convinced this is somehow related to the magic happening on transitions between managed and unmanaged code in .NET. Running a similar code written in C++ works just fine even when using the newer CMakeLists on x64 Windows. But again, it doesn't crash always so idk. Also, I suppose this is related to #1331.
If anyone wants to do some digging over this, I'll be happy to provide an example.
Hello, so does it still crash in the latest dev branch?
Yes, the behaviour I'm describing occurs in all revisions since the CMakeLists change in 2912cd1.
Example: build Unicorn (the latest version in the dev branch), clone this repo, change the LibName const in src/SharpCorn/Native.cs to the path of your unicorn.dll and run the Test project (all on Windows x64 and .NET 6.0.x).
Expected behaviour: the program finishes successfully with the last lines of output being CODE at 1017c \n INTERRUPT \n END.
Actual behaviour: the program crashes right after CODE at 1017c (before invoking the interrupt hook).
I've just tried it on a different, Windows 10 machine and the results are the same.
Apparently, this comes down to the MSVC runtime library. I suppose that before 2912cd1, the default was /MT, the multi-threaded statically-linked library (as controlled by UNICORN_STATIC_MSVCRT, which was set to PROJECT_IS_TOP_LEVEL by default).
That commit changed CMakeLists so that the runtime library used is controlled by CMAKE_C_FLAGS, and it doesn't set any default value explicitly (?), so it defaults to the dynamically-linked MSVC runtime library (/MD). When I modified the current CMakeLists so that it enforces the statically-linked library, it works.
So now we know where the issue comes from. The question is: why does the siglongjmp crash happen when using the dynamically linked library but not with the statically linked; and why does it only happen in the .NET environment (and only sometimes)?
I found my previous reply via email is missing… In case you didn’t see it, I copied it here:
That's really weird. Could you provide some way to reproduce? Another workaround is to cross compile x64 dll on Linux, which uses mingw and is exactly how qemu officially does, though may break your workflow.
For the linkage difference, my assumption is that the implementation of the siglongjmp is different so when it’s dynamically linked something craps. If you are familiar with windows debugging, could you post the siglongjmp implementation used in your case? You can just use VS to attach to it and check. Currently I don’t have access to any Windows build environment unfortunately.
I've tried both cross-compiling the library on Linux using mingw and building it on Windows using mingw in msys2. Both of these exhibit the same faulty behaviour; so really, the only configuration that works is using the statically-linked MultiThreaded /MT runtime lib.
Fun fact: it doesn't even work when using the MultiThreadedDebug /MTd statically-linked runtime library :)
Steps for reproduction are in my comment above. I also made a self-contained release of the testing code with all the various builds of the library so you can observe the behaviour, available here.
This is really weird but doesn't surprise me as it's win32 ;).
Anyway, spare me a few days to install a virtual machine to test it. Thanks for your feedback!
btw, would you like to submit your .NET bindings to our repo?
I was absolutely planning to offer you just that. However, as I'm working on my thesis atm, I don't have the time to make the necessary final touches (such as adding constants for all target platforms, packaging and documenting the whole thing) so it'll have to wait until early September.
I tested your reproduction and it seems that it relates to this option:
https://docs.microsoft.com/en-us/cpp/build/reference/guard-enable-control-flow-guard?view=msvc-170
Unicorn by default won't enable this option so my wild guess is that it's enabled by the Windows loader for .net programs. Therefore, could you
- Add /guard:cf- and /GUARD:NO to the proper places to ensure the CFG is disabled [1].
- Check if your C# program enables CFG.
[1] https://stackoverflow.com/questions/68487710/how-do-you-correctly-use-guardcf-msvc-flag
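One way to check whether a built binary (the generated .exe or unicorn.dll) is marked as CFG-aware is to read the DllCharacteristics field of its PE header and test the IMAGE_DLLCHARACTERISTICS_GUARD_CF bit (0x4000). The following is a minimal sketch, not something taken from Unicorn or the .NET tooling; the function names are made up for illustration, and the flag only says that the image was linked with CFG support, not how the loader treats it at run time.

```python
import struct

# Bit in the optional header's DllCharacteristics field meaning
# "image supports Control Flow Guard" (per the PE format specification).
IMAGE_DLLCHARACTERISTICS_GUARD_CF = 0x4000
# DllCharacteristics sits at the same offset in PE32 and PE32+ optional headers.
DLL_CHARACTERISTICS_OFFSET = 70


def dll_characteristics(image: bytes) -> int:
    """Return the DllCharacteristics field of a PE image given as raw bytes."""
    if image[:2] != b"MZ":
        raise ValueError("not a PE image: missing MZ signature")
    # e_lfanew at 0x3C holds the file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("not a PE image: missing PE signature")
    # Skip the 4-byte signature and the 20-byte COFF file header.
    optional_header = pe_offset + 4 + 20
    (magic,) = struct.unpack_from("<H", image, optional_header)
    if magic not in (0x10B, 0x20B):  # PE32 or PE32+
        raise ValueError("unexpected optional header magic: %#x" % magic)
    (flags,) = struct.unpack_from(
        "<H", image, optional_header + DLL_CHARACTERISTICS_OFFSET
    )
    return flags


def has_cfg(image: bytes) -> bool:
    """True if the image is marked as Control Flow Guard aware."""
    return bool(dll_characteristics(image) & IMAGE_DLLCHARACTERISTICS_GUARD_CF)


def file_has_cfg(path: str) -> bool:
    """Hypothetical convenience wrapper: read a file and check its CFG bit."""
    with open(path, "rb") as f:
        return has_cfg(f.read())
```

Calling file_has_cfg on the built executable and on unicorn.dll (paths depend on your setup) shows which of the two images carries the flag.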
I have the same issue. It was indeed a CFG issue. When longjmp is called, the call goes to the longjmp function in VCRUNTIME140.dll, which has CFG enabled. If Frame is zero in the jmp_buf when CFG is enabled, it crashes.
https://github.com/ojdkbuild/tools_toolchain_vs2017bt_1416/blob/master/VC/Tools/MSVC/14.16.27023/crt/src/vcruntime/jbcxrval.c#L147
The Frame was set to zero in Unicorn for the reasons described in the following post. I haven't read the post in detail, but it seems like it was a mitigation of another crash.
https://blog.lazym.io/2020/09/21/Unicorn-Devblog-setjmp-longjmp-on-Windows/
dotnet enables CFG by default, and it seems like there is no way of disabling it in the dotnet build process. I could remove the CFG flag afterwards, once the exe file has been generated, but it's super inconvenient:
link /EDIT /GUARD:no .\path\to\exe
I see that this issue is gone with static linking (/MT). I guess that makes Unicorn use the statically linked longjmp, which does not have CFG enabled. Can we just use a statically linked unicorn dll? Or at least for the dotnet binding?