libobjc2

Exception handling on Windows causing access violation with ARC

Open triplef opened this issue 2 years ago • 6 comments

Throwing/catching an exception in ARC-enabled code on Windows causes an access violation. I’ve reproduced this in various environments/projects, on both x64 and x86, with the same results. Debug builds usually crash almost immediately, while release builds can sometimes continue executing for a bit before crashing.

I also tried using Application Verifier and Page Heap to find the cause, but unfortunately I never got any more useful info. The stack trace is usually corrupt and/or unhelpful.

This can for example be reproduced with the ObjCXXEHInterop_arc test, which is currently disabled for Windows due to the failure:

Exception thrown at 0x00007FFA9FC522C8 (objc.dll) in ObjCXXEHInterop_arc.exe: 0xE06D7363: Microsoft C++ Exception (parameters: 0x0000000019930520, 0x0000003160DFF6A8, 0x0000003160DFF5D8, 0x0000003160DFF6A7).
ObjCXXEHInterop_arc.exe has triggered a breakpoint.

ObjCXXEHInterop_arc.exe has triggered a breakpoint.

Exception thrown at 0x00007FFA9FC25280 (vcruntime140d.dll) in ObjCXXEHInterop_arc.exe: 0xC0000005: Access violation executing location 0x00007FFA9FC25280.
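
For readers unfamiliar with the codes in this log, here is a minimal sketch of how they can be read. The helper names `describe_exception_code` and `cxx_code_tag` are hypothetical and made up for illustration; they are not part of libobjc2 or any Windows API. The mapping covers only the three codes that appear in this thread.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: name the exception codes seen in this thread.
// Only these three codes are handled; anything else is reported as unknown.
std::string describe_exception_code(std::uint32_t code) {
    switch (code) {
        case 0xE06D7363u: return "MSVC C++ exception";
        case 0xC0000005u: return "access violation";
        case 0x80000003u: return "breakpoint";
        default:          return "unknown";
    }
}

// Hypothetical helper: extract the low three bytes of a code as ASCII.
// For the C++ exception code 0xE06D7363 these bytes are 0x6D 0x73 0x63.
std::string cxx_code_tag(std::uint32_t code) {
    std::string tag;
    tag += static_cast<char>((code >> 16) & 0xFF);
    tag += static_cast<char>((code >> 8) & 0xFF);
    tag += static_cast<char>(code & 0xFF);
    return tag;
}
```

Calling `describe_exception_code(0xC0000005u)` returns "access violation", and `cxx_code_tag(0xE06D7363u)` returns "msc". So the first entry in the log is an ordinary C++ throw passing through objc.dll, and only the later 0xC0000005 is the actual crash.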

Any thoughts on how to debug this further would be appreciated.

triplef • Jan 18 '22 15:01

Normally I'd debug this kind of thing in WinDbg using its time-travel mode: get to the crash, look at where the pointer to the faulting address came from, then run in reverse with a watchpoint set on that address to see where it was written. It may be that we're failing to retain/release across the exception throw (though I thought the compiler did that in ARC mode?), but if you can find out what 0x00007FFA9FC25280 used to point to with the time-travel debugger, then you can see where it was deallocated.

davidchisnall • Jan 18 '22 16:01

That sounds like a good plan. Unfortunately I haven’t really used WinDbg before and am not really getting anywhere with it...

For starters, could you tell me how to find out where the pointer to the faulting address came from?

I used this tutorial as a guideline and am getting a list of events that shows the C++ exception being thrown, directly followed by the access violation exception. It would be great if you could give me some pointers on where to go from there.

One possibly interesting tidbit is that on x86 the faulting address is always 0x100:

    [0xf]            : Exception 0xE06D7363 of type CPlusPlus at PC: 0X76F68A80
    [0x10]           : Exception 0xC0000005 of type Hardware at PC: 0X100

Whereas on x64 it’s always some actual memory address:

    [0x10]           : Exception 0xE06D7363 of type CPlusPlus at PC: 0X7FFB3B901020
    [0x11]           : Exception 0x80000003 of type Hardware at PC: 0X7FF6C72C10EE

triplef • Jan 19 '22 16:01

We got this fixed in Clang 15: https://reviews.llvm.org/D128190

triplef • Sep 15 '22 14:09

I was about to propose a change that enables the test on Windows when using Clang 15+ as the host compiler, when I realized that it still crashes! The nested try-catch in this test actually triggers another problem that was not covered by my fix in https://reviews.llvm.org/D128190

weliveindetail • Sep 29 '22 09:09

Review for a fix/workaround candidate: https://reviews.llvm.org/D134866

weliveindetail • Sep 29 '22 10:09

I was working on a PR to enable the ObjCXXEHInterop_arc tests on Windows with Clang 15+ when I realized that there is a second side effect that we didn't fix with the above patch. It does affect the test here, so I haven't sent the PR yet. More info, and the start of what may become a more general discussion on the concepts behind WinEH: https://reviews.llvm.org/D134866

weliveindetail • Oct 07 '22 14:10

This was fully fixed in Clang 16.

triplef • Oct 04 '23 06:10