sentry-cocoa icon indicating copy to clipboard operation
sentry-cocoa copied to clipboard

Xcode organizer reports crash on `writeBinaryImage`

Open jaroslavas opened this issue 3 years ago • 24 comments

Platform

iOS

Installed

CocoaPods

Version

7.16.0

Steps to Reproduce

Add sentry to app and start reporting issues.

SentrySDK.start { (options) in
            options.dsn = dsn
            options.debug = false
            options.enableAutoSessionTracking = false
            options.attachStacktrace = false
            options.maxBreadcrumbs = 25

            options.beforeSend = { event in
                // Some data is removed to make reports lighter
                event.debugMeta = nil
                return event
            }
        }

Expected Result

Crashes should be seen in sentry as well as in XCode organizer.

Actual Result

XCode organizer starts showing a lot of different crashes as one, where Sentrys writeBinaryImage is shown as the cause for the crash. Screenshot 2022-06-16 at 10 11 22 Screenshot 2022-06-16 at 10 11 46

jaroslavas avatar Jun 16 '22 07:06 jaroslavas

What is the error message of the crash, @jaroslavas? When does this happen? Do you maybe have a repro for us?

philipphofmann avatar Jun 17 '22 09:06 philipphofmann

Nope, I only have different stack traces ending in a crash by sentry and binary image, either writing to it or reading: Screenshot 2022-06-17 at 12 43 47 Screenshot 2022-06-17 at 12 43 59

Unfortunately I can't reproduce this, only see a lot of reports in organizer from live users.

jaroslavas avatar Jun 17 '22 09:06 jaroslavas

a lot of reports in organizer from live users.

What exactly do you mean by that?

philipphofmann avatar Jun 17 '22 12:06 philipphofmann

It means crash reports downloaded from iTunes Connect.

brustolin avatar Jun 20 '22 05:06 brustolin

Did this start happening in 7.16.0 or did you see this problem for earlier versions as well, @jaroslavas?

philipphofmann avatar Jun 20 '22 13:06 philipphofmann

We have not used it for some time, but I'm almost sure it was there before 7.16.0

jaroslavas avatar Jun 22 '22 09:06 jaroslavas

It seems like we can't handle a signal properly, and we crash in writeBinaryImage. Do you maybe have any clue which type of crash this could be?

@armcknight, can you look at the stacktrace above, and do you maybe have any clue what could cause a crash when calling into _dyld_get_image_header in sentrycrashdl_getBinaryImage?

philipphofmann avatar Jun 22 '22 15:06 philipphofmann

@jaroslavas Could you provide a full crash report for some of these examples? There should be a thread that is indicated as the "Crashed thread", and maybe other info like an exception type or backtrace.

XCode organizer starts showing a lot of different crashes as one, where Sentrys writeBinaryImage is shown as the cause

My guess is that since our symbols show up in crash reports (which is normal; our crash handler logic is running to collect crash info) Xcode sees us as a common element amongst the reports and is blaming us. In practice, automatically choosing a frame to blame for a crash report is an inexact art.

armcknight avatar Jun 22 '22 20:06 armcknight

@armcknight Can I send them to the email in your profile?

jaroslavas avatar Jun 29 '22 08:06 jaroslavas

Yes, you can send them to [email protected].

philipphofmann avatar Jun 29 '22 08:06 philipphofmann

Interesting finding: I went through several of crashes with different stacks, but all them had one thing in common: app was killed by the watchdog with infamous 0x8BADF00D code.

jaroslavas avatar Jun 29 '22 08:06 jaroslavas

As you pointed out the crash is a watchdog termination. The crash reports you sent me contain

72180 exhausted real (wall clock) time allowance of 10.00 seconds
ProcessVisibility: Background
ProcessState: Running
WatchdogEvent: scene-update
WatchdogVisibility: Foreground

So it seems like the app gets killed because it hangs when being in the background.

0   libsystem_kernel.dylib        	0x00000001b7c91138 __ulock_wait + 8
1   libsystem_platform.dylib      	0x00000001f0df39ac _os_unfair_lock_lock_slow + 172 (lock.c:578)
2   dyld                          	0x00000001027406f4 dyld4::RuntimeState::withLoadersReadLock(void () block_pointer) + 76 (DyldRuntimeState.cpp:138)
3   dyld                          	0x000000010274697c dyld4::APIs::_dyld_get_image_header(unsigned int) + 136 (DyldAPIs.cpp:222)
4   Sentry                        	0x00000001047dd754 sentrycrashdl_getBinaryImage + 28 (SentryCrashDynamicLinker.c:345)
5   Sentry                        	0x00000001047ec748 writeBinaryImage + 32 (SentryCrashReport.c:1191)
6   Sentry                        	0x00000001047ec748 writeBinaryImages + 124 (SentryCrashReport.c:1234)
7   Sentry                        	0x00000001047ec748 sentrycrashreport_writeStandardReport + 460 (SentryCrashReport.c:1752)
8   Sentry                        	0x00000001047d9ca4 onCrash + 176 (SentryCrashC.c:107)
9   Sentry                        	0x00000001047e5ac8 0x1047c4000 + 137928 (SentryCrashMonitor.c:241)
10  Sentry                        	0x00000001047e8074 handleSignal + 336 (SentryCrashMonitor_Signal.c:108)
11  libsystem_platform.dylib      	0x00000001f0df4c10 _sigtramp + 56 (sigtramp.c:116)

It seems like the SentrySDK tries to write a crash report for a signal it received and it gets stuck. I'm not sure what is causing this. Worth checking some forks of KSCrash to see if they updated some logic in files similar to SentryCrashReport, SentryCrashDynamicLinker, and SentryCrashC.

Thanks for reporting this @jaroslavas. BTW, if you can come up with a repro for us this would be awesome 👏.

philipphofmann avatar Jun 30 '22 13:06 philipphofmann

Looking again at the stack trace, the app has already crashed for some reason, as our signal handler is running (onCrash). However, the fact that _dyld_get_image_header takes a lock is dangerous; you can only make reentrant calls from within a signal handler, and taking an unfair lock is not reentrant. So, the system is terminating the app with the 0x8BADF00D watchdog code because we've deadlocked and the app is taking too long to terminate.

SentryCrash (and KSCrash, in turn) need to avoid the _dyld call by gathering the list of loaded binary images beforehand. This can be done with _dyld_register_func_for_add_image and _dyld_register_func_for_remove_image. Then, if/when a crash occurs, we already have the list of loaded images and don't have to query dyld in the handler function.

armcknight avatar Jul 01 '22 06:07 armcknight

@jaroslavas, can you tell us the percentage of crashes from the Xcode organizer that crashed on writeOnBinaryImage of the total amount of crashes of the Xcode organizers, so we know how bad this is? Does our SDK miss other crashes as well that the Xcode organizer has? If yes, again could you tell us how many, please?

philipphofmann avatar Jul 06 '22 15:07 philipphofmann

This could be related to https://github.com/getsentry/sentry-cocoa/issues/1864.

philipphofmann avatar Jul 07 '22 08:07 philipphofmann

@jaroslavas, can you tell us the percentage of crashes from the Xcode organizer that crashed on writeOnBinaryImage of the total amount of crashes of the Xcode organizers, so we know how bad this is?

Actually, most of the crashes are from sentry atm :) Here is a screenshot from TestFlight, version on the bottom has sentry enabled, version on the top has it disabled: Screenshot 2022-07-11 at 11 15 03

jaroslavas avatar Jul 11 '22 08:07 jaroslavas

Thanks for the update, @jaroslavas. I would be surprised if this would be the case for most apps. I'm trying to understand what could cause this, although we already know how to fix it. Do you have any other crash reporting tools installed in parallel or another library that hooks into signal handlers?

philipphofmann avatar Jul 11 '22 13:07 philipphofmann

We only use Sentry and reports provided by Apple automatically.

jaroslavas avatar Jul 11 '22 13:07 jaroslavas

Again thanks for the update, @jaroslavas.

philipphofmann avatar Jul 12 '22 11:07 philipphofmann

Might also fix https://github.com/getsentry/sentry-cocoa/issues/1979.

philipphofmann avatar Jul 18 '22 09:07 philipphofmann

This issue has gone three weeks without activity. In another week, I will close it.

But! If you comment or otherwise update it, I will reset the clock, and if you label it Status: Backlog or Status: In Progress, I will leave it alone ... forever!


"A weed is but an unloved flower." ― Ella Wheeler Wilcox 🥀

github-actions[bot] avatar Aug 09 '22 00:08 github-actions[bot]

So, is there a plan to fix it before bot closes the issue? :)

jaroslavas avatar Aug 09 '22 05:08 jaroslavas

Yes, we definitely plan to fix this. We had a minor set back that delayed this, but we're working to have it fixed.

brustolin avatar Aug 09 '22 07:08 brustolin

As the message says we need a backlog label to avoid the stale bot

bruno-garcia avatar Aug 09 '22 11:08 bruno-garcia

Any updates on this issue?

jaroslavas avatar Oct 18 '22 09:10 jaroslavas

No, not yet, @jaroslavas. Sorry, we didn't have the bandwidth yet to get to that issue.

philipphofmann avatar Oct 18 '22 16:10 philipphofmann

Hi @philipphofmann do you have any news? thank you

jacopom avatar Nov 24 '22 16:11 jacopom

Nope not yet; sorry, @jacopom.

philipphofmann avatar Nov 29 '22 16:11 philipphofmann

Fixed with #2939

brustolin avatar Jun 02 '23 08:06 brustolin

Hi @brustolin @philipphofmann We've tried to add back Sentry, but looks like the problem is still there. Watchdog kills the app after eating bad food :(

Exception Type:  EXC_CRASH (SIGKILL)
Exception Codes: 0x0000000000000000, 0x0000000000000000
Termination Reason: FRONTBOARD 2343432205 
<RBSTerminateContext| domain:10 code:0x8BADF00D explanation:scene-update watchdog transgression: app<ch.protonmail.vpn(F71EC81F-DD3E-43E5-A342-4052E7631350)>:990 exhausted real (wall clock) time allowance of 10.00 seconds
ProcessVisibility: Foreground
ProcessState: Running
WatchdogEvent: scene-update
WatchdogVisibility: Foreground
WatchdogCPUStatistics: (
"Elapsed total CPU time (seconds): 8.600 (user 8.600, system 0.000), 16% CPU",
"Elapsed application CPU time (seconds): 0.008, 0% CPU"
) reportType:CrashLog maxTerminationResistance:Interactive>

And here is the main thread that crashed:

Thread 0 name:   Dispatch queue: com.apple.main-thread
Thread 0 Crashed:
0   libsystem_kernel.dylib        	       0x1f394dbf8 __ulock_wait + 8
1   libsystem_platform.dylib      	       0x2161562dc _os_unfair_lock_lock_slow + 176
2   dyld                          	       0x1cef7915c dyld4::RuntimeLocks::withLoadersReadLock(void () block_pointer) + 84
3   dyld                          	       0x1cef86a74 dyld4::APIs::_dyld_get_image_header(unsigned int) + 140
4   ProtonVPN                     	       0x104a3c338 sentrycrashdl_dladdr (in ProtonVPN) (SentryCrashDynamicLinker.c:215) + 1442616
5   ProtonVPN                     	       0x104a43a94 symbolicate_internal (in ProtonVPN) (SentryCrashSymbolicator.c:75) + 1473172
6   ProtonVPN                     	       0x104a38ee8 writeThread (in ProtonVPN) (SentryCrashReport.c:1089) + 1429224
7   ProtonVPN                     	       0x104a39e68 writeAllThreads (in ProtonVPN) (SentryCrashReport.c:1142) + 1433192
8   ProtonVPN                     	       0x104a39ae8 sentrycrashreport_writeStandardReport (in ProtonVPN) (SentryCrashReport.c:1717) + 1432296
9   ProtonVPN                     	       0x104a360d8 onCrash (in ProtonVPN) (SentryCrashC.c:95) + 1417432
10  ProtonVPN                     	       0x104a31020 sentrycrashcm_handleException (in ProtonVPN) (SentryCrashMonitor.c:235) + 1396768
11  ProtonVPN                     	       0x104a32fe8 handleSignal (in ProtonVPN) (SentryCrashMonitor_Signal.c:110) + 1404904
12  libsystem_platform.dylib      	       0x216158888 _sigtramp + 56
13  libobjc.A.dylib               	       0x1a45c38dc class_createInstance + 68
14  libobjc.A.dylib               	       0x1a45c38dc class_createInstance + 68
15  CoreFoundation                	       0x1ac23ae30 __CFAllocateObject + 20
16  CoreFoundation                	       0x1ac26aea4 __NSSetM_new + 132
17  CoreFoundation                	       0x1ac26a15c +[NSSet set] + 28
18  BaseBoard                     	       0x1b435ea78 -[BSKeyedSettings initWithSettings:] + 228
19  FrontBoardServices            	       0x1c46a0600 -[FBSSettings _settings:] + 120
20  FrontBoardServices            	       0x1c46a056c -[FBSSettings _rootSettingsForLocal:createIfNeeded:] + 44
21  FrontBoardServices            	       0x1c46a3890 -[FBSSettings _setValue:forSetting:] + 176
22  FrontBoardServices            	       0x1c46a119c -[FBSSettings(FBSSceneExtension) setValue:forProperty:] + 72
23  UIKitCore                     	       0x1ae9efa90 -[_UISceneHostingDiffAction _performActionsForUIScene:withUpdatedFBSScene:settingsDiff:fromSettings:transitionContext:lifecycleActionType:] + 232
24  UIKitCore                     	       0x1ae9010a8 __64-[UIScene scene:didUpdateWithDiff:transitionContext:completion:]_block_invoke.225 + 612
25  UIKitCore                     	       0x1ae57c454 -[UIScene _emitSceneSettingsUpdateResponseForCompletion:afterSceneUpdateWork:] + 216
26  UIKitCore                     	       0x1ae57c2c4 -[UIScene scene:didUpdateWithDiff:transitionContext:completion:] + 244
27  UIKitCore                     	       0x1ae57c104 -[UIApplicationSceneClientAgent scene:handleEvent:withCompletion:] + 336
28  FrontBoardServices            	       0x1c46ac5b4 -[FBSScene updater:didUpdateSettings:withDiff:transitionContext:completion:] + 660
29  FrontBoardServices            	       0x1c46ac300 __94-[FBSWorkspaceScenesClient _queue_updateScene:withSettings:diff:transitionContext:completion:]_block_invoke_2 + 152
30  FrontBoardServices            	       0x1c46ac19c -[FBSWorkspace _calloutQueue_executeCalloutFromSource:withBlock:] + 168
31  FrontBoardServices            	       0x1c46ac0b8 __94-[FBSWorkspaceScenesClient _queue_updateScene:withSettings:diff:transitionContext:completion:]_block_invoke + 344
32  libdispatch.dylib             	       0x1b420d300 _dispatch_client_callout + 20
33  libdispatch.dylib             	       0x1b4210d48 _dispatch_block_invoke_direct + 284
34  FrontBoardServices            	       0x1c46a8520 __FBSSERIALQUEUE_IS_CALLING_OUT_TO_A_BLOCK__ + 52
35  FrontBoardServices            	       0x1c46a84a0 -[FBSMainRunLoopSerialQueue _targetQueue_performNextIfPossible] + 240
36  FrontBoardServices            	       0x1c46a8378 -[FBSMainRunLoopSerialQueue _performNextFromRunLoopSource] + 28
37  CoreFoundation                	       0x1ac25112c __CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__ + 28
38  CoreFoundation                	       0x1ac2503a8 __CFRunLoopDoSource0 + 176
39  CoreFoundation                	       0x1ac24eb5c __CFRunLoopDoSources0 + 244
40  CoreFoundation                	       0x1ac24d898 __CFRunLoopRun + 828
41  CoreFoundation                	       0x1ac24d478 CFRunLoopRunSpecific + 608
42  GraphicsServices              	       0x1ef7a64f8 GSEventRunModal + 164
43  UIKitCore                     	       0x1ae67162c -[UIApplication _run] + 888
44  UIKitCore                     	       0x1ae670c68 UIApplicationMain + 340
45  ProtonVPN                     	       0x1048e2480 main (in ProtonVPN) (<compiler-generated>:0) + 25728
46  dyld                          	       0x1cef72dcc start + 2240

As you can see _dyld_get_image_header is still in the stack.

Is there any way we can help you debug and fix this?

jaroslavas avatar Feb 27 '24 08:02 jaroslavas