Add SIGTERM support
:scroll: Description
Added support for catching the SIGTERM signal.
:bulb: Motivation and Context
All Apple OS's send SIGTERM (like any good unix based system) when they want to request a graceful termination of an application, when the app doesn't quit in those circumstances, it'll receive a SIGKILL. This often gives us the opportunity to get a stack trace where we know Apple wants the app to be terminated but don't know why (watchdog events). This is just one more piece of information that can be added to the puzzle of figuring out unexplained app terminations.
:green_heart: How did you test it?
I ran the tests as well as Tested on a real app.
:pencil: Checklist
You have to check all boxes before merging:
- [ ] I reviewed the submitted code.
- [ ] I added tests to verify the changes.
- [ ] No new PII added or SDK only sends newly added PII if
sendDefaultPIIis enabled. - [ ] I updated the docs if needed.
- [ ] Review from the native team if needed.
- [ ] No breaking change or entry added to the changelog.
- [ ] No breaking change for hybrid SDKs or communicated to hybrid SDKs.
:crystal_ball: Next steps
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 90.604%. Comparing base (
1267cb0) to head (4932e35).
Additional details and impacted files
@@ Coverage Diff @@
## main #3895 +/- ##
=============================================
- Coverage 90.815% 90.604% -0.211%
=============================================
Files 590 589 -1
Lines 45946 45854 -92
Branches 16380 16273 -107
=============================================
- Hits 41726 41546 -180
- Misses 4040 4217 +177
+ Partials 180 91 -89
| Files | Coverage Δ | |
|---|---|---|
| Sources/SentryCrash/Recording/SentryCrashDoctor.m | 55.597% <100.000%> (+2.012%) |
:arrow_up: |
| ...entryCrash/Recording/Tools/SentryCrashSignalInfo.c | 100.000% <ø> (ø) |
|
| ...ntryTests/SentryCrash/SentryCrashDoctorTests.swift | 100.000% <100.000%> (ø) |
... and 42 files with indirect coverage changes
Continue to review full report in Codecov by Sentry.
Legend - Click here to learn more
Δ = absolute <relative> (impact),ø = not affected,? = missing dataPowered by Codecov. Last update 1267cb0...4932e35. Read the comment docs.
@armcknight is there anything you need me to do to get this on in?
I'm just running the other tests really quick in https://github.com/getsentry/sentry-cocoa/pull/3946/, and then we can merge this! Thanks @naftaly for your patience.
What is this telling me when I get this on iOS and tvOS? Is this a good or a bad thing?
What would Sentry have been reporting otherwise? The Watchdog event?
What is this telling me when I get this on iOS and tvOS? Is this a good or a bad thing?
It’s the OS trying to tell you that it wants apps to exit gracefully. I’ve seen it happen in many different scenarios. Some of the interesting ones are when the device is shutdown, when an app will be terminated to update it and when something changes in traits that might require the OS to restart apps.
What’s interesting is that previously this would all go unnoticed to us the developers. If these happen when the user can see them, or perceive them, then we have an issue that need to be resolved. If they happen in the background, then we just hope that state restauration is well implemented and users don’t notice it.
What would Sentry have been reporting otherwise? The Watchdog event?
Likely nothing, it would have gone unnoticed.
We've always had a large number of WatchdogTermination events that didn't correspond to memory issues. Could it have been that these were actually receiving SIGTERM and it was not handled/reported and thus turned into the WatchdogTermination event?
We've always had a large number of
WatchdogTerminationevents that didn't correspond to memory issues. Could it have been that these were actually receiving SIGTERM and it was not handled/reported and thus turned into theWatchdogTerminationevent?
I guess it’s possible since the Watchdog code in Sentry (if I’m not mistaken) simply returns at the end of its heuristics and calls it an OOM/Watchdog event. So if the actual signal isn’t caught, then that might be the default.
FWIW, I’ve also proposed a PR to add real OOM recognition so that could help with other false positives. It might end up in KSCrash first though which is what Sentry uses for crash reporting.
@eric if you’re getting SIGTERM’s now, it would be interesting to know what type of event is going down (if any). That would help understand exactly what is happening or will happen for your app.
I just received the first SIGTERM on our Testflight beta today, and so wasn't sure what to do with it... it didn't really feel actionable.
We do currently send memory usage with all of our breadcrumbs to try to give us a better idea of where we have actual memory issues. It always seemed silly to me that Sentry couldn't at least sample the memory usage once per second and store it somewhere to give a more informed message related to the Watchdog events.
I just received the first SIGTERM on our Testflight beta today, and so wasn't sure what to do with it... it didn't really feel actionable.
If it’s background and it feels unactionable, then I would not prioritize it, but still keep in mind it’s there and happening. If it’s foreground, then I usually start adding breadcrumbs to make it actionable if the stack isn’t enough to give you an idea of why it is going on. These issues are definitely going to harder than your run of the mill exception since these happen based on what happened in the past vs. A mistake that might have happened at that point in time. If you can share the stack (of all threads) then I would be happy to take a look and see if I spot anything.
It doesn't seem like much of anything is happening here: https://fancy-bits-llc.sentry.io/issues/5386894502/
What is the signal that is sent when the OS decides that it's time for your app to stop running in the background when it needs to get more memory? Is SIGTERM ever going to be sent during normal operation?
It doesn't seem like much of anything is happening here: https://fancy-bits-llc.sentry.io/issues/5386894502/
I don’t have access. Maybe copy it to a gist or something.
What is the signal that is sent when the OS decides that it's time for your app to stop running in the background when it needs to get more memory?
That would be SIGKILL, we can’t catch it. Sometimes, if we’re really lucky, a SIGTERM comes before it, but I’ve not seen it often.
Is SIGTERM ever going to be sent during normal operation?
Depends what normal operation is, but yes, it can happen at any time. It’s a vestige of unix where it’s meant to tell apps to do their cleanup and exit otherwise be “exited”.
We do currently send memory usage with all of our breadcrumbs to try to give us a better idea of where we have actual memory issues.
We have that planned here: https://github.com/getsentry/sentry-cocoa/issues/3518#issuecomment-1909911490, but we haven't gotten to it yet, sorry.
It always seemed silly to me that Sentry couldn't at least sample the memory usage once per second and store it somewhere to give a more informed message related to the Watchdog events.
Yes, that makes sense. I created an issue. Thanks for the input 👏 , @eric. https://github.com/getsentry/sentry-cocoa/issues/4003