sentry-cocoa icon indicating copy to clipboard operation
sentry-cocoa copied to clipboard

Add SIGTERM support

Open naftaly opened this issue 2 years ago • 3 comments

:scroll: Description

Added support for catching the SIGTERM signal.

Screenshot 2024-04-24 at 12 45 08 PM Screenshot 2024-04-24 at 12 45 16 PM

:bulb: Motivation and Context

All Apple OS's send SIGTERM (like any good unix based system) when they want to request a graceful termination of an application, when the app doesn't quit in those circumstances, it'll receive a SIGKILL. This often gives us the opportunity to get a stack trace where we know Apple wants the app to be terminated but don't know why (watchdog events). This is just one more piece of information that can be added to the puzzle of figuring out unexplained app terminations.

:green_heart: How did you test it?

I ran the tests as well as Tested on a real app.

:pencil: Checklist

You have to check all boxes before merging:

  • [ ] I reviewed the submitted code.
  • [ ] I added tests to verify the changes.
  • [ ] No new PII added or SDK only sends newly added PII if sendDefaultPII is enabled.
  • [ ] I updated the docs if needed.
  • [ ] Review from the native team if needed.
  • [ ] No breaking change or entry added to the changelog.
  • [ ] No breaking change for hybrid SDKs or communicated to hybrid SDKs.

:crystal_ball: Next steps

naftaly avatar Apr 24 '24 16:04 naftaly

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 90.604%. Comparing base (1267cb0) to head (4932e35).

Additional details and impacted files

Impacted file tree graph

@@              Coverage Diff              @@
##              main     #3895       +/-   ##
=============================================
- Coverage   90.815%   90.604%   -0.211%     
=============================================
  Files          590       589        -1     
  Lines        45946     45854       -92     
  Branches     16380     16273      -107     
=============================================
- Hits         41726     41546      -180     
- Misses        4040      4217      +177     
+ Partials       180        91       -89     
Files Coverage Δ
Sources/SentryCrash/Recording/SentryCrashDoctor.m 55.597% <100.000%> (+2.012%) :arrow_up:
...entryCrash/Recording/Tools/SentryCrashSignalInfo.c 100.000% <ø> (ø)
...ntryTests/SentryCrash/SentryCrashDoctorTests.swift 100.000% <100.000%> (ø)

... and 42 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 1267cb0...4932e35. Read the comment docs.

codecov[bot] avatar Apr 25 '24 18:04 codecov[bot]

@armcknight is there anything you need me to do to get this on in?

naftaly avatar May 06 '24 21:05 naftaly

I'm just running the other tests really quick in https://github.com/getsentry/sentry-cocoa/pull/3946/, and then we can merge this! Thanks @naftaly for your patience.

armcknight avatar May 06 '24 21:05 armcknight

What is this telling me when I get this on iOS and tvOS? Is this a good or a bad thing?

What would Sentry have been reporting otherwise? The Watchdog event?

eric avatar May 21 '24 22:05 eric

What is this telling me when I get this on iOS and tvOS? Is this a good or a bad thing?

It’s the OS trying to tell you that it wants apps to exit gracefully. I’ve seen it happen in many different scenarios. Some of the interesting ones are when the device is shutdown, when an app will be terminated to update it and when something changes in traits that might require the OS to restart apps.

What’s interesting is that previously this would all go unnoticed to us the developers. If these happen when the user can see them, or perceive them, then we have an issue that need to be resolved. If they happen in the background, then we just hope that state restauration is well implemented and users don’t notice it.

What would Sentry have been reporting otherwise? The Watchdog event?

Likely nothing, it would have gone unnoticed.

naftaly avatar May 21 '24 22:05 naftaly

We've always had a large number of WatchdogTermination events that didn't correspond to memory issues. Could it have been that these were actually receiving SIGTERM and it was not handled/reported and thus turned into the WatchdogTermination event?

eric avatar May 21 '24 22:05 eric

We've always had a large number of WatchdogTermination events that didn't correspond to memory issues. Could it have been that these were actually receiving SIGTERM and it was not handled/reported and thus turned into the WatchdogTermination event?

I guess it’s possible since the Watchdog code in Sentry (if I’m not mistaken) simply returns at the end of its heuristics and calls it an OOM/Watchdog event. So if the actual signal isn’t caught, then that might be the default.

FWIW, I’ve also proposed a PR to add real OOM recognition so that could help with other false positives. It might end up in KSCrash first though which is what Sentry uses for crash reporting.

naftaly avatar May 21 '24 22:05 naftaly

@eric if you’re getting SIGTERM’s now, it would be interesting to know what type of event is going down (if any). That would help understand exactly what is happening or will happen for your app.

naftaly avatar May 21 '24 22:05 naftaly

I just received the first SIGTERM on our Testflight beta today, and so wasn't sure what to do with it... it didn't really feel actionable.

eric avatar May 21 '24 22:05 eric

screenshot-2024-05-21-15 55 38@2x

We do currently send memory usage with all of our breadcrumbs to try to give us a better idea of where we have actual memory issues. It always seemed silly to me that Sentry couldn't at least sample the memory usage once per second and store it somewhere to give a more informed message related to the Watchdog events.

eric avatar May 21 '24 22:05 eric

I just received the first SIGTERM on our Testflight beta today, and so wasn't sure what to do with it... it didn't really feel actionable.

If it’s background and it feels unactionable, then I would not prioritize it, but still keep in mind it’s there and happening. If it’s foreground, then I usually start adding breadcrumbs to make it actionable if the stack isn’t enough to give you an idea of why it is going on. These issues are definitely going to harder than your run of the mill exception since these happen based on what happened in the past vs. A mistake that might have happened at that point in time. If you can share the stack (of all threads) then I would be happy to take a look and see if I spot anything.

naftaly avatar May 21 '24 22:05 naftaly

It doesn't seem like much of anything is happening here: https://fancy-bits-llc.sentry.io/issues/5386894502/

What is the signal that is sent when the OS decides that it's time for your app to stop running in the background when it needs to get more memory? Is SIGTERM ever going to be sent during normal operation?

eric avatar May 21 '24 23:05 eric

It doesn't seem like much of anything is happening here: https://fancy-bits-llc.sentry.io/issues/5386894502/

I don’t have access. Maybe copy it to a gist or something.

What is the signal that is sent when the OS decides that it's time for your app to stop running in the background when it needs to get more memory?

That would be SIGKILL, we can’t catch it. Sometimes, if we’re really lucky, a SIGTERM comes before it, but I’ve not seen it often.

Is SIGTERM ever going to be sent during normal operation?

Depends what normal operation is, but yes, it can happen at any time. It’s a vestige of unix where it’s meant to tell apps to do their cleanup and exit otherwise be “exited”.

naftaly avatar May 21 '24 23:05 naftaly

We do currently send memory usage with all of our breadcrumbs to try to give us a better idea of where we have actual memory issues.

We have that planned here: https://github.com/getsentry/sentry-cocoa/issues/3518#issuecomment-1909911490, but we haven't gotten to it yet, sorry.

It always seemed silly to me that Sentry couldn't at least sample the memory usage once per second and store it somewhere to give a more informed message related to the Watchdog events.

Yes, that makes sense. I created an issue. Thanks for the input 👏 , @eric. https://github.com/getsentry/sentry-cocoa/issues/4003

philipphofmann avatar May 22 '24 13:05 philipphofmann