sentry-cocoa icon indicating copy to clipboard operation
sentry-cocoa copied to clipboard

New Memory Termination Handling (OOM)

Open naftaly opened this issue 1 year ago • 7 comments

:scroll: Description

This PR introduces memory pressure and memory level systems in order to effectively track memory terminations and correctly report them.

Example Report

Screenshot 2024-04-22 at 3 34 56 PM Screenshot 2024-04-22 at 3 35 08 PM

:bulb: Motivation and Context

A lot has changed since this article about handling memory terminations. In order to bring Sentry OOM handling up to date, some additional heuristics are needed.

At this point, this is more of a proposal than a full PR since a few things would need to be updated to land this. For example, SentryAppMemoryTracker should be move to the dependency container.

:green_heart: How did you test it?

Some new tests were added for new memory classes. More in depth testing is required as well as some changes to DI in order to allow for these tests.

:pencil: Checklist

You have to check all boxes before merging:

  • [ ] I reviewed the submitted code.
  • [ ] I added tests to verify the changes.
  • [ ] No new PII added or SDK only sends newly added PII if sendDefaultPII is enabled.
  • [ ] I updated the docs if needed.
  • [ ] Review from the native team if needed.
  • [ ] No breaking change or entry added to the changelog.
  • [ ] No breaking change for hybrid SDKs or communicated to hybrid SDKs.

:crystal_ball: Next steps

naftaly avatar Apr 22 '24 19:04 naftaly

Hey @naftaly, thanks a lot for this PR. I like the idea of subscribing to memory pressure notifications to decide if a watchdog termination was an OOM. To merge this PR, we would require a couple of changes. We also need to investigate how reliable the new OOM is because we had plenty of confusion in the past for reporting OOMs that weren't OOMs. As you stated in the PR description, this is more of a proposal. Would you rather prefer guidance on getting this PR merged, or are you OK with us taking this as an inspiration and opening a new PR?

With this PR, we can also tackle https://github.com/getsentry/sentry-cocoa/issues/2514.

philipphofmann avatar Apr 23 '24 11:04 philipphofmann

I’d love some guidance on getting this PR merged, but I can also help you folks take inspiration and go from there, I can leave it up to you which way you prefer.

naftaly avatar Apr 23 '24 13:04 naftaly

I’d love some guidance on getting this PR merged, but I can also help you folks take inspiration and go from there, I can leave it up to you which way you prefer.

@naftaly, we will discuss this in our internal sync meeting this week and get back to you.

philipphofmann avatar Apr 23 '24 13:04 philipphofmann

@philipphofmann, sounds good. For a bit of info around how this works, check out the comments in SentryAppMemory.mm. I look forward to hearing back.

naftaly avatar Apr 23 '24 14:04 naftaly

Hey, @naftaly.

We decided on a call that we want to implement the proposals of this PR in increments. We definitely would like to use the memory pressure logic for adding breadcrumbs, which is https://github.com/getsentry/sentry-cocoa/issues/2514. Would you be up for changing this PR or opening a new PR for this or would you rather leave it to us, which is not a problem at all.

We are unsure if we want to use the memory pressure info to change the OOM logic because we first need to investigate if it's working reliably. Furthermore, we need to figure out how we will reflect this in the product. As this is a bit more complicated and requires SDK and Sentry knowledge, we think it's better to add the implementation for this, but we can't give you an ETA yet. I created a new issue for this: https://github.com/getsentry/sentry-cocoa/issues/3890

philipphofmann avatar Apr 24 '24 13:04 philipphofmann

@philipphofmann, all sounds good. I’ll let you folks take over both aspects, and if you need any help feel free to reach out.

When I worked at Meta, I implemented something similar to this in the termination pipeline and it changed how we worked with OOMs. The detection was almost perfect. False positives were a thing of the past, sometimes one would be missed but then we’d end up reverting to fall through behaviour where it becomes a general WD termination.

On the implementation of it all. The memory pressure basically acts the same as the didReceiveMemoryWarning notification while in the foreground. It’s in the background that it becomes useful and has much better precision.

Overall, we’ve discussed memory pressure, but the more important feature here is memory level. This is what is very novel to this PR (proposal), it actually gives you almost perfect knowledge surrounding the OS terminating due to getting close to the memory limit, which is the more likely scenario in day-to-day apps. You won’t get sudden huge allocation that kill the app, but this is much less common.

I look forward to seeing this in Sentry, it’s something that is not available in any of the other systems out there that are public and can change the way reliability teams work with OOMs (and prioritize issues).

naftaly avatar Apr 24 '24 13:04 naftaly

Thanks for the detailed update and all the input @naftaly 🥇.

philipphofmann avatar Apr 24 '24 15:04 philipphofmann

We are going to pull code from this proposal once we tackle https://github.com/getsentry/sentry-cocoa/issues/3890, but I'm going to close this for now, as we aren't actively working on this right now.

philipphofmann avatar May 08 '24 13:05 philipphofmann