sentry-cocoa Improve App Hangs detection

Description

With the current App Hangs timeout implementation, it is possible for an App Hangs to last "timeout * 1.9" without being reported.

See https://github.com/getsentry/sentry-cocoa/pull/1861#discussion_r890099824

Unity implementation

Jun 07 '22 07:06 brustolin

This issue has gone three weeks without activity. In another week, I will close it.

But! If you comment or otherwise update it, I will reset the clock, and if you label it Status: Backlog or Status: In Progress, I will leave it alone ... forever!

"A weed is but an unloved flower." ― Ella Wheeler Wilcox 🥀

Jun 30 '22 00:06 github-actions[bot]

This issue has gone three weeks without activity. In another week, I will close it.

But! If you comment or otherwise update it, I will reset the clock, and if you label it Status: Backlog or Status: In Progress, I will leave it alone ... forever!

"A weed is but an unloved flower." ― Ella Wheeler Wilcox 🥀

Aug 12 '22 00:08 github-actions[bot]

@brustolin I really don't understand this one. Philip tried to explain it to me but maybe you can try to explain it?

Aug 12 '22 07:08 kevinrenskers

@brustolin I really don't understand this one. Philip tried to explain it to me but maybe you can try to explain it?

For sure. We talk about this in a meeting next week ;)

Aug 12 '22 07:08 brustolin

Because the direct link to the comment doesn't work anymore, I'm just going to repeat what @vaind said in that PR:

Note: while this hasn't changed in this PR, I wanted to point out the issue with this approach (the same one is present in the Java watchdog AFAIK:

Sleeping for a given duration before checking a flag means the timeout is not really a "timeout", but merely a baselin, with the actual UI non-responsive time necessary for the ANR to be reported ranging between timeoutInterval & timeoutInterval * 2, averaging at about timeoutInterval * 1.5. That's because the following happens:

Assume timeoutInterval = 2 seconds, i.e. the current default. Also let's call blockExecutedOnMainThread just flag for brewity:

flag= NO is executed and a block to reset is is scheduled
say 0.1 second later, the flag = YES is executed on the main thread because UI still works
say another 0.1 second later, UI hangs. Current time is start+0.2 seconds
at start+2 seconds the watchdog thread wakes up, sees everything working just fine and looping. UI is still stuck at this point but flag == YES...
in another iteration, set flag = NO
now the UI starts being responsive after a total of 3.7 second unresponsiveness, with the current time being at start + 3.9 second, i.e. just before the next check. Sets flag = YES, since UI works now again.
next check happens after the two-second sleep, all nice and shiny again because flag == YES - no ANR detected.

I'm not saying this is a huge deal, just pointing out the timeout isn't really a timeout... That's why I've changed this in the Unity implementation (after realizing this issue because my tests were spuriously failing), to executing the background iteration more often (5 times more often, currently) and counting the number of iterations (ticks) since the UI last responded. Then if the count reaches the predetermined count (5), the report is sent. Still not exact, but reduces the average ANR time before report from timetout*1.5 to timeout * 1.1, but more importantly, it also reduces the maximum time when an ANR isn't detected to about timeout*1.2 instead of timeout*1.9999...

Aug 16 '22 08:08 kevinrenskers

sentry-cocoa sentry-cocoa copied to clipboard

Improve App Hangs detection

Description

sentry-cocoa
sentry-cocoa copied to clipboard