test: Improve trigger fully blocking app hang
Add a second button to decode an image in a loop until 5 seconds have passed. This is a follow-up on GH-5413.
#skip-changelog
Performance metrics :rocket:
| Plain | With Sentry | Diff | |
|---|---|---|---|
| Startup time | 1225.37 ms | 1256.06 ms | 30.69 ms |
| Size | 23.75 KiB | 847.01 KiB | 823.26 KiB |
Baseline results on branch: main
Startup times
| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| 077e94086236b0be69cb0982b64053b94f2c9ea4 | 1227.31 ms | 1246.51 ms | 19.20 ms |
| 15b8c6111854bacf68624e765d99adcf93bfc987 | 1223.16 ms | 1244.83 ms | 21.67 ms |
| bc0c36e91c733f434511cb832b636fdc8ad90330 | 1220.41 ms | 1239.45 ms | 19.04 ms |
| 455619dd977fcfe15f8d8294a8d217fcb0b90d15 | 1231.40 ms | 1237.70 ms | 6.30 ms |
| f8833c443d047f0c8dac5710e52b4c0624fc52ce | 1236.45 ms | 1252.82 ms | 16.37 ms |
| 728804f93e9f74d591108dd8a8d2df61a645c02b | 1235.26 ms | 1254.20 ms | 18.94 ms |
| 6e342acf0107a6421fa9221f9ecd93c5f3ef0b3b | 1220.24 ms | 1240.14 ms | 19.90 ms |
| 46f5eb8f90236e0b538f5ff1c237e290c12a25e6 | 1200.09 ms | 1231.38 ms | 31.29 ms |
| 3912b16e2a203c2dd8155aa48aaf4c4dc10199e3 | 1230.54 ms | 1249.65 ms | 19.11 ms |
| 061c982c8616fa6ca7f0762998f39a8f4d8d2fc2 | 1226.33 ms | 1243.65 ms | 17.33 ms |
App size
| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| 077e94086236b0be69cb0982b64053b94f2c9ea4 | 21.58 KiB | 706.97 KiB | 685.39 KiB |
| 15b8c6111854bacf68624e765d99adcf93bfc987 | 20.76 KiB | 419.67 KiB | 398.91 KiB |
| bc0c36e91c733f434511cb832b636fdc8ad90330 | 22.31 KiB | 775.27 KiB | 752.95 KiB |
| 455619dd977fcfe15f8d8294a8d217fcb0b90d15 | 20.76 KiB | 432.87 KiB | 412.11 KiB |
| f8833c443d047f0c8dac5710e52b4c0624fc52ce | 21.58 KiB | 422.66 KiB | 401.08 KiB |
| 728804f93e9f74d591108dd8a8d2df61a645c02b | 22.85 KiB | 411.76 KiB | 388.91 KiB |
| 6e342acf0107a6421fa9221f9ecd93c5f3ef0b3b | 20.76 KiB | 436.66 KiB | 415.90 KiB |
| 46f5eb8f90236e0b538f5ff1c237e290c12a25e6 | 20.76 KiB | 432.37 KiB | 411.61 KiB |
| 3912b16e2a203c2dd8155aa48aaf4c4dc10199e3 | 21.58 KiB | 625.82 KiB | 604.24 KiB |
| 061c982c8616fa6ca7f0762998f39a8f4d8d2fc2 | 21.58 KiB | 683.51 KiB | 661.93 KiB |
Previous results on branch: test/improve-trigger-fully-blocking-app-hang
Startup times
| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| 1e31c497da5d5632836dacb48dcd79d444342fb6 | 1226.42 ms | 1249.02 ms | 22.60 ms |
App size
| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| 1e31c497da5d5632836dacb48dcd79d444342fb6 | 23.75 KiB | 846.86 KiB | 823.11 KiB |
Is a fixed time a better app hang use case than a fixed number of tasks?
The problem is that in CI the for loop with a fixed number of tasks can take way longer than 5 seconds. If CI is overloaded it can take x seconds. We only want the app to hang for 5 seconds.
I also updated the code comment to explain this.
But more broadly, why is this more realistic than calling a sleep? I don't think it's possible for any frames to be rendered while this loop is executing. It runs on the main run loop, so it's no different than sleeping the main run loop, right?
Please see my comment in the other PR why I do not think it's the same.
Also App Hang Detection is also used to detect if too much work is done on the main thread, e.g. a large image is decoded. This is a lot of busy work and not the same as sleep.
The test case Noah added seems like the simplest base test case approach IMO that satisfies what ours docs say for the fully-blocking case:
A fully-blocking app hang is when the main thread is stuck completely, and the app can't render a single frame.
The busy loop seems like another interesting but separate case to include, maybe we can keep both as @philipphofmann mentioned in the other discussion. If we had to only pick one to use for CI I'd vote for the sleep implementation though.
Agree with @trevor-e and @noahsmartin here. It doesn't matter what is holding up the main thread from shuffling off a new CA commit to the render server. If a customer had a sleep on the main thread, we should also detect that as an app hang. If anyone can come up with a test case where a sleep wouldn't be classified as a hang, then we have something to go on here. Otherwise it's a distinction without a difference.
Imagine if we simply mocked threads and run loops. In that case, we'd probably use a timer to fire updates from the mock main thread to the anr detector, or manipulate the date/time provider to fast-forward the system clock, which would all basically be the same thing as using sleep on the main thread itself.
Summary of internal Slack Discussion
When we built the app hang detection V1, we also investigated using CFRunLoopObserver with dispatch semaphores to detect app hangs. We identified that this solution doesn't work well when the run loop executes all events in time, but the app still seems stuck to the user. This approach didn't detect the for loop in this PR as an app hang. That's why we check if work on the main thread gets executed in time. That is app hang v1. The code docs explain this a bit: https://github.com/getsentry/sentry-cocoa/blob/ba8a46b7ac0863cf2c8e45b5386ffb4ad6c141d4/Sources/Sentry/include/SentryANRTrackerV1.h#L19-L22 Now we have app hang v2, which solely looks at delayed frames, which allows us to also detect app hangs that don't fully block the main thread: https://github.com/getsentry/sentry-cocoa/blob/ba8a46b7ac0863cf2c8e45b5386ffb4ad6c141d4/Sources/Sentry/include/SentryANRTrackerV2.h#L13-L22
@noahsmartin and I agreed that instead of setting the title, we should decode a large image to keep the main thread and the CFRunLoopObserver busy, which is a common mistake in iOS development.