sentry-dart icon indicating copy to clipboard operation
sentry-dart copied to clipboard

Handle backpressure earlier in pipeline

Open denrase opened this issue 1 year ago • 5 comments

:scroll: Description

Move the task queue where we limit the number of consecutive un-awaited SDK tasks to the earliest possible point in the SDK. That way we don't run event processors on events/transactions that will get dropped eventually.

:bulb: Motivation and Context

Relates to #2360 Relates to #2368

:green_heart: How did you test it?

CI

:pencil: Checklist

  • [x] I reviewed submitted code
  • [ ] I added tests to verify changes
  • [ ] No new PII added or SDK only sends newly added PII if sendDefaultPii is enabled
  • [ ] I updated the docs if needed
  • [x] All tests passing
  • [x] No breaking changes

:crystal_ball: Next steps

@buenaflor I think we should also keep #2368, as it is debouncing based on time, and the task queue is about how many parallel tasks can be scheduled with sentry, independent from their duration. WDYT?

denrase avatar Oct 23 '24 09:10 denrase

Codecov Report

Attention: Patch coverage is 88.88889% with 3 lines in your changes missing coverage. Please review.

Project coverage is 84.72%. Comparing base (7954fb3) to head (2e3680c). Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
dart/lib/src/transport/task_queue.dart 71.42% 2 Missing :warning:
dart/lib/src/sentry.dart 94.73% 1 Missing :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2371      +/-   ##
==========================================
- Coverage   84.72%   84.72%   -0.01%     
==========================================
  Files         253      253              
  Lines        9083     9097      +14     
==========================================
+ Hits         7696     7707      +11     
- Misses       1387     1390       +3     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Oct 23 '24 09:10 codecov[bot]

Android Performance metrics :rocket:

  Plain With Sentry Diff
Startup time 653.86 ms 707.64 ms 53.78 ms
Size 6.49 MiB 7.57 MiB 1.08 MiB

Baseline results on branch: main

Startup times

Revision Plain With Sentry Diff
8a111a931e3ce73766e66f9be34ceb22c3a4e2cc 496.27 ms 520.57 ms 24.31 ms
5f2f77becb74ef2093007f845ec89eed6c5a031d 429.06 ms 507.74 ms 78.68 ms
7ec9238637dbe38b6170da172bc0650eaa157885 414.02 ms 513.94 ms 99.91 ms
abfcdb5365060597e8a7479eafb93c0cfb6a210b 416.55 ms 498.88 ms 82.33 ms
26e955b4fd95d08090ceb4efc74631e20000a553 369.52 ms 458.60 ms 89.07 ms
7f75f32d9707c5a5abdb869a9eb5de05dd91bed8 347.36 ms 419.58 ms 72.22 ms
7d5e695652556d2c9b46f3e78316841a831c77b5 433.45 ms 485.11 ms 51.66 ms
cdf717208af3df3b373ff78e404cc7da02e49a29 348.54 ms 390.81 ms 42.27 ms
5aba4171c641227176fb08f4a73fc95f620c4a42 355.78 ms 450.39 ms 94.61 ms
3334ac176f15a6d39b435960620ea9a4919aae79 303.98 ms 366.65 ms 62.67 ms

App size

Revision Plain With Sentry Diff
8a111a931e3ce73766e66f9be34ceb22c3a4e2cc 6.49 MiB 7.56 MiB 1.07 MiB
5f2f77becb74ef2093007f845ec89eed6c5a031d 6.35 MiB 7.40 MiB 1.05 MiB
7ec9238637dbe38b6170da172bc0650eaa157885 6.35 MiB 7.42 MiB 1.06 MiB
abfcdb5365060597e8a7479eafb93c0cfb6a210b 6.35 MiB 7.40 MiB 1.05 MiB
26e955b4fd95d08090ceb4efc74631e20000a553 6.27 MiB 7.20 MiB 956.49 KiB
7f75f32d9707c5a5abdb869a9eb5de05dd91bed8 6.26 MiB 7.20 MiB 959.18 KiB
7d5e695652556d2c9b46f3e78316841a831c77b5 6.49 MiB 7.55 MiB 1.07 MiB
cdf717208af3df3b373ff78e404cc7da02e49a29 5.94 MiB 6.95 MiB 1.01 MiB
5aba4171c641227176fb08f4a73fc95f620c4a42 5.94 MiB 6.96 MiB 1.02 MiB
3334ac176f15a6d39b435960620ea9a4919aae79 6.06 MiB 7.03 MiB 993.54 KiB

Previous results on branch: enha/earlier-task-queue

Startup times

Revision Plain With Sentry Diff
42d6616ff8b41becd46f0f356e052589408eef1f 454.80 ms 486.77 ms 31.97 ms
05bc7c529ca78fcedfb4b2047505020d6990904f 440.45 ms 477.56 ms 37.11 ms

App size

Revision Plain With Sentry Diff
42d6616ff8b41becd46f0f356e052589408eef1f 6.49 MiB 7.57 MiB 1.08 MiB
05bc7c529ca78fcedfb4b2047505020d6990904f 6.49 MiB 7.57 MiB 1.08 MiB

github-actions[bot] avatar Oct 23 '24 09:10 github-actions[bot]

iOS Performance metrics :rocket:

  Plain With Sentry Diff
Startup time 1267.69 ms 1287.02 ms 19.33 ms
Size 8.38 MiB 9.75 MiB 1.37 MiB

Baseline results on branch: main

Startup times

Revision Plain With Sentry Diff
613760b51612c5874f586dd29349e387b289c8cd 1263.10 ms 1277.27 ms 14.16 ms
2966d88d828c6e2604b15b95f0d0d958acc2cc68 1251.76 ms 1270.21 ms 18.46 ms
f172c4de11085065216c98124283effeb0d0f15e 1350.66 ms 1408.49 ms 57.83 ms
0a23f9840336da9128fd8f15f1a373563820df64 1252.98 ms 1276.76 ms 23.78 ms
6daa837b45789cb21c8bb0075c58ad6f5e5e83b6 1250.42 ms 1265.60 ms 15.18 ms
f79eecfc96c81e45b312c72aea32e4943329666b 1210.25 ms 1221.65 ms 11.40 ms
7954fb356f96f17127c41230af433bc9ce01233c 1247.20 ms 1272.15 ms 24.94 ms
5112c69cbabc27fdee43278e7fae250aa56d0120 1272.76 ms 1293.37 ms 20.61 ms
bf4aed726e9adcfe8b364aef9d0870dd2c4ad5fe 1274.63 ms 1286.48 ms 11.85 ms
24f71aae695d5763d7ad28592191d95841f59b40 1267.47 ms 1272.00 ms 4.53 ms

App size

Revision Plain With Sentry Diff
613760b51612c5874f586dd29349e387b289c8cd 8.15 MiB 9.13 MiB 1000.46 KiB
2966d88d828c6e2604b15b95f0d0d958acc2cc68 8.32 MiB 9.38 MiB 1.06 MiB
f172c4de11085065216c98124283effeb0d0f15e 8.33 MiB 9.62 MiB 1.29 MiB
0a23f9840336da9128fd8f15f1a373563820df64 8.10 MiB 9.18 MiB 1.08 MiB
6daa837b45789cb21c8bb0075c58ad6f5e5e83b6 8.33 MiB 9.40 MiB 1.07 MiB
f79eecfc96c81e45b312c72aea32e4943329666b 8.29 MiB 9.36 MiB 1.07 MiB
7954fb356f96f17127c41230af433bc9ce01233c 8.38 MiB 9.75 MiB 1.37 MiB
5112c69cbabc27fdee43278e7fae250aa56d0120 8.16 MiB 9.17 MiB 1.01 MiB
bf4aed726e9adcfe8b364aef9d0870dd2c4ad5fe 8.10 MiB 9.17 MiB 1.08 MiB
24f71aae695d5763d7ad28592191d95841f59b40 8.10 MiB 9.16 MiB 1.07 MiB

Previous results on branch: enha/earlier-task-queue

Startup times

Revision Plain With Sentry Diff
42d6616ff8b41becd46f0f356e052589408eef1f 1247.06 ms 1255.36 ms 8.30 ms
05bc7c529ca78fcedfb4b2047505020d6990904f 1241.57 ms 1271.11 ms 29.54 ms

App size

Revision Plain With Sentry Diff
42d6616ff8b41becd46f0f356e052589408eef1f 8.38 MiB 9.75 MiB 1.37 MiB
05bc7c529ca78fcedfb4b2047505020d6990904f 8.38 MiB 9.75 MiB 1.37 MiB

github-actions[bot] avatar Oct 23 '24 10:10 github-actions[bot]

I think we should also keep https://github.com/getsentry/sentry-dart/pull/2368, as it is debouncing based on time, and the task queue is about how many parallel tasks can be scheduled with sentry, independent from their duration. WDYT?

I just wanna make sure that the screenshot is still attached to the events that do go through the task queue, other than that I'm not against leaving the debouncing

I think it's also not a good experience if the screenshot is 'inconsistent', some errors have it, some don't

buenaflor avatar Oct 23 '24 10:10 buenaflor

I think it's also not a good experience if the screenshot is 'inconsistent', some errors have it, some don't

Fair point, this would indeed be an issue and potentially confusing. Also users can change the SDK behaviour by setting a different maxQueue value. Lets close the other PR then. 👍

denrase avatar Oct 23 '24 11:10 denrase

probably also makes sense to use the memory profiler of the flutter devtool to check what's allocating so much in a tight loop

buenaflor avatar Oct 28 '24 10:10 buenaflor

@buenaflor The app in the simulator hangs if session replay is enabled. Probably because there are too many ViewHierarchyElements being created/destroyed in a short amount of time (~64k objects). Without, it works. Memory usage increases while the loop is running.

Bildschirmfoto 2024-10-29 um 10 59 12 Bildschirmfoto 2024-10-29 um 10 57 41 Bildschirmfoto 2024-10-29 um 11 28 06

denrase avatar Oct 29 '24 12:10 denrase

@denrase thx for investigating so if only screenshot is enabled the memory usage spikes in a 'tight' loop with await but would not be enough for watchdogs to terminate the app?

the user reported this is happening more often on older devices

buenaflor avatar Oct 29 '24 13:10 buenaflor

@buenaflor Still need to test this on an actual iOS device. In the meantime, can the user in question use something like exponential backoff until we come up with a solution? Think this PR here is good in itself, no need to do processing for dropped tasks, but it will not solve the issue of too many awaited tasks in a loop. Here we are still attaching screenshots.

Come to think of it, what if we cache the screenshot for some time? That way every event would still get a screenshot attached, but we could probably also relieve a bit of CPU/Memory pressure.

denrase avatar Oct 29 '24 14:10 denrase

@denrase I don't think they can do an exponential back-off, they have a specific use case where they need to retry aggresively because their users are typically in areas where internet connection is quite bad (think of in an airplane or in the mountains etc...)

Caching the screenshot for a period of time sounds interesting, might be worth exploring as an opt-in option

buenaflor avatar Oct 29 '24 14:10 buenaflor

it's for sure a special use case, normally we'd discourage this anyway but it's a good opportunity to work on some memory/cpu expensive operations in the SDK and get some insight of the overhead of certain features

buenaflor avatar Oct 29 '24 14:10 buenaflor

that being said, let's explore if we could cache the screenshot reliably in some way

can you create a new issue for that pls

buenaflor avatar Oct 29 '24 15:10 buenaflor

Ok, working on iPhone 16 Pro. Short spike in memory usage (Xcode), but goes through without issues. Can test later on an iPhone X.

Bildschirmfoto 2024-10-29 um 16 39 55

denrase avatar Oct 29 '24 15:10 denrase