sentry-java icon indicating copy to clipboard operation
sentry-java copied to clipboard

CI: performance impact measurements

Open vaind opened this issue 3 years ago • 12 comments

  • part of https://github.com/getsentry/team-mobile/issues/5
  • closes https://github.com/getsentry/sentry-java/issues/1965

Approach and scope of this PR

See my notes below on Macrobenchmark - after facing issues trying to get it to work on SauceLabs, I have pivoted towards Appium as it can be used to run both iOS and Android, and should also be usable for our other SDKs: Flutter, Unity, React-Native)

  • two simple apps (created with Android Studio -> New Project -> Basic Activity, without any more changes), the only difference being Sentry added in one of them.
  • build the apks for both apps (this can later be used for another check described in team-mobile#5
  • use Appium (through its Java SDK) on SauceLabs to launch both apps multiple times and measure the difference between them and also check sizes
  • add simple absolute-value tests (not comparing to values from the main branch yet)
  • print results in CI - the current output looks like this ("filtered" = outliers removed using the interquartile range):
[0-0] App io.sentry.java.tests.perf.appplain  launch times (original) | mean: 937.82 ms | stddev: 45.45 | values: [1038,891,901,946,918,907,911,922,945,1070,952,946,922,902,969,932,939,883,919,941,900,891,960,928,900,1015,1024,998,968,958,946,967,1010,921,965,904,1061,920,957,912,910,882,905,922,919,888,933,899,894,880]
[0-0] App io.sentry.java.tests.perf.appplain  launch times (filtered) | mean: 932.50 ms | stddev: 37.99 | values: [880,882,883,888,891,891,894,899,900,900,901,902,904,905,907,910,911,912,918,919,919,920,921,922,922,922,928,932,933,939,941,945,946,946,946,952,957,958,960,965,967,968,969,998,1010,1015,1024,1038]
[0-0] App io.sentry.java.tests.perf.appsentry launch times (original) | mean: 1004.14 ms | stddev: 24.21 | values: [1030,1017,948,1000,1034,966,1039,979,1008,995,1024,939,966,1010,1029,1018,1000,1027,993,993,975,1060,982,1030,1036,1016,1008,1023,1012,1034,987,1000,996,1012,1029,1032,1003,996,1013,984,985,1017,1001,976,992,1031,995,975,993,999]
[0-0] App io.sentry.java.tests.perf.appsentry launch times (filtered) | mean: 1005.47 ms | stddev: 22.57 | values: [948,966,966,975,975,976,979,982,984,985,987,992,993,993,993,995,995,996,996,999,1000,1000,1000,1001,1003,1008,1008,1010,1012,1012,1013,1016,1017,1017,1018,1023,1024,1027,1029,1029,1030,1030,1031,1032,1034,1034,1036,1039,1060]
[0-0] App io.sentry.java.tests.perf.appsentry takes approximately  72 ms more time to start than app io.sentry.java.tests.perf.appplain
[0-0] App io.sentry.java.tests.perf.appplain  size is 1.64MB
[0-0] App io.sentry.java.tests.perf.appsentry size is 3.56MB
[0-0] App io.sentry.java.tests.perf.appsentry is 1.92MB larger than app io.sentry.java.tests.perf.appplain

Next steps

  • post comment to PR
  • test apk size difference
  • collect the stats somewhere (maybe even Sentry transactions) so that later we can also fail if the time grows significantly
  • adapt the Appium project to support iOS (minor changes) and likely move to a separate repo/action

Notes on Macrobenchmark

I have first tried using jetpack macrobenchmark by deploying the app from https://github.com/getsentry/sentry-android-gradle-plugin/pull/317 to SauceLabs - the only option that seemed like it could work was using the espresso configuration for SauceLabs - it didn't though work though, the error was as follows, regardless of the config I tried (current state in https://github.com/vaind/sentry-android-gradle-plugin/commits/perf/benchmark-app-startup-time - if anyone wants to give it a try).

ERROR: Benchmark Target is Debuggable
Target package io.sentry.samples.instrumentation
is running with debuggable=true, which drastically reduces
runtime performance in order to support debugging features. Run
benchmarks with debuggable=false. Debuggable affects execution speed
in ways that mean benchmark improvements might not carry over to a
real user's experience (or even regress release performance).

While you can suppress these errors (turning them into warnings)
PLEASE NOTE THAT EACH SUPPRESSED ERROR COMPROMISES ACCURACY

// Sample suppression, in a benchmark module's build.gradle:
android {
defaultConfig {
testInstrumentationRunnerArguments["androidx.benchmark.suppressErrors"] = "DEBUGGABLE"
}
}
at androidx.benchmark.ConfigurationErrorKt.checkAndGetSuppressionState(ConfigurationError.kt:124)
at androidx.benchmark.macro.MacrobenchmarkKt.checkErrors(Macrobenchmark.kt:95)
at androidx.benchmark.macro.MacrobenchmarkKt.macrobenchmark(Macrobenchmark.kt:126)
at androidx.benchmark.macro.MacrobenchmarkKt.macrobenchmarkWithStartupMode(Macrobenchmark.kt:301)
at androidx.benchmark.macro.junit4.MacrobenchmarkRule.measureRepeated(MacrobenchmarkRule.kt:106)
at io.sentry.samples.instrumentation.benchmark.ExampleStartupBenchmark.startup(ExampleStartupBenchmark.kt:33)

#skip-changelog

vaind avatar Jul 13 '22 17:07 vaind

Codecov Report

Base: 80.62% // Head: 80.62% // No change to project coverage :thumbsup:

Coverage data is based on head (a43974f) compared to base (de74337). Patch has no changes to coverable lines.

Additional details and impacted files
@@            Coverage Diff            @@
##               main    #2171   +/-   ##
=========================================
  Coverage     80.62%   80.62%           
  Complexity     3358     3358           
=========================================
  Files           240      240           
  Lines         12337    12337           
  Branches       1639     1639           
=========================================
  Hits           9947     9947           
  Misses         1783     1783           
  Partials        607      607           

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

codecov-commenter avatar Jul 14 '22 12:07 codecov-commenter

I think it would be nice to use the official Android lib recommended by Google for benchmarking the app start time. @romtsn, as you are one of our Gradle experts, could you maybe look at why the Jetpack Macrobenchmark lib used the debug config as pointed out by @vaind? If it doesn't work out, we can use a different approach as suggested by you @vaind, of course.

philipphofmann avatar Jul 18 '22 09:07 philipphofmann

I think it would be nice to use the official Android lib recommended by Google for benchmarking the app start time. @romtsn, as you are one of our Gradle experts, could you maybe look at why the Jetpack Macrobenchmark lib used the debug config as pointed out by @vaind? If it doesn't work out, we can use a different approach as suggested by you @vaind, of course.

FYI there was also some discussion on Discord: https://discord.com/channels/621778831602221064/621783562739384331/996842523446222909

vaind avatar Jul 18 '22 17:07 vaind

Any hints how to fix the CI build failure?

FAILURE: Build failed with an exception.
Deprecated Gradle features were used in this build, making it incompatible with Gradle 8.0.
* What went wrong:
Could not determine the dependencies of task ':performance-tests:test-app-sentry:compileDebugJavaWithJavac'.
You can use '--warning-mode all' to show the individual deprecation warnings and determine if they come from your own scripts or plugins.
See https://docs.gradle.org/7.4.2/userguide/command_line_interface.html#sec:command_line_warnings
6 actionable tasks: 6 up-to-date
> Could not resolve all task dependencies for configuration ':performance-tests:test-app-sentry:debugCompileClasspath'.
   > Could not resolve project :sentry-android.
     Required by:
         project :performance-tests:test-app-sentry
      > No matching variant of project :sentry-android was found. The consumer was configured to find an API of a component, preferably optimized for Android, as well as attribute 'com.android.build.api.attributes.BuildTypeAttr' with value 'debug', attribute 'com.android.build.api.attributes.AgpVersionAttr' with value '7.2.0' but:
          - Variant 'releaseApiElements' capability io.sentry:sentry-android:6.2.1 declares an API of a component, preferably optimized for Android, as well as attribute 'com.android.build.api.attributes.AgpVersionAttr' with value '7.2.0':
              - Incompatible because this component declares a component, as well as attribute 'com.android.build.api.attributes.BuildTypeAttr' with value 'release' and the consumer needed a component, as well as attribute 'com.android.build.api.attributes.BuildTypeAttr' with value 'debug'
          - Variant 'releaseRuntimeElements' capability io.sentry:sentry-android:6.2.1 declares a runtime of a component, preferably optimized for Android, as well as attribute 'com.android.build.api.attributes.AgpVersionAttr' with value '7.2.0':
              - Incompatible because this component declares a component, as well as attribute 'com.android.build.api.attributes.BuildTypeAttr' with value 'release' and the consumer needed a component, as well as attribute 'com.android.build.api.attributes.BuildTypeAttr' with value 'debug'
              ```

vaind avatar Jul 19 '22 11:07 vaind

Any hints how to fix the CI build failure?

FAILURE: Build failed with an exception.
Deprecated Gradle features were used in this build, making it incompatible with Gradle 8.0.
* What went wrong:
Could not determine the dependencies of task ':performance-tests:test-app-sentry:compileDebugJavaWithJavac'.
You can use '--warning-mode all' to show the individual deprecation warnings and determine if they come from your own scripts or plugins.
See https://docs.gradle.org/7.4.2/userguide/command_line_interface.html#sec:command_line_warnings
6 actionable tasks: 6 up-to-date
> Could not resolve all task dependencies for configuration ':performance-tests:test-app-sentry:debugCompileClasspath'.
   > Could not resolve project :sentry-android.
     Required by:
         project :performance-tests:test-app-sentry
      > No matching variant of project :sentry-android was found. The consumer was configured to find an API of a component, preferably optimized for Android, as well as attribute 'com.android.build.api.attributes.BuildTypeAttr' with value 'debug', attribute 'com.android.build.api.attributes.AgpVersionAttr' with value '7.2.0' but:
          - Variant 'releaseApiElements' capability io.sentry:sentry-android:6.2.1 declares an API of a component, preferably optimized for Android, as well as attribute 'com.android.build.api.attributes.AgpVersionAttr' with value '7.2.0':
              - Incompatible because this component declares a component, as well as attribute 'com.android.build.api.attributes.BuildTypeAttr' with value 'release' and the consumer needed a component, as well as attribute 'com.android.build.api.attributes.BuildTypeAttr' with value 'debug'
          - Variant 'releaseRuntimeElements' capability io.sentry:sentry-android:6.2.1 declares a runtime of a component, preferably optimized for Android, as well as attribute 'com.android.build.api.attributes.AgpVersionAttr' with value '7.2.0':
              - Incompatible because this component declares a component, as well as attribute 'com.android.build.api.attributes.BuildTypeAttr' with value 'release' and the consumer needed a component, as well as attribute 'com.android.build.api.attributes.BuildTypeAttr' with value 'debug'
              ```

It's likely because you also need to exclude them for bom and distribution, like we did with sentry-test-support: https://github.com/getsentry/sentry-java/blob/121d465b31546c7d98af7b4b260dc120db902f79/sentry-bom/build.gradle.kts#L6-L18

and https://github.com/getsentry/sentry-java/blob/121d465b31546c7d98af7b4b260dc120db902f79/build.gradle.kts#L101

romtsn avatar Jul 19 '22 21:07 romtsn

@vaind sorry man, I haven't got time to look at the PR, so I will leave this to @marandaneto as I'm on parental leave starting from tomorrow. Good luck!

romtsn avatar Jul 19 '22 21:07 romtsn

@vaind sorry man, I haven't got time to look at the PR, so I will leave this to @marandaneto as I'm on parental leave starting from tomorrow. Good luck!

thanks and enjoy the time with your kid/s

vaind avatar Jul 20 '22 07:07 vaind

I had a look at the logs of the saucelabs runs (https://app.saucelabs.com/tests/ecc1a80032b042fb9f223aa57dbcafe1). I looked for the logcat lines: ActivityTaskManager : Displayed io.sentry.java.tests.perf.appplain/.MainActivity: +353ms or ActivityTaskManager : Displayed io.sentry.java.tests.perf.appsentry/.MainActivity: +417ms Usually these values range ~300ms to ~450ms for each run. The results of the duration of the startActivity appium commands are ~900ms to ~1100ms. Are we sure we consider the startup times correctly? Or is appium adding some overhead on receiving the "command finished" signal? I leave a couple links as reference, but the time shown in logcat should be the correct one, afaik (happy to be wrong, though 😄) https://dev.to/pyricau/android-vitals-how-adb-measures-app-startup-5n7 https://medium.com/androiddevelopers/testing-app-startup-performance-36169c27ee55

From what i can see we have the following options (please add yours, too 😄):

  1. Get a better result from the current setup, maybe there is some way to understand it (even get the logcat value, but we'd lose some advantage of using appium, since it would work only for Android)
  2. Use Macrobenchmark if we can
  3. Use adb commands (a sample app that using adb installs the other apps, launches them, get the value from logcat, kill them, and repeat)

stefanosiano avatar Jul 27 '22 11:07 stefanosiano

Thanks @stefanosiano, I'll check the startup values.

  1. We should be able to access values from logcat from Appium and I think that would be fine even from the cross-platform perspective, as we can just have a smaller piece of code that collects the times that is platform specific and still reuse the remainder.

  2. Macrobenchmark - I'll give the setting you mentioned above a quick go locally on my setup of the sample by Roman in the gradle plugin project, just to verify whether that works or not. Though I'm still not keen to switch unless there's a real benefit we can get out of it.

  3. running adb commands directly - I've considered this before and it's not possible on Saucelabs AFAICT

vaind avatar Jul 27 '22 11:07 vaind

Current status:

  • moved Kotlin as per a common request,
  • added actions to create/update a PR comment with the measured info (current status, no history)
  • changed data collection to parse ActivityManager: Displayed ... +XYZms from logcat instead of Appium command execution time.

Open issues:

  • [x] disable the new test in the main gradlw build and test runs by default - how can we do this? I guess we could remove the project out of the root gradle, which we may end up doing anyway if the appium project moves to some reusable repo/workflow

vaind avatar Aug 04 '22 15:08 vaind

@vaind have you tried adding categories (https://www.baeldung.com/junit-5-gradle#configuring-junit-5-tests-with-gradle) to the tests, so we can exclude them by default and only include them if we want. Not sure if that's available with kotlin test though.

Otherwise we could have a different source set (https://www.baeldung.com/gradle-source-sets) that runs on a separate gradle task which is not part of the build and test task.

Does that help?

adinauer avatar Aug 05 '22 10:08 adinauer

@adinauer I've moved the test code to a standalone repo (GH action) so it's not an issue anymore. thanks for the suggestions anyway.

vaind avatar Aug 12 '22 08:08 vaind

Performance metrics :rocket:

Plain With Sentry Diff
Startup time (ms) 370.61 399.90 29.29
Size (bytes) 1825823 2445266 619443

github-actions[bot] avatar Sep 08 '22 21:09 github-actions[bot]

@vaind I left a comment, is there anything pending or ready to merge?

marandaneto avatar Sep 09 '22 08:09 marandaneto

@vaind I left a comment, is there anything pending or ready to merge?

Good to go from my POV

vaind avatar Sep 17 '22 08:09 vaind