sentry-dart

Support Profiling

Open marandaneto opened this issue 2 years ago • 34 comments

Description

Similar to https://docs.sentry.io/platforms/android/profiling/ but for Dart code

Related issues on the Dart SDK and Flutter: https://github.com/dart-lang/sdk/issues/3686, https://github.com/dart-lang/sdk/issues/37664, https://github.com/dart-lang/sdk/issues/50055, https://github.com/flutter/flutter/issues/37204

marandaneto, Nov 05 '22 19:11

Right now, that would be possible if you initialize the SDK manually.

Enable Performance and Profiling directly on the native SDKs, for example on Android: docs.sentry.io/platforms/android/performance and docs.sentry.io/platforms/android/profiling

The same steps apply to iOS. This would work for Android and iOS native code only, not for the Dart code or C/C++ code.

marandaneto, Jan 16 '23 11:01

Hey @marandaneto, I was talking to @vaind and he might take a look at this to see how hard it is and what options we have.

bruno-garcia, May 29 '23 18:05

Hey @bruno-garcia @vaind are there any updates on this?

kahest, Jul 07 '23 12:07

@kahest no updates yet; I just started looking into this recently.

vaind, Jul 07 '23 12:07

This article by a Dart SDK developer gives an introduction to how profiling is implemented in the Dart VM and exposed via DevTools. TL;DR:

  • there's native code in the Dart VM responsible for sampling
  • this code is exposed via the VM service (WebSocket)
  • this code is not compiled in a release build (a.k.a. "product"), see for example #ifndef PRODUCT in profiler.cc
  • even if it were available in a release build, the VM service opens a port that would be accessible from any other app on the same device, so it would likely not be feasible to use this in production apps anyway.

Therefore, this looks like a dead end.

I'll update here if I can find an alternative solution, e.g. sampling isolate stacks directly from Dart.

vaind, Jul 08 '23 10:07

@vaind what if we propose making this available in release builds behind an opt-in build flag, with the port closed in that case? Could we reuse most of the profiler implementation if we get buy-in from the Dart team (raising an issue and so on)?

marandaneto, Jul 08 '23 10:07

@marandaneto I've considered that too, but I'm not sure it's feasible because the VM service port is exposed to every app on the device. It doesn't really matter whether it's opt-in at build time; you wouldn't want to distribute such an app, especially on mobile devices.

vaind, Jul 08 '23 11:07

@vaind that was my point: can we change the approach regarding the port? For example, by finding another way to consume the service without opening the port, or via https://github.com/dart-lang/sdk/issues/37664

marandaneto, Jul 08 '23 11:07

To send profiles, the SDK will need to change the way it sends/enriches events on Android.

  • https://github.com/getsentry/team-mobile/issues/11

The Outbox sender opens the saved envelope and sends its items as individual envelopes, but profiles have to be ingested together with their transaction in one envelope.
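For reference, a Sentry envelope is newline-delimited: one envelope-header line, then for each item an item-header line (carrying its `type` and the payload `length` in bytes) followed by the payload. The Dart sketch below keeps a transaction and its profile in the same envelope instead of splitting them. `EnvelopeItem` and `serializeEnvelope` are hypothetical names for illustration, not the sentry-dart or sentry-java API.

```dart
import 'dart:convert';

/// One envelope item: its `type` header plus a JSON payload (hypothetical).
class EnvelopeItem {
  EnvelopeItem(this.type, this.payload);
  final String type;
  final Map<String, Object?> payload;
}

/// Serializes all items into a single newline-delimited envelope, so a
/// transaction and its profile travel (and are ingested) together.
String serializeEnvelope(
    Map<String, Object?> header, List<EnvelopeItem> items) {
  final buffer = StringBuffer()..writeln(jsonEncode(header));
  for (final item in items) {
    final payload = jsonEncode(item.payload);
    // `length` is the payload size in UTF-8 bytes, not in Dart string units.
    final length = utf8.encode(payload).length;
    buffer
      ..writeln(jsonEncode({'type': item.type, 'length': length}))
      ..writeln(payload);
  }
  return buffer.toString();
}
```

Calling `serializeEnvelope(header, [transactionItem, profileItem])` yields one envelope with both items; sending each item as its own envelope would lose that pairing.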

krystofwoldrich, Jul 10 '23 07:07

So apparently, native profilers should work with AOT-compiled Dart. I'm going to see if I can make it work with our existing native SDK profilers. See this thread on Discord:

mraleph: Anything that works for native code will work just the same for Dart, so if you have some sampling profiler for C++ / Objective-C / Swift then you can just use that. AOT compiled binaries are just normal native binaries (at least on Linux, Android and Mac OS X / iOS; Windows is an exception) which just need some runtime support to run. Our calling conventions are fairly traditional (frames are linked through the frame pointer) and we generate eh_frame / debug_frame, so non-FP-based unwinders should also be able to unwind the stack. This means native tools like perf and Instruments work just fine with AOT compiled Dart code, and I also know that https://gperftools.github.io/gperftools/cpuprofile.html (which is a very simple profiler that simply unwinds the stack using frame-pointer chaining) works as well. (There is one minor catch which trips up simpleperf on Android ARM64, but I don't think it matters much if you just write a manual unwinder which follows the FP chain.)
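As a rough illustration of the frame-pointer chaining mentioned above, the Dart sketch below walks a simulated stack. It assumes the 64-bit frame-record layout used on x86-64 and AArch64 (the caller's frame pointer stored at `fp`, the return address at `fp + 8`); `StackMemory`, `wordSize`, `unwindFramePointers` and the test addresses are hypothetical. A real unwinder would run in native code against a suspended thread's registers and stack, so treat this as a model of the algorithm only.

```dart
/// Hypothetical snapshot of a thread's stack: 64-bit word address -> value.
typedef StackMemory = Map<int, int>;

/// Size of one stack slot on 64-bit targets (x86-64, AArch64).
const int wordSize = 8;

/// Walks the frame-pointer chain starting at [pc]/[fp] and returns the
/// current program counter followed by each caller's return address.
/// Assumed layout: caller's frame pointer at `fp`, return address at
/// `fp + wordSize`.
List<int> unwindFramePointers(StackMemory memory, int pc, int fp,
    {int maxDepth = 128}) {
  final frames = <int>[pc];
  var current = fp;
  while (current != 0 && frames.length < maxDepth) {
    final callerFp = memory[current];
    final returnAddress = memory[current + wordSize];
    // Stop on unreadable memory or a zero return address (outermost frame).
    if (callerFp == null || returnAddress == null || returnAddress == 0) {
      break;
    }
    frames.add(returnAddress);
    // The stack grows downwards, so a caller's frame record must sit at a
    // higher address; anything else means the chain is corrupt.
    if (callerFp != 0 && callerFp <= current) break;
    current = callerFp;
  }
  return frames;
}
```

The returned addresses would then be symbolicated against uploaded debug files. The address-ordering check and `maxDepth` keep a corrupt chain from looping forever.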

vaind, Jul 11 '23 08:07

@vaind the Android profiler currently only profiles Java/Kotlin code, not native (C/C++) code. Maybe the iOS one works, though.

marandaneto, Jul 11 '23 12:07

Update: the sentry-cocoa profiler seems to work, somewhat. In a Flutter app on macOS, I started a transaction in Swift, then ran a heavy operation in Dart and stopped the Swift transaction afterwards. The profile is captured and, after symbolication, it shows function names, although line numbers are not consistently available... See sample profile.

On the other hand, the CPU profiler is going to show work that is actually being executed, so in the case of async/await it may be more complicated to see what is actually going on. I'll have to devise a better test app to evaluate that.

vaind, Jul 20 '23 03:07

After some testing, it seems like the native profiler could be the way to go, at least for iOS and macOS. However, I'm having issues with symbolication: all the Flutter symbols (e.g. referencing /private/var/containers/Bundle/Application/2D6824B6-DC93-4F60-AFC2-ADE201585EC6/Runner.app/Frameworks/Flutter.framework/Flutter) in this profile are unresolved (symbol not found). Was there some custom handling in the symbolicator for Flutter-specific symbols? Maybe that doesn't get triggered when the transaction comes from Swift... Also, when I triggered an error in Swift, the image seems to be resolved but the frames say redacted. Do you know what that is about @marandaneto ?

vaind, Jul 20 '23 19:07

@vaind I'm not aware of any changes/bugs. There were changes for Flutter specifically, mostly around source maps IIRC. I also recall this: https://github.com/getsentry/symbolic/blob/11472bfbb31f2ed76802ff50bfc40a2b0852ee1b/symbolic-debuginfo/src/dwarf.rs#L519-L521 but I'm not sure if it has any impact. Are the redacted frames inApp, or are they from system apps/third-party libs?

marandaneto, Jul 21 '23 06:07

OK, so this would definitely need more attention to work properly. I'm not sure investigating this deeply makes sense yet, while the other platforms are still unresolved. I think we should first make sure all other desired platforms can be supported before doing the detailed work on iOS. WDYT @marandaneto ?

Also, I understand the goal would be to support all platforms supported by Flutter. However, if we go the route of native profiling, that means the platforms would be evaluated & implemented one by one. Would that be acceptable? If so, what are the priorities for platform support and is there a hard stop if some specific platform cannot be supported?

vaind, Jul 21 '23 07:07

@vaind makes sense. I'd focus on iOS and Android first, most likely starting with iOS since the iOS profiler should work (with a few gotchas, as you stated). Android is next, although we'd need a different solution, probably something built into https://github.com/getsentry/sentry-native? Maybe @stefanosiano and/or @indragiek can chime in here; they may know of, or have investigated, C/C++ profilers for Android, instead of the current Java/Kotlin-only approach.

I know this: https://developer.android.com/topic/performance/tracing/custom-events-native

I wonder whether the Android native profiler would also work on Windows and Linux, but that's definitely a stretch since the sentry-native SDK isn't built into Sentry Flutter yet anyway.

marandaneto, Jul 21 '23 08:07

Good, my idea was to verify the feasibility of native profiling on Flutter with the Android SDK (most likely via sentry-native, as you mentioned). A PoC would be enough IMO, and then we can finish iOS first before fully implementing Android.

vaind, Jul 21 '23 09:07

Some notes on Android profiling:

  • simpleperf apparently has issues profiling Dart
  • gperftools seems usable, although the build system is a bit outdated and it's not clear whether Android is actually supported
  • pprof-rs also looks like an option, but AFAIK internal testing in the Rust SDK has shown inconsistent sampling ratios

vaind, Jul 25 '23 10:07

@vaind your best bet for finding out which native profilers work well for Android native code/NDK (at runtime, at low frequency, in release mode) is to ask in the Android United Slack community. There's a #ndk channel and some Googlers are there, including @DanAlbert, who is one of the lead contributors to https://github.com/android/ndk

If we can't use simpleperf directly, they might know some other options.

marandaneto, Jul 25 '23 11:07

https://android.googlesource.com/platform/system/extras/+/refs/heads/main/simpleperf/doc/android_application_profiling.md: apparently simpleperf can only profile apps in debug and profile mode. There's a workaround, but it apparently still depends on adb.

If you want to profile a release build of an application: For the release build type, Android Studio sets android:debuggable="false" in AndroidManifest.xml, disables JNI checks and optimizes C/C++ code. However, security restrictions mean that only apps with android:debuggable set to true can be profiled. So simpleperf can only profile a release build under these three circumstances:

  • If you are on a rooted device, you can profile any app.
  • If you are on Android >= Q, you can add the profileableFromShell flag in AndroidManifest.xml, which makes a released app profileable by preinstalled profiling tools. In this case, simpleperf downloaded by adb will invoke the simpleperf preinstalled in the system image to profile the app.

@vaind did you check the Firefox profiler? https://profiler.firefox.com/docs/#/./guide-profiling-android-directly-on-device

Edit: apparently they use simpleperf as well: https://searchfox.org/mozilla-central/source/third_party/libwebrtc/tools_webrtc/android/profiling/perf_setup.sh

marandaneto, Jul 25 '23 11:07

I know nothing about Dart.

DanAlbert, Jul 25 '23 17:07

I know nothing about Dart.

Flutter apps written in Dart compile to native code, so this isn't really about Dart profilers but rather about Android profilers that can profile native code, not only Java/Kotlin.

marandaneto, Jul 25 '23 18:07

OK, so at least in errors, the issue of some stack frames not being symbolicated is due to missing dSYMs for Flutter.framework (or FlutterMacOS.framework). They're currently not shipped with Flutter, so the Dart plugin won't upload them to Sentry and thus they can't be used for symbolication, see https://github.com/flutter/flutter/issues/117404#issuecomment-1360064880

vaind, Jul 25 '23 20:07

@vaind we could probably set up a Flutter symbol server (https://docs.sentry.io/platforms/unreal/data-management/debug-files/symbol-servers/). Another option is for the Dart plugin to figure out the correct Flutter version/download link and download/upload the symbols itself.

marandaneto, Jul 26 '23 06:07

And a third option, IMO safer for long-term maintenance, would be to update the flutter tool to include the dSYM together with the rest of the build output. The same applies to iOS, macOS and likely Android symbols.

FYI, after downloading the dSYM manually and uploading it to sentry.io as a DIF, the issue stack trace now looks much better.

vaind, Jul 26 '23 07:07

@vaind I totally agree, but the issue is already ~2 years old, so I'm not sure it will ever be addressed. We can be more proactive and find a solution that won't demand too much work. Symbol servers work with GCP, so maybe it's an easy win.

What we can also do for now is amend the docs and let people know that they can do this manually (via sentry-cli), so that it is at least documented as a limitation of our automatic approach (linking to the original GH issue).

marandaneto, Jul 26 '23 07:07

I've filed a feature request for a new built-in Flutter symbol server; let's see if this is possible and whether it's less work/less to maintain than the other options.

marandaneto, Jul 26 '23 07:07

Another symbolication issue on iOS that I'm having trouble with:

While symbolication works reasonably well, one function I've devised to produce a lot of load shows up as multiple frames with no line numbers. I just can't figure out what the issue is and why other frames in the stack trace do have line numbers, even the calling Dart function, which is second in the stack... Maybe @Swatinem could help out here?

I've uploaded the whole build folder with the debug symbols, the envelope with the captured profile and the profile as downloaded (symbolicated) from Sentry.

vaind, Jul 27 '23 11:07

@vaind I will be OOO until the 7th, but feel free to ping @Swatinem on Discord; @kahest or @krystofwoldrich can also be the bridge if needed.

marandaneto, Jul 27 '23 12:07

While symbolication works reasonably well, one function I've devised to actually produce a lot of load is getting multiple frames, with no line numbers present.

It might be that the information is simply missing from the DWARF we generate. We emit just enough information to make meaningful stack traces, which means we don't emit any useful DWARF for the places which are not calls. So if you write something like this:

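// N is an arbitrary iteration count, chosen here for illustration.
const N = 100000000;
var sink = 0;

void foo() {
  for (var i = 0; i < N; i++) {
    // Do some math without any calls.
    sink += i * i;
  }
}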

Then the best you can get is that the time is spent in the foo function, but you would not be able to tell where exactly in that function the time is spent.

I suggest looking at the raw PCs that the profiler has collected and then at the corresponding generated machine code & DWARF to see if this is indeed the case.
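To make this concrete, here is a hedged Dart sketch of resolving raw PCs against a symbol table. For illustration only, it assumes a simplified line table that has entries just at call-site return addresses (real DWARF line programs and Sentry's symbolicator are more involved); `FunctionSymbol`, `symbolize` and the test addresses are hypothetical. Under that assumption, a PC sampled inside a call-free loop resolves to the function name alone, which would match the frames without line numbers reported above.

```dart
/// A function's address range in the loaded image (hypothetical model).
class FunctionSymbol {
  const FunctionSymbol(this.name, this.start, this.end);
  final String name;
  final int start; // inclusive
  final int end; // exclusive
}

/// Resolves [pc] to "name:line", or just "name" when no line entry exists.
/// [symbols] must be sorted by start address and non-overlapping;
/// [callSiteLines] maps call-site return addresses to source lines.
String symbolize(
    int pc, List<FunctionSymbol> symbols, Map<int, int> callSiteLines) {
  var lo = 0, hi = symbols.length - 1;
  // Binary search for the function whose range contains pc.
  while (lo <= hi) {
    final mid = (lo + hi) >> 1;
    final sym = symbols[mid];
    if (pc < sym.start) {
      hi = mid - 1;
    } else if (pc >= sym.end) {
      lo = mid + 1;
    } else {
      final line = callSiteLines[pc];
      return line == null ? sym.name : '${sym.name}:$line';
    }
  }
  return '<unknown>';
}
```

With `main` at 0x100..0x200, `foo` at 0x200..0x300 and a single call-site entry at 0x180, a sample at 0x180 resolves to `main:12`, while a sample at 0x250 inside `foo`'s loop resolves only to `foo`.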

mraleph, Jul 31 '23 07:07