sentry-java icon indicating copy to clipboard operation
sentry-java copied to clipboard

ANR in Sentry.initConfigurations due to disk I/O

Open vibin opened this issue 2 years ago • 11 comments

Integration

sentry-android

Build System

Gradle

AGP Version

7.0.4

Proguard

Enabled

Version

5.7.4

Steps to Reproduce

Initialize SentryAndroid

Expected Result

Expecting to initialize without any ANR

Actual Result

ANR thrown 6660 times for 6455 users in last 7 days:

main (native): tid=1 systid=10537 
#00 pc 0x86b8c libc.so 
#01 pc 0x1b092c libart.so 
#02 pc 0x6743c0 libart.so 
#03 pc 0x6741b0 libart.so 
       at libcore.io.Linux.access(Linux.java)
       at libcore.io.ForwardingOs.access(ForwardingOs.java:72)
       at libcore.io.BlockGuardOs.access(BlockGuardOs.java:73)
       at libcore.io.ForwardingOs.access(ForwardingOs.java:72)
       at android.app.ActivityThread$AndroidOs.access(ActivityThread.java:8485)
       at java.io.UnixFileSystem.checkAccess(UnixFileSystem.java:281)
       at java.io.File.exists(File.java:815)
       at java.io.File.mkdirs(File.java:1349)
       at io.sentry.Sentry.initConfigurations(Sentry.java:222)
       at io.sentry.Sentry.init(Sentry.java:169)
       at io.sentry.Sentry.init(Sentry.java:117)
       at io.sentry.android.core.SentryAndroid.init(SentryAndroid.java:83)
       at io.sentry.android.core.SentryAndroid.init(SentryAndroid.java:64)
       at redacted.core.CrashReporter$Companion.initSentry(CrashReporter.kt:96)
       at redacted.core.CrashReporter$Companion.init(CrashReporter.kt:90)
       at redacted.supply.main.initializer.CrashReporterInitializer.initialize(CrashReporterInitializer.kt:12)
       at redacted.supply.main.initializer.AppInitializers.init(AppInitializers.kt:43)
       at redacted.supply.main.RedactedApplication.onCreate(RedactedApplication.java:98)
       at android.app.Instrumentation.callApplicationOnCreate(Instrumentation.java:1192)
       at android.app.ActivityThread.handleBindApplication(ActivityThread.java:7573)
       at android.app.ActivityThread.access$1400(ActivityThread.java:301)
       at android.app.ActivityThread$H.handleMessage(ActivityThread.java:2166)
       at android.os.Handler.dispatchMessage(Handler.java:106)
       at android.os.Looper.loop(Looper.java:246)
       at android.app.ActivityThread.main(ActivityThread.java:8633)
       at java.lang.reflect.Method.invoke(Method.java)
       at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:602)
       at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:1130)

vibin avatar Aug 23 '22 10:08 vibin

Thank you for reporting this @vibin. At the moment there's no easy way to avoid this. We could try and move calls that may cause ANR during init to the background. I can give no timeline for it at this time though.

adinauer avatar Aug 29 '22 07:08 adinauer

@adinauer Got it, we will try initing on a background thread

vibin avatar Aug 29 '22 07:08 vibin

Some context on why we do I/O: The SDK by default uses the device file system to store events so we can support offline scenarios. Additionally, we write a session file to disk so in the event of a crash, on restart we can pick that file up and update the session status to crashed. This affects crash free users and crash free session rate.

bruno-garcia avatar Aug 29 '22 13:08 bruno-garcia

@vibin are you seeing anything the affected 6k users have in common? Like are they maybe all on low end devices?

adinauer avatar Aug 29 '22 13:08 adinauer

@adinauer All of them are background process creations, but that could just be because Android never throws ANRs for foreground application start

Yes, the devices are low-end: Galaxy M21 (7%), Galaxy M31 (6%), Galaxy A51 (6%), JioPhone Next (6%)

vibin avatar Aug 29 '22 14:08 vibin

Maybe having an option to turn off features requiring IO that can then be turned off only for devices using an old version of Android would solve this. It would however come with the trade off of losing features described by @bruno-garcia further up.

Some context on why we do I/O: The SDK by default uses the device file system to store events so we can support offline scenarios. Additionally, we write a session file to disk so in the event of a crash, on restart we can pick that file up and update the session status to crashed. This affects crash free users and crash free session rate.

adinauer avatar Aug 29 '22 14:08 adinauer

@vibin could you please share you SentryAndroid.init code snippet and your manifest file so we can get a feel for what features you're using.

Also are you using any other crash reporting tools?

adinauer avatar Sep 14 '22 14:09 adinauer

@adinauer sure, we use custom init with the below block:

SentryAndroid.init(context) { options: SentryAndroidOptions ->
  options.apply {
      release = BuildConfig.VERSION_NAME
      dsn = BuildConfig.SENTRY_DSN
      isEnableNdk = false
      sampleRate = null // Send all errors. No sampling.
      tracesSampleRate = null // Disable tracing.
      tracesSampler = null
      isAnrEnabled = false
      isEnableAutoSessionTracking = true
      sessionTrackingIntervalMillis = redacted
      isSendDefaultPii = false
      addInAppInclude("redacted.android")
      addInAppInclude("redacted.analytics")
      isEnableAppComponentBreadcrumbs = true
      isEnableSystemEventBreadcrumbs = true
      isEnableAppLifecycleBreadcrumbs = true
      isEnableActivityLifecycleBreadcrumbs = false
      addIntegration(SentryActivityFragmentLifecycleIntegration(context)) // Custom integration.
      isEnableAutoActivityLifecycleTracing = false
      isEnableActivityLifecycleTracingAutoFinish = false
      isEnableUserInteractionBreadcrumbs = false
      beforeSend = BeforeSendCallback { event: SentryEvent, hint: Any? ->
          return@BeforeSendCallback if (event.isCrashed) {
              event
          } else {
              null // Swallow non-fatal errors.
          }
      }
  }
}

Two customisations in AndroidManifest.xml:

<provider
    android:name="io.sentry.android.core.SentryInitProvider"
    android:authorities="${applicationId}.SentryInitProvider"
    tools:node="remove"/>
<provider
    android:name="io.sentry.android.core.SentryPerformanceProvider"
    android:authorities="${applicationId}.SentryPerformanceProvider"
    tools:node="remove"/>

We do use Firebase Crashlytics as our own main crash reporter.

vibin avatar Sep 14 '22 14:09 vibin

Thanks @vibin we'll investigate and report once we know more.

adinauer avatar Sep 21 '22 13:09 adinauer

@vibin which tool reported the ANR? Is there more context available?

adinauer avatar Sep 21 '22 13:09 adinauer

@adinauer it's reported by Crashlytics (which is initialised as a ContentProvider, much before Sentry is initialised).

Attaching few insights surfaced by Crashlytics:

image

Documentation regarding these tags: https://firebase.google.com/docs/crashlytics/debug-anr-errors?authuser=1&hl=en#io-root-blocking-tag

vibin avatar Sep 21 '22 14:09 vibin

We could try to run the App from Storage Card and see if that might be the cause.

adinauer avatar Sep 28 '22 14:09 adinauer

Would it be possible to have only the uncaught exception handler/event queue to be initialized on the main thread and then do all sentry initialization on a background thread?

remcomokveld avatar Jun 14 '23 15:06 remcomokveld

Would it be possible to have only the uncaught exception handler/event queue to be initialized on the main thread and then do all sentry initialization on a background thread?

certainly, we have this in our internal ideas list, I think we're gonna look into this quite soon!

romtsn avatar Jun 14 '23 18:06 romtsn

I took a deeper look at app startup and Sentry.init to see potential performance bottlenecks. A few observations:

A typical Sentry.init call took 118ms for a release build and performs some device IO on the main thread:

  • <1ms On startup we ensure that some folders exist (A few mkdirs calls here), definitely not ideal, but seems to be very fast when device has no IO pressure
  • ??ms We delete old profiles on startup, calling .listFiles() on the main thread, this can be easily moved to the background task (link)
  • 9.5ms The EnvelopeFileObserver is being started on the main thread
  • 1.7ms We check for the startup crash marker file here, I think this needs to be done on the main thread as otherwise startup crashes never will get sent

Apart from that, we have a few low hanging fruits which could improve app start significantly:

  • 15ms: NdkScopeObserver.addBreadcrumb
  • 11ms: SystemEventsBroadcastIntegration.register
  • 9ms: TempSensorBreadcrumbsIntegration.register
  • 7ms: Dsn.init is done twice: in Sentry.initConfigurations and RequestDetailsResolver.resolve
  • 3ms: AtomicClientReportStorage is init twice: in SentryOptions.init and ManifestMetaDataReader
  • 4ms: ManifestMetaDataReader, although manual init is performed in my test case

markushi avatar Aug 09 '23 09:08 markushi

Let's try with app living on the SD card

markushi avatar Aug 09 '23 14:08 markushi

??ms We delete old profiles on startup, calling .listFiles() on the main thread, this can be easily moved to the background task (link)

For this we could capture the timestamp and then move .listFiles() onto the background thread and delete only those files that are older than the timestamp to prevent from deleting newly added profiles.

romtsn avatar Aug 16 '23 14:08 romtsn

Several improvements were made for this, starting from SDK version 7.0.0. I'm closing this issue, now. Feel free to add a comment in case the issue is not resolved

stefanosiano avatar Dec 21 '23 11:12 stefanosiano