dd-sdk-flutter

Plugin does not log app's startup crashes.

Open • Den-creator opened this issue 1 year ago • 17 comments

Describe the bug

If the application crashes upon startup, the Datadog Flutter SDK fails to upload a crash report to Datadog upon application restart. The crash report only appears after the next successful restart and usage of the app. This implies that if a crash occurs during app startup, we will never be aware that some users have experienced a crash. However, the documentation states:

If your application suffers a fatal crash, after your application restarts, the Datadog Flutter SDK uploads a crash report to Datadog. For non-fatal errors, the Datadog Flutter SDK uploads these errors with other RUM data.

Reproduction steps

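import 'package:datadog_flutter_plugin/datadog_flutter_plugin.dart';
import 'package:firebase_core/firebase_core.dart';
import 'package:firebase_crashlytics/firebase_crashlytics.dart';
import 'package:flutter/material.dart';

// DefaultFirebaseOptions; this file is typically generated by the FlutterFire CLI.
import 'firebase_options.dart';

Future<void> main() async {
  await DatadogSdk.runApp(
    DatadogConfiguration(
      clientToken: 'clientToken',
      env: 'DEV',
      site: DatadogSite.us1,
      nativeCrashReportEnabled: true,
      rumConfiguration: DatadogRumConfiguration(
        applicationId: 'applicationId',
        reportFlutterPerformance: true,
      ),
      loggingConfiguration: DatadogLoggingConfiguration(),
    ),
    TrackingConsent.granted,
    () async {
      await Firebase.initializeApp(
        options: DefaultFirebaseOptions.currentPlatform,
      );
      // Force a native crash during startup, right after initialization.
      FirebaseCrashlytics.instance.crash();
      // App is the application's root widget (definition not shown).
      runApp(App());
    },
  );
}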

SDK logs

No response

Expected behavior

After the first crash on app startup, if the user opens the app again, the SDK should send the crash report to Datadog before the second crash happens.

Affected SDK versions

2.3.0

Latest working SDK version

No response

Did you confirm if the latest SDK version fixes the bug?

No

Flutter Version

3.16.9

Setup Type

Flutter Application

Device Information

iOS 17.3.1, iPhone 12, Wifi, battery

Other relevant information

No response

Den-creator • Mar 13 '24 17:03

Hey @Den-creator,

This is accurate, yes. There are actually two potential problems.

First, we initialize crash tracking as part of initialization. If your app crashes before then, we unfortunately won't catch it and can't send it. We are actually actively looking for ways to improve this that both allow configurability and comply with a user's tracking consent but that's the way it is for now.

Second, crashes are added to the next uploadable batch on app restart, but aren't sent immediately. If we never have time to send a batch, we won't be able to report the crash. I'll discuss with the team potential solutions for this, but the only "guaranteed" solution I can think of would be to prevent initialization from finishing until we've sent, or at least attempted to send, the crash report, which has the tradeoff of making that method significantly slower.

fuzzybinary • Mar 13 '24 18:03

@fuzzybinary, thank you for the reply. I agree with your statement:

The only "guaranteed" solution I can think of would be to prevent initialization from finishing until we've sent, or at least attempted to send, the crash report, which has the tradeoff of making that method significantly slower.

I was expecting such behavior from the SDK, as the current crash reporting mechanism is useless in cases when the app crashes on startup or almost immediately after. The tradeoff may not be significant compared to the issue we are currently facing. You could implement this behavior as optional, allowing developers to disable it if they prefer not to slow down the initialization process.

Could you please confirm whether you will be able to implement the above? If so, could you please provide an estimate of when it will be ready?

Den-creator • Mar 14 '24 15:03

Re-tagging as an enhancement rather than a bug. Can you reach out to your CSM to raise a feature request? We've started talking internally about ways to implement this, but unfortunately I can't provide a timeline.

fuzzybinary • Mar 14 '24 16:03

Can you reach out to your CSM to raise a feature request?

@fuzzybinary I will forward this to our team. Thanks!

Den-creator • Mar 15 '24 10:03

@fuzzybinary I suggest taking a look at how competing products work, like Sentry.

feinstein • Mar 22 '24 00:03

@fuzzybinary How can we raise a feature request? Should it be done via the support button on the Datadog website?

Den-creator • Apr 22 '24 14:04

Hi @Den-creator,

Yes, if you don't have a CSM, submit it through Datadog support.

To keep everyone informed, we are looking at a partial fix coming soon. While we won't delay initialization, we will start sending data faster, which should capture more crashes.

fuzzybinary • Apr 22 '24 18:04

Sorry, but where can I find a CSM?

Den-creator • Apr 23 '24 12:04

CSM is your Customer Success Manager. Not every client has one, but they would be your primary contact with Datadog if you do.

If you don't have one, you can use the Support chat and the feature request will be routed to the correct place.

fuzzybinary • Apr 23 '24 12:04

Thanks for the reply!

Den-creator • Apr 23 '24 12:04

Hi folks,

datadog_flutter_plugin 2.5.0 has a change in the iOS and Android SDKs that will start sending data immediately on initialization. While this doesn't ensure that all crashes at startup will be caught and sent, it should improve the situation dramatically.

I'm going to keep this issue open so we can track potentially adding a "stall" during initialization to ensure crashes are sent, but that requires a bit more discussion on our side.

fuzzybinary • May 14 '24 19:05

Maybe we can have something more flexible.

The SDK might initialise and try to upload the last batch in a separate thread. If the app crashes, it may not finish the upload successfully.

So you might add a counter, increasing it at every upload attempt and zeroing it upon success. If the counter hits 3 attempts, you stall the app's initialisation and ensure the batch is uploaded.

My reasoning is that if the app is crashing immediately after it starts, then the user won't care if the app stalls for a little bit at start-up and then crashes again. The app is bad anyway and the user can't use it, so the SDK stalling the app won't have any negative impact, as the app is already unusable. In a good app, meanwhile, the upload should complete in a separate thread with no issues, not locking the app initialisation at all.

Thoughts?
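
The counter idea above could be sketched roughly as follows. This is only an illustration in plain Dart, not anything the Datadog SDK provides: `UploadAttemptStore`, `initWithCrashUpload`, `uploadPendingBatch`, and the 5-second timeout are all hypothetical names and values, and a real implementation would have to persist the counter on disk so that it survives a crash.

```dart
import 'dart:async';

/// Hypothetical in-memory stand-in for a counter that would need to be
/// persisted on disk in practice so that it survives a crash.
class UploadAttemptStore {
  int unfinishedAttempts = 0;
}

/// Starts uploading the pending batch. The upload runs in the background
/// unless [maxBackgroundAttempts] earlier attempts never finished, in which
/// case the caller is stalled until the upload completes or times out.
/// Returns whether initialization was stalled.
Future<bool> initWithCrashUpload(
  UploadAttemptStore store,
  Future<void> Function() uploadPendingBatch, {
  int maxBackgroundAttempts = 3,
  Duration stallTimeout = const Duration(seconds: 5),
}) async {
  final shouldStall = store.unfinishedAttempts >= maxBackgroundAttempts;

  // Count the attempt before it starts: if the app crashes mid-upload,
  // the increment is what the next launch will see.
  store.unfinishedAttempts++;

  final upload = uploadPendingBatch().then<void>(
    (_) {
      store.unfinishedAttempts = 0; // Success: reset the counter.
    },
    onError: (Object _) {}, // Failure: keep the counter for the next launch.
  );

  if (shouldStall) {
    // Block initialization, but never for longer than the timeout.
    await upload.timeout(stallTimeout, onTimeout: () {});
  } else {
    unawaited(upload);
  }
  return shouldStall;
}

Future<void> main() async {
  // Simulate a launch that follows three uploads that never finished.
  final store = UploadAttemptStore()..unfinishedAttempts = 3;
  final stalled = await initWithCrashUpload(store, () async {});
  print('stalled=$stalled, counter=${store.unfinishedAttempts}');
}
```

The timeout is an addition to the original suggestion: it bounds how long a stalled launch can wait when the network is unavailable.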

feinstein • May 15 '24 23:05

I'm facing the same issue even with version 2.6.0, but on web, with the application still running. Why wasn't the 2.5.0 fix applied to web?

I noticed that when I call DatadogRum.getCurrentSessionId before DatadogRum.addErrorInfo, it returns null even though the SDK has finished initialization.

magno-castro • Jul 05 '24 18:07

@magno-castro This issue is specifically about when crashes are sent, and crashes are application-terminating errors on mobile. These errors cannot be sent to Datadog when they happen, so we have to wait until the next boot to send them. Web errors are usually sent immediately (or shortly after they occur), so this fix is not really needed on web.

You're asking about when a session is created, and therefore when getCurrentSessionId will have a valid value. Web and mobile have slightly different approaches here from their underlying SDKs, but both are waiting for their first event, usually the first view or user interaction, before creating a new session. That's why you won't see a value for the current session until after you call addErrorInfo. That doesn't mean Datadog's SDK hasn't been doing work, it just means that Datadog hasn't started recording your current web visit yet.

If you're seeing 'lost data' on web (such as errors that you expect to see in Datadog that you're not) I would open a separate ticket, as it would certainly be caused by something other than what we're talking about here.

fuzzybinary • Jul 08 '24 13:07

Thanks for the explanation, @fuzzybinary. I am facing that 'lost data' issue: the app initializes the Datadog SDK and sets the user info, but if the application crashes right after that, the error log does not seem to be sent, even though addErrorInfo is called and the application continues running.

magno-castro • Jul 08 '24 13:07

@magno-castro Make sure you have a view active. If you're experiencing an issue early in the application, before the RouterObserver has started your view, you can work around it by calling startView manually immediately after initializing the SDK.

The mobile SDK functionality is a bit different here, as it does create a view during launch, which the web does not. This is partially by design and should just be noted as a functionality difference between the platforms.
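
As a rough sketch of that workaround, assuming `getCurrentSessionId` returns a nullable future and that `startView` takes a view key followed by an optional name (check both against the plugin version you use), something like the following could be called right after the SDK finishes initializing. The helper name `ensureActiveView` and the `'startup'` key and name are placeholders.

```dart
import 'package:datadog_flutter_plugin/datadog_flutter_plugin.dart';

/// Hypothetical helper: starts a view manually if no session exists yet,
/// so that early errors have an active view to attach to.
Future<void> ensureActiveView() async {
  final rum = DatadogSdk.instance.rum;
  if (rum == null) return; // RUM is not configured.

  // A null session ID suggests that no event has started a session yet.
  final sessionId = await rum.getCurrentSessionId();
  if (sessionId == null) {
    // Placeholder key and name; use whatever identifies your first screen.
    rum.startView('startup', 'Startup');
  }
}
```

Once the navigation observer starts tracking routes, it should take over view tracking from this manually started view.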

fuzzybinary • Jul 08 '24 14:07

Thanks again, @fuzzybinary. Checking whether getCurrentSessionId returns null and then calling startView works for me, but I believe this is important information to include in the Setup topic, right?

magno-castro • Jul 08 '24 19:07