dd-sdk-flutter

No implementation found for method addError on channel datadog_sdk_flutter.rum

Open androidmitry opened this issue 1 year ago • 35 comments

Stack trace

Fatal Exception: io.flutter.plugins.firebase.crashlytics.FlutterError: MissingPluginException(No implementation found for method addError on channel datadog_sdk_flutter.rum) at MethodChannel._invokeMethod(platform_channel.dart:332) at ._willHandleError(helpers.dart:14)

Reproduction steps

Add the datadog_flutter_plugin package, release to App Store

Volume

0,0021 (1-2 users per day)

Affected SDK versions

2.4.0

Does the crash manifest in the latest SDK version?

Yes

Flutter Version

3.19.5

Setup Type

Flutter Application

Device Information

OS: Android

Per version: Android 12 - 88%, Android 10 - 7%, Android 14 - 3%, Android 13 - 2%

Per device: Samsung - 93%, OnePlus - 7%

Other relevant information

Device states: background 60%

androidmitry avatar Apr 23 '24 14:04 androidmitry

Hi @androidmitry ,

Thanks for the report, I'll look into this as soon as I can.

Can you give me any more information about a possible reproduction? Have you been able to reproduce locally at all? Is there anything strange about your setup that might be disconnecting the MethodChannel from our plugin? We tend to wrap every call we make to try to avoid crashes, so I'm very concerned that this is causing a crash.

fuzzybinary avatar Apr 23 '24 14:04 fuzzybinary

Hi @fuzzybinary, unfortunately that's all the information I have so far. I wasn't able to reproduce it. We had some custom platform code, but it was removed. What's interesting is that the number of reports is decreasing. I will update the issue if the crash goes away.

androidmitry avatar Apr 24 '24 11:04 androidmitry

@androidmitry Yeah if you can keep me posted I would appreciate it.

I've seen issues in the past where the method channel can get disconnected from the plugin, but I've fixed those, and most threw errors in the native layer, not Dart.

fuzzybinary avatar Apr 24 '24 18:04 fuzzybinary

This issue is also occurring in version 2.1.0, and we have encountered the MissingPluginException from the Android channels datadog_sdk_flutter.rum and datadog_sdk_flutter.logs in the production release. Due to consecutive RUM events, the error count is excessively high. Below are some error messages we've received:

  1. MissingPluginException(No implementation found for method addError on channel datadog_sdk_flutter.rum)
  2. MissingPluginException(No implementation found for method createLogger on channel datadog_sdk_flutter.logs)
  3. MissingPluginException(No implementation found for method stopView on channel datadog_sdk_flutter.rum)

nirmal0707 avatar May 09 '24 08:05 nirmal0707

Hi @nirmal0707,

I'm actively investigating this, but I haven't had much luck reproducing it. Do you happen to have any steps to reproduce, or anything you can tell me about your app before / after you started seeing the errors?

fuzzybinary avatar May 09 '24 18:05 fuzzybinary

Hi @fuzzybinary ,

This issue was not reproducible but began occurring when we migrated our codebase to Flutter 3.16.4, three months ago. Previously, we were using version 1.5.1, and the Flutter upgrade required us to move the package version to 2.1.0, resulting in this issue arising for some users in production.

nirmal0707 avatar May 10 '24 06:05 nirmal0707

Alright, thanks @nirmal0707, that may help me track down the issue.

fuzzybinary avatar May 10 '24 12:05 fuzzybinary

Hi folks -- a few questions for everyone to see if I can try to diagnose this:

  • Is anyone using background tasks or foreground services, or the flutter_background_service?
  • Are you using push notifications or a push notification service like firebase_cloud_messaging? Do these errors tend to spike immediately after a push notification is sent out?
  • Is anyone using Flutter in an add to app scenario, or using attachToExisting in the SDK?
  • Does GeneratedPluginRegistrant.java enclose all the plugins in a try/catch block?
  • Do the MissingPluginException errors correlate with any other errors around the same time?

Sorry this is taking so long, but I am having a really hard time reproducing this, even when forcing certain error states. Comparing with Crashlytics, we perform the registration and de-registration of our method channels the same way they do, so I'm not sure how or why they'd catch the errors and we don't.

fuzzybinary avatar May 10 '24 15:05 fuzzybinary

Another question as I continue to investigate -- Does anyone have any customizations of their FlutterActivity? Overriding onCreate, configureFlutterEngine, onDestroy or any other methods?

fuzzybinary avatar May 15 '24 15:05 fuzzybinary

For us, crash reports started coming in when we upgraded Flutter from 3.16.9 to 3.19.5.

Is anyone using background tasks or foreground services

We have a foreground service, but we don't use the flutter_background_service package. Also, according to the breadcrumb events attached to the crash, it usually happens in the foreground.

Are you using push notifications or a push notification service like firebase_cloud_messaging ? Do these errors tend to spike immediately after a push notification is sent out ?

Yes. No.

Is anyone using Flutter in an add to app scenario, or using attachToExisting in the SDK?

No

Does GeneratedPluginRegistrant.java enclose all the plugins in a try/catch block?

Yes

Do the MissingPluginException errors correlate with any other errors around the same time?

I checked several users, and no other issues were reported around the same time.

Does anyone have any customizations of their FlutterActivity?

We do, I will double check them.

androidmitry avatar May 16 '24 12:05 androidmitry

My error message is a bit different: MissingPluginException(No implementation found for method reportLongTask on channel datadog_sdk_flutter.rum)

These are my Sentry logs:

image

Then a bunch of:

image

And then:

image

Maybe you are not handling the destroyed lifecycle correctly? Or another plugin is interfering?

feinstein avatar May 20 '24 00:05 feinstein

Hi @feinstein, thanks for the additional information. All of the MissingPluginException issues are related, regardless of the method channel named and the method recorded, so any additional info is helpful.

The FlutterJNI error is interesting, that wouldn't be us so I'm very curious what might cause that, and curious if they're related.

We actually don't handle activity lifecycle at all, instead relying on Flutter's onAttachedToEngine and onDetachedFromEngine, which is what makes this error so frustrating, as those should be triggered properly when Flutter itself starts and stops.

Have you been able to reproduce locally at all?

fuzzybinary avatar May 20 '24 12:05 fuzzybinary

AFAIK Flutter JNI is the Java interop for connecting the C++ Flutter engine to the Android app.

Maybe Flutter is not triggering the engine's life cycle correctly to your lib.

feinstein avatar May 20 '24 12:05 feinstein

We were not able to reproduce it locally. We made some tiny changes to our FlutterActivity, I will report if it helped.

androidmitry avatar May 20 '24 12:05 androidmitry

@fuzzybinary just noting that we are still experiencing the issue mentioned in https://github.com/DataDog/dd-sdk-flutter/issues/552 (which I believe is the same issue being tracked here) despite removing the native cruft I referred to in my last comment on that issue. IIRC I am able to reproduce this in our application fairly consistently. If I have a sec today I'll play around and see if I can reproduce. According to another engineer on my team we're seeing ~249k instances of this issue per week. We've had to filter these issues out of our crash reporting to avoid going beyond our contracted threshold 🙃
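For anyone who needs the same kind of filtering, here is a minimal sketch in plain Dart. It is not taken from any app in this thread; the helper name isDatadogMissingPluginError is hypothetical, and the pattern simply assumes the message format shown in the errors reported above. How the predicate is wired into a given crash reporter (for example, a before-send hook) is left out, since that depends on the reporter.

```dart
/// Matches messages of the form reported in this thread, e.g.
/// MissingPluginException(No implementation found for method addError
/// on channel datadog_sdk_flutter.rum)
final RegExp _datadogMissingPlugin = RegExp(
  r'MissingPluginException\(No implementation found for method \w+ '
  r'on channel datadog_sdk_flutter\.\w+\)',
);

/// Hypothetical helper: returns true when [error] looks like a
/// MissingPluginException raised on one of the Datadog method channels.
bool isDatadogMissingPluginError(Object error) =>
    _datadogMissingPlugin.hasMatch(error.toString());

void main() {
  const reported = 'MissingPluginException(No implementation found for '
      'method addError on channel datadog_sdk_flutter.rum)';

  print(isDatadogMissingPluginError(reported)); // true
  print(isDatadogMissingPluginError(StateError('unrelated'))); // false
}
```

Matching on the channel prefix keeps MissingPluginException reports from other plugins visible, so the filter only hides the noise discussed here.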

btrautmann avatar May 21 '24 14:05 btrautmann

STR would be ridiculously helpful. If I can reproduce I can likely get it fixed and out with the next version ASAP.

fuzzybinary avatar May 21 '24 15:05 fuzzybinary

@fuzzybinary Just chiming in again on @nirmal0707's behalf. Looking at our Sentry error logs, we also see a large number of lifecycle events being reported in quick succession in the error events for this:

image

And the above screenshot is only about a quarter of the pause/resume breadcrumb events in that particular Sentry error event.

Not sure if that's relevant, but perhaps this rapid set of lifecycle events causes some sort of race condition in the Datadog plugin's setup code?

maks-ucs avatar May 22 '24 00:05 maks-ucs

This looks weird, so many transitions in under 1 second.

What makes me exclude a Flutter error is that only the DD plugin is raising this exception... but on second thought, few packages would trigger a method channel call when the app is being destroyed.

feinstein avatar May 22 '24 06:05 feinstein

Another question from research:

Is anyone suffering from this error still using runZonedGuarded over PlatformDispatcher.instance.onError? (If you are using Datadog.runApp we do not use runZonedGuarded)

I'm looking for commonalities here, since I cannot reproduce with any example I have, but all of my examples use PlatformDispatcher.

fuzzybinary avatar May 22 '24 16:05 fuzzybinary

We use runZonedGuarded. Is it deprecated? We set PlatformDispatcher.instance.onError as well.

androidmitry avatar May 22 '24 16:05 androidmitry

PlatformDispatcher.instance.onError is preferred and the two do essentially the same thing.

I'm going to do more research but I'm curious if the new zone creation is occasionally bypassed by backgrounding / foregrounding.
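For readers less familiar with zones, here is a minimal plain-Dart sketch of what runZonedGuarded does. It is only an illustration and is not how the Datadog SDK or any app in this thread is set up; the helper name collectUncaughtErrors is hypothetical. PlatformDispatcher.instance.onError lives in dart:ui and needs a running Flutter engine, so it is not shown here.

```dart
import 'dart:async';

/// Hypothetical helper: runs [body] inside a guarded zone and returns every
/// uncaught error that the zone's error handler received.
Future<List<Object>> collectUncaughtErrors(void Function() body) async {
  final caught = <Object>[];

  // Errors that nothing inside the zone handles are routed to this callback
  // instead of propagating to the outer zone.
  runZonedGuarded(body, (error, stack) => caught.add(error));

  // Let the zero-delay timers scheduled by [body] run before returning.
  await Future<void>.delayed(Duration.zero);
  return caught;
}

Future<void> main() async {
  final errors = await collectUncaughtErrors(() {
    // An unawaited future that fails; nothing in the body catches it.
    Future<void>(() => throw StateError('boom'));
  });

  print('Uncaught errors seen by the zone: $errors');
}
```

The handler only sees errors from code that runs inside the guarded zone, which is why it matters where the zone is created relative to the rest of the app's startup code.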

fuzzybinary avatar May 22 '24 16:05 fuzzybinary

Tests on my side related to runZonedGuarded don't duplicate the issue unfortunately.

Next question -- is everyone experiencing this potentially using multiple Flutter engines or booting engines themselves for any reason? There is a potentially related Flutter issue (https://github.com/flutter/flutter/issues/103483) if so. Doing a quick scan of the issue it's possible we might be able to fix this on the Datadog side, but knowing would help me focus efforts.

fuzzybinary avatar May 22 '24 19:05 fuzzybinary

Thanks for your continued efforts on this @fuzzybinary ! 👍

For our app we are not using multiple Flutter engines and we do use runZonedGuarded, though it seems that's likely not the source of the issue from your last comment.

maks-ucs avatar May 23 '24 00:05 maks-ucs

I am also using runZonedGuarded. I initialize Sentry, then Datadog. Here's how I initialize it:

Future<void> setupDatadog() async {
  final configuration = DatadogConfiguration(
    clientToken: 'mytoken1234',
    env: appFlavor ?? 'no-flavour',
    site: DatadogSite.us5,
    nativeCrashReportEnabled: true,
    loggingConfiguration: DatadogLoggingConfiguration(),
    rumConfiguration: DatadogRumConfiguration(
      applicationId: 'my-app-id-1234',
    ),
  );

  final originalOnError = FlutterError.onError;
  FlutterError.onError = (details) {
    DatadogSdk.instance.rum?.handleFlutterError(details);
    originalOnError?.call(details); // This allows me to not override other listeners, like Sentry.
  };
  final platformOriginalOnError = PlatformDispatcher.instance.onError;
  PlatformDispatcher.instance.onError = (e, st) {
    DatadogSdk.instance.rum?.addErrorInfo(
      e.toString(),
      RumErrorSource.source,
      stackTrace: st,
    );
    return platformOriginalOnError?.call(e, st) ?? false;
  };

  await DatadogSdk.instance.initialize(configuration, TrackingConsent.granted);
  DatadogSdk.instance.updateConfigurationInfo(LateConfigurationProperty.trackErrors, true);
}

That function is called inside a runZonedGuarded, after await SentryFlutter.init and WidgetsFlutterBinding.ensureInitialized();.

feinstein avatar May 23 '24 06:05 feinstein

Hi folks - we still cannot reproduce this issue unfortunately. My guess is that this is some sort of race condition on the platform channel during backgrounding, where we are attempting to send view or log events while the app is backgrounding on Android.

However, I will say we do know that even though Sentry / Crashlytics report this as a “Fatal” error, it does not result in the application terminating, and is silent to the user. I verified this by essentially “force disconnecting” the method channel during testing and seeing what the response is from Flutter. This means that users are not seeing a degraded app experience because of this issue.

This doesn’t mean we don’t take the issue seriously, and if anyone can provide us with reproduction steps that would be incredibly helpful.

fuzzybinary avatar Jun 04 '24 07:06 fuzzybinary

Maybe contact the flutter team and ask them what might be causing this?

feinstein avatar Jun 04 '24 08:06 feinstein

I've gone through some of the less formal channels (Discord, for example), but I may raise a GitHub issue and see if it gets more attention.

fuzzybinary avatar Jun 04 '24 09:06 fuzzybinary

None of the previous changes I made helped. We are planning a Flutter SDK upgrade. I will post here if it helps.

androidmitry avatar Jun 04 '24 09:06 androidmitry

@fuzzybinary I've been unable to give this attention due to some other pressing work, but I wanted to respond to:

Next question -- is everyone experiencing this potentially using multiple Flutter engines or booting engines themselves for any reason? There is a potentially related Flutter issue (https://github.com/flutter/flutter/issues/103483) if so. Doing a quick scan of the issue it's possible we might be able to fix this on the Datadog side, but knowing would help me focus efforts.

A coworker of mine was toying around with this and was able to confirm that there's a case where a user taps a deep link and in doing so a new Flutter engine gets created (I think because of some code we have on the native side, I doubt that this is default Flutter behavior). As a result, our main function is called again which calls the code that would initialize Datadog twice. My hunch (without really looking at the code on either side, I'm just leaving this comment between tasks) is that the move to a singleton on your end made this bug which was already occurring more obvious (because of all the errors we're seeing).

Obviously we have more triaging and likely some fixes to put in on our end, but I did want to (cautiously) confirm your hypothesis that the 2 engine thing may be one cause of the issue folks are seeing.

btrautmann avatar Jun 13 '24 16:06 btrautmann

Thanks @btrautmann, that's really good information to have. I'm not sure if all of these issues are related to multiple Flutter engines, but I feel like it's possible there are situations I don't know about that could legitimately create a second Flutter engine.

I'll have to think about how we can support that situation, but knowing that, I can create a fake situation that artificially creates multiple engines and test that my solution works.

I'll try to get a solution for you in the next few weeks.

fuzzybinary avatar Jun 14 '24 12:06 fuzzybinary