Connectivity issues and "bad file descriptor" errors

Open damir-fell opened this issue 1 year ago • 26 comments

Description

Hello,

In our app we are experiencing the issue described in this dart issue thread: https://github.com/dart-lang/http/issues/197. Going through our code, it seems that we are doing everything right in terms of stopping subscriptions and similar when the app goes to the background, but for some reason it doesn't look like the connection pool is cleared.

We want to try replacing the HTTP client Amplify is using with cupertino_http on iOS, as suggested in that thread. How can we do this? Do we need to change something within the Amplify libraries themselves, or how is the http package provided?

Categories

  • [ ] Analytics
  • [ ] API (REST)
  • [X] API (GraphQL)
  • [ ] Auth
  • [ ] Authenticator
  • [ ] DataStore
  • [ ] Notifications (Push)
  • [ ] Storage

Steps to Reproduce

It is usually reproduced when the app returns to the foreground after being in the background for an extended period of time. All queries start to fail, and later retries do not recover. The app has to be restarted in order to recover.

Screenshots

No response

Platforms

  • [X] iOS
  • [ ] Android
  • [ ] Web
  • [ ] macOS
  • [ ] Windows
  • [ ] Linux

Flutter Version

3.13.8

Amplify Flutter Version

1.7.0

Deployment Method

AWS CDK

Schema

No response

damir-fell avatar Nov 01 '24 12:11 damir-fell

It's related to this issue: https://github.com/aws-amplify/amplify-flutter/issues/3865

and we are using the approach suggested in this comment to address it: https://github.com/aws-amplify/amplify-flutter/issues/3865#issuecomment-1913662406

damir-fell avatar Nov 01 '24 12:11 damir-fell

Hello @damir-fell, I'm sorry you are experiencing these issues. Can you please try updating to the latest release 2.5.0 of amplify-flutter and report back if it solved the issue?

ekjotmultani avatar Nov 01 '24 21:11 ekjotmultani

Hi @ekjotmultani, we can, however I have a couple of questions related to it:

  1. Can we use normal Amplify.API requests on the new major version? The V2 docs differ from the V1 docs, and there is no explanation of how to configure the setup with an existing AWS AppSync API and use the API plainly rather than through Data.
  2. Do you expect this to help? Have there been improvements or changes in the library that you think can fix the issue?

It will take some time due to the breaking changes and I would like to resolve the issue as quickly as possible

damir-fell avatar Nov 03 '24 09:11 damir-fell

Hello @damir-fell

  1. Yes, you can continue using normal Amplify.API requests. I was able to run the following on both v1.8.0 and v2.5.0:
final todo = Todo(
  id: uuid(),
  name: '',
);

try {
  final request = ModelMutations.create(todo);
  await Amplify.API.mutate(request: request).response;
} catch (e, st) {
  Logger.log('Create Exception', '$e - $st');
}

You can migrate from "Gen 1 v1" to "Gen 1 v2" using this guide.

  2. v2.5.0 includes a few bug fixes to our WebSockets, such as how they handle resuming from a paused state. If you can provide a code snippet for what's causing your issue, I can try to reproduce it in v1.8.0 and check whether it's working in v2.5.0.

tyllark avatar Nov 05 '24 01:11 tyllark

Thanks for the response, we will update the library and Flutter version and see.

About reproduction, it's not really a code snippet or something I can share. Basically our app uses a bunch of subscriptions, queries and mutations, and we seem to get problems when the app has been in the background for a long time: when it is brought to the foreground we get many "failed to lookup host" and "bad file descriptor" errors. Funnily enough, it doesn't look like subscriptions are the issue, but rather queries and mutations.

Here are a couple of examples

Error: UnknownException {
  "message": "unable to send GraphQLRequest to client.",
  "underlyingException": "POST https://.../graphql failed: HttpException: Bad file descriptor, uri = https://.../graphql"
},
 UnknownException {
  "message": "unable to send GraphQLRequest to client.",
  "underlyingException": "POST https://.../graphql failed: SocketException: Failed host lookup: '...' (OS Error: nodename nor servname provided, or not known, errno = 8)"

It is not an internet connectivity issue, as this happens to multiple users at random times, and it keeps occurring as long as the app is alive. If the app is killed and opened again, the issues are gone.

We stop subscriptions when the app is put in the background and restore them when it returns to the foreground, but the issues keep reappearing either way as long as the app session is active.

damir-fell avatar Nov 05 '24 10:11 damir-fell

Hi,

Just adding my team's experience with this issue: we don't use subscriptions in our app, and I receive this error when trying to make a GraphQL query from the background using the API. We do this to keep the app data up to date with the server so that the user is always greeted with the latest data from the database. I've had to move off DataStore in preparation for moving to Gen 2, and because of issues with syncing from our Lambda function that populates data in the database.

We are already using amplify Gen1v2 and Amplify API version 2.5.0. I use the background_fetch v1.3.5 to perform app tasks in the background. Interestingly, I am able to upload files to S3 in the same background process but when querying with the Amplify API package I receive the below error.

Kz: UnknownException {
  "message": "unable to send GraphQLRequest to client.",
  "underlyingException": "POST https://REDACTED.appsync-api.ap-southeast-2.amazonaws.com/graphql failed: HttpException: Bad file descriptor, uri = https://REDACTED.appsync-api.ap-southeast-2.amazonaws.com/graphql"
}

A simple reproduction of my set-up is below. background_fetch requires a bit of set-up on the iOS side to get running, so if there are fixes I can try, I'm happy to help.

import 'package:amplify_api/amplify_api.dart';
import 'package:amplify_flutter/amplify_flutter.dart';
import 'package:background_fetch/background_fetch.dart';
import 'package:connectivity_plus/connectivity_plus.dart';
import '../models/ModelProvider.dart';

Future<List<Score?>?> fetchLatestScores() async {
  GraphQLRequest<PaginatedResult<Score>> request = ModelQueries.list<Score>(
      Score.classType,
      limit: 1000
  );
  try {
    GraphQLResponse<PaginatedResult<Score>> response = await Amplify.API
        .query(request: request)
        .response;
    return response.data?.items;
  } catch (err) {
    safePrint("Received error when requesting latest Scores");
    safePrint(err);
    return null;
  }
}


// Initialize background fetch
Future<void> initBackgroundFetch() async {
  int status = await BackgroundFetch.configure(
    BackgroundFetchConfig(
      minimumFetchInterval: 30,
      stopOnTerminate: false,
      enableHeadless: true,
      requiresBatteryNotLow: true,
      requiresCharging: false,
      requiresStorageNotLow: false,
      requiresDeviceIdle: false,
      requiredNetworkType: NetworkType.ANY,
    ),
    (String taskId) async {
      try {
        final connectivityResult = await Connectivity().checkConnectivity();
        if (connectivityResult.contains(ConnectivityResult.wifi) ||
            connectivityResult.contains(ConnectivityResult.mobile)) {
          // Uploads to S3 using Amplify succeed in the background.
          // That is what the function below does; it is removed for this reproduction.
          // FileStorage().uploadPendingFiles();
          final newScores = await fetchLatestScores();
          safePrint("Received newScores: $newScores");
        }
      } catch (err) {
        safePrint("Received error when trying to fetch scores in the background");
        safePrint(err);
      } finally {
        // Always signal completion, even when offline or after an error.
        BackgroundFetch.finish(taskId);
      }
    },
    (String taskId) async {
      safePrint("[BackgroundFetch] TASK TIMEOUT taskId: $taskId");
      BackgroundFetch.finish(taskId);
    },
  );
  safePrint("[BackgroundFetch] configured successfully: $status");
}

natebytes avatar Dec 17 '24 14:12 natebytes

Hi @nathcakes, apologies for the delayed response. This is interesting, and I'm sorry you are facing this issue. I'll attempt to reproduce this on my end; thanks for the sample, it will be helpful!

ekjotmultani avatar Dec 26 '24 18:12 ekjotmultani

Hi @ekjotmultani,

We have updated our app to the newest version and are still experiencing the same issues.

What should we try now? Is there any logging we can enable to help get to the bottom of the issue? Please assist.

damir-fell avatar Jan 06 '25 14:01 damir-fell

Hi @damir-fell, I'm sorry that updating didn't resolve this issue. We will continue investigating and get back to you with an update.

ekjotmultani avatar Jan 13 '25 17:01 ekjotmultani

@ekjotmultani FYI, we have made a change on our side, which we are testing at the moment, that might fix or at least improve the issue. We recently got an error report saying that the application had consumed the maximum of 300 subscriptions allowed by Amplify. It seems the approach suggested in this comment does not always work, and Amplify kept the subscriptions open internally.

Will let you know how testing goes.
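
As an illustration of the leak described above (not code from this thread), here is a minimal sketch in plain Dart of a registry that cancels an existing subscription before opening a replacement under the same key, and cancels everything at once when the app is backgrounded. `SubscriptionRegistry`, its method names, and the string keys are hypothetical. It assumes only that the subscription source is a Dart `Stream`, which is what `Amplify.API.subscribe` returns. Whether Amplify releases its internal subscription slot on `cancel()` is not established here.

```dart
import 'dart:async';

/// Hypothetical helper: tracks live stream subscriptions by key so that
/// none is silently replaced without being cancelled first.
class SubscriptionRegistry {
  final Map<String, StreamSubscription<Object?>> _active = {};

  /// Number of subscriptions currently tracked.
  int get activeCount => _active.length;

  /// Starts listening under [key]. Any subscription already registered
  /// under the same key is cancelled (and awaited) first.
  Future<void> listen<T>(
    String key,
    Stream<T> Function() open,
    void Function(T event) onData, {
    void Function(Object error)? onError,
  }) async {
    await _active.remove(key)?.cancel();
    _active[key] = open().listen(
      onData,
      onError: (Object error, StackTrace _) => onError?.call(error),
    );
  }

  /// Cancels every tracked subscription, for example when the app
  /// moves to the background.
  Future<void> cancelAll() async {
    final pending = _active.values.map((s) => s.cancel()).toList();
    _active.clear();
    await Future.wait(pending);
  }
}
```

Logging `activeCount` on every foreground/background transition would show whether the app-side count keeps growing across sessions or stays flat, which helps separate an app-side leak from one inside the library.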

damir-fell avatar Jan 14 '25 12:01 damir-fell

@damir-fell please do. If it turns out to be an error with the subscription count, that will help us identify the root cause.

ekjotmultani avatar Jan 16 '25 20:01 ekjotmultani

@ekjotmultani It seems it did not help; we still get the same issues.

I am not even sure it's related to subscriptions. Sometimes the app is opened and we hit our GraphQL endpoint to refresh application data, and sometimes all API calls just fail with "Failed: HttpException: Bad file descriptor" errors. What does this error mean and where does it come from?

damir-fell avatar Jan 24 '25 12:01 damir-fell

@damir-fell the error comes from the OS (iOS in this case) and indicates a stale or invalid file descriptor. A file descriptor is the integer handle the OS gives a process for each open file or socket; here it most likely belongs to a socket connection. This is something our fix above might have remedied; however, it seems this particular issue persists even after that fix, since the errors do not always occur when the app is resuming. The next lines of investigation are to look at how the Flutter code opens sockets on the native layer and see whether anything is causing file descriptors to get lost, or, perhaps more likely, whether the problem lies with the socket or connection itself. This issue is quite sticky, it seems, so we appreciate the extra effort everyone has put into reporting when and where they encounter this error, along with the relevant code snippets and logs.

Changing the http client to cupertino_http as described in the linked dartlang issue is something I'm looking at, I'll get back to you with some steps on how to do that

@nathcakes how consistently do you encounter this exception?
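
To make the stale-descriptor idea concrete, here is a minimal sketch using `dart:io` directly, not Amplify. It treats the two error shapes reported in this thread as signs of a dead pooled connection, discards the `HttpClient` (and with it the connection pool), and retries once with a fresh one. `looksLikeStaleConnection` and `retryWithFreshClient` are hypothetical names, and matching on the message text `Bad file descriptor` is an assumption based on the logs above. Whether the same approach works for the client Amplify manages internally is not established here.

```dart
import 'dart:io';

/// Returns true for the failure shapes reported in this thread:
/// any SocketException (e.g. "Failed host lookup") or an HttpException
/// whose message mentions a bad file descriptor.
bool looksLikeStaleConnection(Object error) {
  if (error is SocketException) return true;
  if (error is HttpException) {
    return error.message.contains('Bad file descriptor');
  }
  return false;
}

/// Runs [action] with a brand-new HttpClient on every attempt, so a
/// retry never reuses sockets pooled by the previous client.
Future<T> retryWithFreshClient<T>(
  Future<T> Function(HttpClient client) action, {
  int maxAttempts = 2,
}) async {
  for (var attempt = 1; attempt <= maxAttempts; attempt++) {
    final client = HttpClient();
    try {
      return await action(client);
    } catch (error) {
      // Give up on unrelated errors or when attempts are exhausted.
      if (!looksLikeStaleConnection(error) || attempt == maxAttempts) {
        rethrow;
      }
    } finally {
      // force: true also tears down idle pooled connections.
      client.close(force: true);
    }
  }
  throw StateError('maxAttempts must be at least 1');
}
```

If a retry with a fresh client still fails in the same way, the problem is unlikely to be just stale pooled sockets, and points instead to the process's network access as a whole.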

ekjotmultani avatar Jan 31 '25 20:01 ekjotmultani

@ekjotmultani Thanks for the explanation.

Any updates here? This is a pain for us, and I would say the biggest problem with our application, so we are very eager to fix it. At the very least, instructions on how to change the http package to cupertino_http would let us test whether that helps at all.

damir-fell avatar Feb 13 '25 13:02 damir-fell

Hi @damir-fell, I'm sorry this is such a pain. For now we are tracking it as a feature request to at least get the Cupertino implementation in. Once we have that, I will update you so that hopefully this issue gets resolved.

ekjotmultani avatar Feb 13 '25 22:02 ekjotmultani

Sorry @damir-fell, I should mention, so that you can try it in the meantime, that it is possible to use cupertino_http by extending AWSCustomHttpClient and passing it as a parameter to APIPluginOptions. For further reference, see https://github.com/aws-amplify/amplify-flutter/blob/main/packages/aws_common/lib/src/http/aws_http_client.dart .

ekjotmultani avatar Feb 17 '25 22:02 ekjotmultani

@ekjotmultani Got it, we will try it and I will get back to you if we have any questions 🤞

damir-fell avatar Feb 18 '25 14:02 damir-fell

Sorry to trigger the auto-tagging, I posted a comment and then immediately realized my mistake. Trying to get the Cupertino HTTP client done, will share once I've got it working.

natebytes avatar Mar 26 '25 21:03 natebytes

@ekjotmultani Hi! We did the cupertino client swap but honestly there is not much difference. We still get the bad file descriptor issues.

Any other suggestions from your side what we should try?

damir-fell avatar Mar 31 '25 18:03 damir-fell

Hello @damir-fell, sorry for the delay. I will attempt to reproduce the issue on my side and get back to you.

tyllark avatar Apr 07 '25 16:04 tyllark

Hey,

Just wanted to update with my experience since swapping to the CupertinoClient & CronetClient. I'm no longer receiving bad file descriptors but I do regularly receive the following errors when the application is no longer active. Files are still always successfully uploaded to S3 in the background.

Cronet Client (Android) is indicating an aborted connection (net::ERR_CONNECTION_ABORTED, internal error code -103):

ClientException: Cronet exception: m.ks: Exception in CronetUrlRequest: net::ERR_CONNECTION_ABORTED, ErrorCode=11, InternalErrorCode=-103, Retryable=false, uri=https://REDACTED.appsync-api.ap-southeast-2.amazonaws.com/graphql

Cupertino Client (iOS):

UnknownException {
  "message": "unable to send GraphQLRequest to client.",
  "underlyingException": "ClientException: The network connection was lost., uri=https://REDACTED.appsync-api.ap-southeast-2.amazonaws.com/graphql"
}

I thought it might help or interest you to see a summary of my logs of communication with the network. This is pretty indicative of what's going on for me: sometimes it will be just fine, then it will fail for some time, and then it returns to succeeding again. This history is all on iOS with the app in the background.

13:14:32 - Files are uploaded to S3 Successfully
13:16:19 - ERROR: Network Connection Lost (Trying to communicate to GraphQL API)
13:20:25 - Files are uploaded to S3 Successfully
13:23:35 - ERROR: Network Connection Lost (Trying to communicate to GraphQL API) 
13:57:01 - Files are uploaded to S3 Successfully
13:59:07 - SUCCESS: Received new data from GraphQL API
14:14:15 - Files are uploaded to S3 Successfully
14:14:16 - Received an error back from DynamoDB via the GraphQL API (error in my code, but indicates connection was still good) 
14:15:20 - ERROR: Network Connection Lost (Trying to communicate to GraphQL API)

I struggled to implement the custom client on my own but got a hand from AI so this is what I'm using and it appears to function correctly: Cronet/Cupertino AWSClient

natebytes avatar Apr 10 '25 14:04 natebytes

Hello @nathcakes, thank you for the update and for providing your cupertino_http implementation. The GraphQL error shouldn't be related to S3, but rather to Amplify.API.subscribe or Amplify.DataStore. We are still investigating this issue and will look into the Cronet Client and Cupertino Client exceptions as well.

tyllark avatar Apr 15 '25 16:04 tyllark

Any news here?

damir-fell avatar Apr 28 '25 07:04 damir-fell

Hi @damir-fell, we do not have any updates at this time. We are still investigating and will provide any updates here.

tyllark avatar Apr 30 '25 17:04 tyllark

Is it possible to share any kind of progress or estimate when you will have some results?

damir-fell avatar May 07 '25 22:05 damir-fell

I was also able to get an implementation of the Cupertino client working, and after some basic testing I didn't notice any errors. This was only a quick check, though, and I haven't yet done anything robust enough to pinpoint the root cause. As I make more progress on this I will continue to share here. Next I will try to reproduce the errors that occur when the app is inactive.

ekjotmultani avatar May 12 '25 13:05 ekjotmultani