grpc-dart icon indicating copy to clipboard operation
grpc-dart copied to clipboard

KeepAlive Ping Error Handling

Open lyokato opened this issue 1 year ago • 3 comments

The following is an excerpt of the code that sets up KeepAlive written in lib/src/client/http2_connection.dart

      if (options.keepAlive.shouldSendPings) {
        keepAliveManager = ClientKeepAlive(
          options: options.keepAlive,
          ping: () {
            if (transport.isOpen) {
              transport.ping();
            }
          },
          onPingTimeout: () => transport.finish(),
        );
        transport.onFrameReceived
            .listen((_) => keepAliveManager?.onFrameReceived());
      }

We found that the above code occasionally threw an exception when calling the transport.ping(). Sometimes an HTTP/2 Connection error occurred, starting from the http2 library side.

Therefore, we have applied the following patch to prevent the exception from flying.

              transport.ping().catchError(() {
                shutdown();
              });

I am not sure if this is the appropriate way to fix the problem, but I will report it to you.

lyokato avatar Sep 26 '24 07:09 lyokato

Thanks for reporting! I will take a look. Do you have any more information on the environment where this occurs? I haven't seen this happening on my machine.

mosuem avatar Sep 26 '24 07:09 mosuem

I was using the following

  • Use with request/response style APIs.
  • However, to avoid the cost of establishing a connection each time, once a channel is created, it is used repeatedly. Send multiple requests through a single channel.
  • For this reason, KeepAlive is set for the channel, and the pingInterval is set to 30 seconds.
  • The part where a request is sent and the response is received is enclosed in try/catch, so that GrpcError can be caught.

When we had users using the app in this situation, errors were reported to Firebase Crashlytics and Bugsnag.

HTTP/2 error: Connection error: Connection is being forcefully terminated. (errorCode: 10)

This is what was flowing to PlatformDispatcher.onError in the Flutter application. This means that an uncaught error is occurring somewhere.

The place where the request is sent is enclosed in try/catch as described above. This is why I thought the KeepAlive ping was suspicious.


Then we looked at where this error originated in the first place.

https://github.com/dart-lang/http2/blob/master/lib/src/connection.dart

There are several places in this code where the following calls are made

 _terminate(ErrorCode.CONNECT_ERROR, causedByTransportError: true);

The following is implemented in the _terminate method called here.

      var exception = TransportConnectionException(
          errorCode, 'Connection is being forcefully terminated.');

      // Close all streams & stream queues
      _streams.terminate(exception);

      // Close the connection queues
      _incomingQueue.terminate(exception);
      _outgoingQueue.terminate(exception);

      _pingHandler.terminate(exception);
      _settingsHandler.terminate(exception);

From the error message, we could see that it was this very exception that occurred this time.

We found that this exception was being passed to streams, incomingQueue, outgoingQueue, pingHandler, and settingsHandler.

Since we thought ping was suspicious, we checked the code of this _pingHandler and tracked how it was being used from the grpc-dart side.

As a result, we found that no catchError was made at the point of the first patch.

After applying the patch, I have had users use the app, and the error is no longer being sent to PlatformDispatcher.onError.


We do not know how to reproduce this reliably.

We are using grpc via an AWS load balancer. It is possible that the problem occurs when we hang a connection for a long time via the load balancer, but we have not yet investigated that in detail. We plan to investigate further.

lyokato avatar Sep 27 '24 01:09 lyokato

Thanks for the write-up, this is helpful.

mosuem avatar Sep 27 '24 13:09 mosuem