grpc-dart icon indicating copy to clipboard operation
grpc-dart copied to clipboard

Open connections grow when connectionTimeout is used

Open calmh opened this issue 4 years ago • 3 comments

When connectionTimeout is set (including the default of 50 minutes) and long running streaming requests are run, the number of established HTTP2 connections will increase be one, first on connectionTimeout triggering and then every so often.

I'm narrowing it down more specifically, but roughly the root cause seems to be that we set the transport.onActiveStateChanged callback on the connection when it's established. There will be a running RPC on the connection, so it's active. At some point we realize we want to recycle the connection so we set state to idle, make a new connection, and finish the old one, state becomes ready. Then, the RPC is cancelled on the old connection, it makes the callback with active=false, state gets set to idle, and we reconnect again. We now have two connections. Next interval this repeats, and we get three connections, and so on.

Probably the callback should be removed on connections that get intentionally recycled.

Version 3.0.0

Repro steps

For easier repro, set a short connectionTimeout.

ClientChannel(
      host,
      port: port,
      options: ChannelOptions(
        credentials: ChannelCredentials.secure(certificates: caCertificate),
        connectionTimeout: Duration(minutes: 5),
        idleTimeout: Duration(minutes: 2),
        userAgent: ua,
      ),
    );

Run repeated long streaming RPC calls towards some server.

Observe tcp connections using netstat.

Expected result: One connection maintained, recycled at the specified interval.

Actual result: Every interval we gain a new connection.

calmh avatar Sep 02 '21 20:09 calmh

Oh, there is also a second instance of this problem left after unsetting the callback; when establishing the connection we rig

_transportConnector.done.then((_) => _abandonConnection());

which means that after establishing a new connection, the old connection will at some point terminate and call _abandonConnection which sets state back to idle and causes a new connection to be dialed, even though we already had a new one just created. This all seems a bit flaky, frankly. There's also a logic error in _abandonConnection itself,

https://github.com/grpc/grpc-dart/blob/7cced9282a99de50d9fd08ebade90835c79818f8/lib/src/client/http2_connection.dart#L273-L276

which is not too serious but might set us back to idle or transientFailure from shutdown as the condition is never true.

Maybe these callbacks could pass the connection that caused them to fire, and the receiver compares that to the currently active one. If it's not the currently active one we can disregard the callback... (or maybe there's some more elegant way to cancel the done on the future)

calmh avatar Sep 02 '21 20:09 calmh

@mraleph this should be reopened as you reverted the fix.

calmh avatar Dec 14 '22 12:12 calmh