grpc-dotnet icon indicating copy to clipboard operation
grpc-dotnet copied to clipboard

Endless `OperationCancelledException` errors in `GrpcCall` after Cancellation in `SocketConnectivitySubchannelTransport.TryConnectAsync`

Open kolonist opened this issue 3 months ago • 3 comments

Hi, @JamesNK

We detected the following endless errors from several pods when they are trying to perform Grpc calls to several endpoints: изображение

Exception itself is always the same:

Source context: Grpc.Net.Client.Internal.GrpcCall

Status message: Error connecting to subchannel.

Exception:
System.OperationCanceledException: The operation was canceled.
   at System.Threading.CancellationToken.ThrowOperationCanceledException()
   at System.Threading.CancellationToken.ThrowIfCancellationRequested()
   at System.Net.Sockets.Socket.<ConnectAsync>g__WaitForConnectWithCancellation|285_0(AwaitableSocketAsyncEventArgs saea, ValueTask connectTask, CancellationToken cancellationToken)
   at Grpc.Net.Client.Balancer.Internal.SocketConnectivitySubchannelTransport.TryConnectAsync(ConnectContext context, Int32 attempt)

Luckily we could capture start of this in another service in Debug:

изображение

After that all requests to URL related to Subchanel id 19-2 (look at the picture above) started to throw the following errors:

изображение

(don't be confused with log messages order because messages with the same timestamp can follow in mixed order)

We could reproduce this error with the following code:

Main routine:

// here is service initialization, client side balancing used, but there is only one available endpoint
// we use latest grpc-dotnet from main branch

for (var i = 0; i < 200; i++)
{
    try
    {
        logger.LogWarning("##### START LOOP");

        var client = host.Services.GetRequiredService<GrpcSomeServiceInterface.GrpcSomeServiceInterfaceClient>();
        var request = new GrpcSomeRequest
        {
            Content = "Random message"
        };

        logger.LogWarning("##### BEFORE Unary call");

        var response = await client.UnaryCallAsync(request);

        logger.LogWarning("##### Response: {Response}", response.Content);
    }
    catch (Exception ex)
    {
        logger.LogError(ex, "##### SOME ERROR");
    }
    finally
    {
        logger.LogWarning("##### BEFORE Delay");
        await Task.Delay(150 * 1000);
        logger.LogWarning("##### END LOOP");
    }
}

In SocketConnectivitySubchannelTransport.TryConnectAsync() we synthetically cancel context as it was cancelled from the outside:

    public async ValueTask<ConnectResult> TryConnectAsync(ConnectContext context, int attempt)
    {
        // ensure we do it only once
        if (!_wasCancellation)
        {
            _logger.LogWarning("##### context.CancelConnect();");

            context.CancelConnect();

            // ensure we do it only once
            _wasCancellation = true; 
        }

        _logger.LogWarning(
            "##### TryConnectAsync context.CancellationToken.IsCancellationRequested = {CancellationRequested}",
            context.CancellationToken.IsCancellationRequested);

        Debug.Assert(CurrentEndPoint == null);

        // continuation...

The output is the following:

[16:42:12 WRN ]  ##### START LOOP
[16:42:12 WRN ]  ##### BEFORE Unary call
[16:42:12 DBG ]  Starting gRPC call. Method type: 'Unary', URI: 'http://10.124.143.49:82/SomeName.SomeAnotherName.SomeService.V1.GrpcSomeServiceInterface/UnaryCall'.
[16:42:12 DBG ]  Subchannel id '2-1' created with addresses: 10.124.143.49:82
[16:42:12 WRN ]  ##### SocketConnectivitySubchannelTransport Created
[16:42:12 VRB ]  Subchannel id '2-1' state changed registration '2-1-1' created.
[16:42:12 DBG ]  Channel picker updated.
[16:42:12 VRB ]  Subchannel id '2-1' connection requested.
[16:42:12 DBG ]  Subchannel id '2-1' state changed to Connecting. Detail: 'Connection requested.'.
[16:42:12 VRB ]  Subchannel id '2-1' executing state changed registration '2-1-1'.
[16:42:12 VRB ]  Processing subchannel id '2-1' state changed to Connecting. Detail: 'Connection requested.'.
[16:42:12 DBG ]  Channel state updated to Connecting.
[16:42:12 VRB ]  Pick started.
[16:42:12 DBG ]  Picked queued.
[16:42:12 VRB ]  Waiting for a new picker.
[16:42:12 VRB ]  Subchannel id '2-1' connecting to transport.

[16:42:12 WRN ]  ##### context.CancelConnect();
[16:42:12 WRN ]  ##### TryConnectAsync context.CancellationToken.IsCancellationRequested = True

[16:42:12 VRB ]  Subchannel id '2-1' connecting socket to 10.124.143.49:82.
[16:42:12 DBG ]  Subchannel id '2-1' error connecting to socket 10.124.143.49:82.
System.Threading.Tasks.TaskCanceledException: A task was canceled.
   at Grpc.Net.Client.Balancer.Internal.SocketConnectivitySubchannelTransport.TryConnectAsync(ConnectContext context, Int32 attempt) in /Users/kolonist/dev/grpc-dotnet-forked/src/Grpc.Net.Client/Balancer/Internal/SocketConnectivitySubchannelTransport.cs:line 186
[16:42:12 DBG ]  Subchannel id '2-1' state changed to TransientFailure. Detail: 'Error connecting to subchannel.'.
System.Threading.Tasks.TaskCanceledException: A task was canceled.
   at Grpc.Net.Client.Balancer.Internal.SocketConnectivitySubchannelTransport.TryConnectAsync(ConnectContext context, Int32 attempt) in /Users/kolonist/dev/grpc-dotnet-forked/src/Grpc.Net.Client/Balancer/Internal/SocketConnectivitySubchannelTransport.cs:line 186
[16:42:12 VRB ]  Subchannel id '2-1' executing state changed registration '2-1-1'.
[16:42:12 VRB ]  Processing subchannel id '2-1' state changed to TransientFailure. Detail: 'Error connecting to subchannel.'.
System.Threading.Tasks.TaskCanceledException: A task was canceled.
   at Grpc.Net.Client.Balancer.Internal.SocketConnectivitySubchannelTransport.TryConnectAsync(ConnectContext context, Int32 attempt) in /Users/kolonist/dev/grpc-dotnet-forked/src/Grpc.Net.Client/Balancer/Internal/SocketConnectivitySubchannelTransport.cs:line 186
[16:42:12 DBG ]  Channel state updated to TransientFailure.
[16:42:12 DBG ]  Channel picker updated.
[16:42:12 VRB ]  Subchannel id '2-1' connect canceled.
[16:42:12 VRB ]  Pick started.
[16:42:12 DBG ]  Picked failure with status: Status(StatusCode="Unavailable", Detail="Error connecting to subchannel.", DebugException="System.Threading.Tasks.TaskCanceledException: A task was canceled.")
[16:42:12 DBG ]  Error starting gRPC call.
[16:42:12 INF ]  Call failed with gRPC error status. Status code: 'Unavailable', Message: 'Error connecting to subchannel.'.
System.Threading.Tasks.TaskCanceledException: A task was canceled.
   at Grpc.Net.Client.Balancer.Internal.SocketConnectivitySubchannelTransport.TryConnectAsync(ConnectContext context, Int32 attempt) in /Users/kolonist/dev/grpc-dotnet-forked/src/Grpc.Net.Client/Balancer/Internal/SocketConnectivitySubchannelTransport.cs:line 186
[16:42:12 DBG ]  Finished gRPC call.
[16:42:12 DBG ]  gRPC call canceled.

[16:42:12 ERR ]  ##### SOME ERROR
Grpc.Core.RpcException: Status(StatusCode="Unavailable", Detail="Error connecting to subchannel.", DebugException="System.Threading.Tasks.TaskCanceledException: A task was canceled.")
 ---> System.Threading.Tasks.TaskCanceledException: A task was canceled.
   at Grpc.Net.Client.Balancer.Internal.SocketConnectivitySubchannelTransport.TryConnectAsync(ConnectContext context, Int32 attempt) in /Users/kolonist/dev/grpc-dotnet-forked/src/Grpc.Net.Client/Balancer/Internal/SocketConnectivitySubchannelTransport.cs:line 186
   --- End of inner exception stack trace ---
   at Grpc.Net.Client.Balancer.Internal.ConnectionManager.PickAsync(PickContext context, Boolean waitForReady, CancellationToken cancellationToken) in /Users/kolonist/dev/grpc-dotnet-forked/src/Grpc.Net.Client/Balancer/Internal/ConnectionManager.cs:line 341
   at Grpc.Net.Client.Balancer.Internal.BalancerHttpHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken) in /Users/kolonist/dev/grpc-dotnet-forked/src/Grpc.Net.Client/Balancer/Internal/BalancerHttpHandler.cs:line 117
   at Grpc.Net.Client.Internal.GrpcCall`2.RunCall(HttpRequestMessage request, Nullable`1 timeout) in /Users/kolonist/dev/grpc-dotnet-forked/src/Grpc.Net.Client/Internal/GrpcCall.cs:line 502
   at Grpc.Net.Client.Internal.GrpcCall`2.GetResponseHeadersCoreAsync() in /Users/kolonist/dev/grpc-dotnet-forked/src/Grpc.Net.Client/Internal/GrpcCall.cs:line 300
[16:42:12 WRN ]  ##### BEFORE Delay
[16:44:42 WRN ]  ##### END LOOP

[16:44:42 WRN ]  ##### START LOOP
[16:44:42 WRN ]  ##### BEFORE Unary call
[16:44:42 DBG ]  Starting gRPC call. Method type: 'Unary', URI: 'http://10.124.143.49:82/SomeName.SomeAnotherName.SomeService.V1.GrpcSomeServiceInterface/UnaryCall'.
[16:44:42 VRB ]  Pick started.

[16:44:42 DBG ]  Picked failure with status: Status(StatusCode="Unavailable", Detail="Error connecting to subchannel.", DebugException="System.Threading.Tasks.TaskCanceledException: A task was canceled.")

[16:44:42 DBG ]  Error starting gRPC call.
[16:44:42 INF ]  Call failed with gRPC error status. Status code: 'Unavailable', Message: 'Error connecting to subchannel.'.
System.Threading.Tasks.TaskCanceledException: A task was canceled.
   at Grpc.Net.Client.Balancer.Internal.SocketConnectivitySubchannelTransport.TryConnectAsync(ConnectContext context, Int32 attempt) in /Users/kolonist/dev/grpc-dotnet-forked/src/Grpc.Net.Client/Balancer/Internal/SocketConnectivitySubchannelTransport.cs:line 186
[16:44:42 DBG ]  Finished gRPC call.
[16:44:42 DBG ]  gRPC call canceled.

[16:44:42 ERR ]  ##### SOME ERROR
Grpc.Core.RpcException: Status(StatusCode="Unavailable", Detail="Error connecting to subchannel.", DebugException="System.Threading.Tasks.TaskCanceledException: A task was canceled.")
 ---> System.Threading.Tasks.TaskCanceledException: A task was canceled.
   at Grpc.Net.Client.Balancer.Internal.SocketConnectivitySubchannelTransport.TryConnectAsync(ConnectContext context, Int32 attempt) in /Users/kolonist/dev/grpc-dotnet-forked/src/Grpc.Net.Client/Balancer/Internal/SocketConnectivitySubchannelTransport.cs:line 186
   --- End of inner exception stack trace ---
   at Grpc.Net.Client.Balancer.Internal.ConnectionManager.PickAsync(PickContext context, Boolean waitForReady, CancellationToken cancellationToken) in /Users/kolonist/dev/grpc-dotnet-forked/src/Grpc.Net.Client/Balancer/Internal/ConnectionManager.cs:line 341
   at Grpc.Net.Client.Balancer.Internal.BalancerHttpHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken) in /Users/kolonist/dev/grpc-dotnet-forked/src/Grpc.Net.Client/Balancer/Internal/BalancerHttpHandler.cs:line 117
   at Grpc.Net.Client.Internal.GrpcCall`2.RunCall(HttpRequestMessage request, Nullable`1 timeout) in /Users/kolonist/dev/grpc-dotnet-forked/src/Grpc.Net.Client/Internal/GrpcCall.cs:line 502
   at SomeName.SomeAnotherName.SomeKindOfInterceptor.Handlers.SomeKindOfInterceptorGrpcInterceptor.HandleExceptionAsync[TResponse](Task`1 response) in /Users/kolonist/dev/SomeAnotherName/src/core/SomeName.SomeAnotherName/SomeKindOfInterceptor/Handlers/SomeKindOfInterceptorGrpcInterceptor.cs:line 88
   at Program.<Main>$(String[] args) in /Users/kolonist/dev/SomeAnotherName/sandbox/SomeName.SomeAnotherName.Sandbox.Grpc/Program.cs:line 48
[16:44:42 WRN ]  ##### BEFORE Delay
[16:47:12 WRN ]  ##### END LOOP

[16:47:12 WRN ]  ##### START LOOP
[16:47:12 WRN ]  ##### BEFORE Unary call
[16:47:12 DBG ]  Starting gRPC call. Method type: 'Unary', URI: 'http://10.124.143.49:82/SomeName.SomeAnotherName.SomeService.V1.GrpcSomeServiceInterface/UnaryCall'.
[16:47:12 VRB ]  Pick started.
[16:47:12 DBG ]  Picked failure with status: Status(StatusCode="Unavailable", Detail="Error connecting to subchannel.", DebugException="System.Threading.Tasks.TaskCanceledException: A task was canceled.")
[16:47:12 DBG ]  Error starting gRPC call.
[16:47:12 INF ]  Call failed with gRPC error status. Status code: 'Unavailable', Message: 'Error connecting to subchannel.'.
System.Threading.Tasks.TaskCanceledException: A task was canceled.
   at Grpc.Net.Client.Balancer.Internal.SocketConnectivitySubchannelTransport.TryConnectAsync(ConnectContext context, Int32 attempt) in /Users/kolonist/dev/grpc-dotnet-forked/src/Grpc.Net.Client/Balancer/Internal/SocketConnectivitySubchannelTransport.cs:line 186
[16:47:12 DBG ]  Finished gRPC call.
[16:47:12 DBG ]  gRPC call canceled.
[16:47:12 ERR ]  ##### SOME ERROR
Grpc.Core.RpcException: Status(StatusCode="Unavailable", Detail="Error connecting to subchannel.", DebugException="System.Threading.Tasks.TaskCanceledException: A task was canceled.")
 ---> System.Threading.Tasks.TaskCanceledException: A task was canceled.
   at Grpc.Net.Client.Balancer.Internal.SocketConnectivitySubchannelTransport.TryConnectAsync(ConnectContext context, Int32 attempt) in /Users/kolonist/dev/grpc-dotnet-forked/src/Grpc.Net.Client/Balancer/Internal/SocketConnectivitySubchannelTransport.cs:line 186
   --- End of inner exception stack trace ---
   at Grpc.Net.Client.Balancer.Internal.ConnectionManager.PickAsync(PickContext context, Boolean waitForReady, CancellationToken cancellationToken) in /Users/kolonist/dev/grpc-dotnet-forked/src/Grpc.Net.Client/Balancer/Internal/ConnectionManager.cs:line 341
   at Grpc.Net.Client.Balancer.Internal.BalancerHttpHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken) in /Users/kolonist/dev/grpc-dotnet-forked/src/Grpc.Net.Client/Balancer/Internal/BalancerHttpHandler.cs:line 117
   at Grpc.Net.Client.Internal.GrpcCall`2.RunCall(HttpRequestMessage request, Nullable`1 timeout) in /Users/kolonist/dev/grpc-dotnet-forked/src/Grpc.Net.Client/Internal/GrpcCall.cs:line 502
   at SomeName.SomeAnotherName.SomeKindOfInterceptor.Handlers.SomeKindOfInterceptorGrpcInterceptor.HandleExceptionAsync[TResponse](Task`1 response) in /Users/kolonist/dev/mega_app/src/core/SomeName.SomeAnotherName/SomeKindOfInterceptor/Handlers/SomeKindOfInterceptorGrpcInterceptor.cs:line 88
   at Program.<Main>$(String[] args) in /Users/kolonist/dev/mega_app/sandbox/SomeName.SomeAnotherName.Sandbox.Grpc/Program.cs:line 48
[16:47:12 WRN ]  ##### BEFORE Delay
[16:49:42 WRN ]  ##### END LOOP

[16:49:42 WRN ]  ##### START LOOP
[16:49:42 WRN ]  ##### BEFORE Unary call
[16:49:42 DBG ]  Starting gRPC call. Method type: 'Unary', URI: 'http://10.124.143.49:82/SomeName.SomeAnotherName.SomeService.V1.GrpcSomeServiceInterface/UnaryCall'.
[16:49:42 VRB ]  Pick started.
[16:49:42 DBG ]  Picked failure with status: Status(StatusCode="Unavailable", Detail="Error connecting to subchannel.", DebugException="System.Threading.Tasks.TaskCanceledException: A task was canceled.")
[16:49:42 DBG ]  Error starting gRPC call.
[16:49:42 INF ]  Call failed with gRPC error status. Status code: 'Unavailable', Message: 'Error connecting to subchannel.'.
System.Threading.Tasks.TaskCanceledException: A task was canceled.
   at Grpc.Net.Client.Balancer.Internal.SocketConnectivitySubchannelTransport.TryConnectAsync(ConnectContext context, Int32 attempt) in /Users/kolonist/dev/grpc-dotnet-forked/src/Grpc.Net.Client/Balancer/Internal/SocketConnectivitySubchannelTransport.cs:line 186
[16:49:42 DBG ]  Finished gRPC call.
[16:49:42 DBG ]  gRPC call canceled.
[16:49:42 ERR ]  ##### SOME ERROR
Grpc.Core.RpcException: Status(StatusCode="Unavailable", Detail="Error connecting to subchannel.", DebugException="System.Threading.Tasks.TaskCanceledException: A task was canceled.")
 ---> System.Threading.Tasks.TaskCanceledException: A task was canceled.
   at Grpc.Net.Client.Balancer.Internal.SocketConnectivitySubchannelTransport.TryConnectAsync(ConnectContext context, Int32 attempt) in /Users/kolonist/dev/grpc-dotnet-forked/src/Grpc.Net.Client/Balancer/Internal/SocketConnectivitySubchannelTransport.cs:line 186
   --- End of inner exception stack trace ---
   at Grpc.Net.Client.Balancer.Internal.ConnectionManager.PickAsync(PickContext context, Boolean waitForReady, CancellationToken cancellationToken) in /Users/kolonist/dev/grpc-dotnet-forked/src/Grpc.Net.Client/Balancer/Internal/ConnectionManager.cs:line 341
   at Grpc.Net.Client.Balancer.Internal.BalancerHttpHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken) in /Users/kolonist/dev/grpc-dotnet-forked/src/Grpc.Net.Client/Balancer/Internal/BalancerHttpHandler.cs:line 117
   at Grpc.Net.Client.Internal.GrpcCall`2.RunCall(HttpRequestMessage request, Nullable`1 timeout) in /Users/kolonist/dev/grpc-dotnet-forked/src/Grpc.Net.Client/Internal/GrpcCall.cs:line 502
   at SomeName.SomeAnotherName.SomeKindOfInterceptor.Handlers.SomeKindOfInterceptorGrpcInterceptor.HandleExceptionAsync[TResponse](Task`1 response) in /Users/kolonist/dev/mega_app/src/core/SomeName.SomeAnotherName/SomeKindOfInterceptor/Handlers/SomeKindOfInterceptorGrpcInterceptor.cs:line 88
   at Program.<Main>$(String[] args) in /Users/kolonist/dev/mega_app/sandbox/SomeName.SomeAnotherName.Sandbox.Grpc/Program.cs:line 48
[16:49:42 WRN ]  ##### BEFORE Delay

... And so on...
... And so on...
... And so on...

As you can see SocketConnectivitySubchannelTransport.TryConnectAsync() never called again. No backoff or subchannels recreation ever happen.

If we have no exception in TryConnectAsync() then we expect to get the following output:

[16:57:27 WRN ]  ##### START LOOP
[16:57:27 WRN ]  ##### BEFORE Unary call
[16:57:27 DBG ]  Starting gRPC call. Method type: 'Unary', URI: 'http://10.124.143.49:82/SomeName.SomeAnotherName.SomeService.V1.GrpcSomeServiceInterface/UnaryCall'.
[16:57:27 DBG ]  Subchannel id '2-1' created with addresses: 10.124.143.49:82
[16:57:27 WRN ]  ##### SocketConnectivitySubchannelTransport Created
[16:57:27 VRB ]  Subchannel id '2-1' state changed registration '2-1-1' created.
[16:57:27 DBG ]  Channel picker updated.
[16:57:27 VRB ]  Subchannel id '2-1' connection requested.
[16:57:27 DBG ]  Subchannel id '2-1' state changed to Connecting. Detail: 'Connection requested.'.
[16:57:27 VRB ]  Subchannel id '2-1' executing state changed registration '2-1-1'.
[16:57:27 VRB ]  Processing subchannel id '2-1' state changed to Connecting. Detail: 'Connection requested.'.
[16:57:27 DBG ]  Channel state updated to Connecting.
[16:57:27 VRB ]  Subchannel id '2-1' connecting to transport.
[16:57:27 VRB ]  Pick started.
[16:57:27 WRN ]  ##### TryConnectAsync context.CancellationToken.IsCancellationRequested = False
[16:57:27 DBG ]  Picked queued.
[16:57:27 VRB ]  Waiting for a new picker.
[16:57:27 VRB ]  Subchannel id '2-1' connecting socket to 10.124.143.49:82.
[16:57:27 DBG ]  Subchannel id '2-1' connected to socket 10.124.143.49:82.
[16:57:27 DBG ]  Subchannel id '2-1' state changed to Ready. Detail: 'Successfully connected to socket.'.
[16:57:27 VRB ]  Subchannel id '2-1' executing state changed registration '2-1-1'.
[16:57:27 VRB ]  Processing subchannel id '2-1' state changed to Ready. Detail: 'Successfully connected to socket.'.
[16:57:27 DBG ]  Channel state updated to Ready.
[16:57:27 DBG ]  Channel picker updated.
[16:57:27 VRB ]  Pick started.
[16:57:27 DBG ]  Successfully picked subchannel id '2-1' with address 10.124.143.49:82. Transport status: InitialSocket
[16:57:27 VRB ]  Sending request http://10.124.143.49:82/SomeName.SomeAnotherName.SomeService.V1.GrpcSomeServiceInterface/UnaryCall.
[16:57:27 VRB ]  Starting connect callback for 10.124.143.49:82.
[16:57:27 VRB ]  Subchannel id '2-1' creating stream for 10.124.143.49:82.
[16:57:27 VRB ]  Subchannel id '2-1' checking socket 10.124.143.49:82.
[16:57:27 VRB ]  Subchannel id '2-1' created stream for 10.124.143.49:82 with 0 buffered bytes. Transport has 1 active streams.
[16:57:27 DBG ]  Sending message.
[16:57:27 VRB ]  Serialized 'SomeName.SomeAnotherName.SomeService.V1.GrpcSomeRequest' to 16 byte message.
[16:57:27 VRB ]  Message sent.
[16:57:27 VRB ]  Response headers received.
[16:57:27 DBG ]  Reading message.
[16:57:27 VRB ]  Deserializing 16 byte message to 'SomeName.SomeAnotherName.SomeService.V1.GrpcSomeResponse'.
[16:57:27 VRB ]  Received message.
[16:57:27 DBG ]  Finished gRPC call.
[16:57:27 WRN ]  ##### Response: Random message
[16:57:27 WRN ]  ##### BEFORE Delay
[16:58:42 VRB ]  Subchannel id '2-1' disposing stream for 10.124.143.49:82. Transport has 0 active streams.
[16:58:42 DBG ]  Subchannel id '2-1' state changed to Idle. Detail: 'Disconnected.'.
[16:58:42 VRB ]  Subchannel id '2-1' executing state changed registration '2-1-1'.
[16:58:42 VRB ]  Processing subchannel id '2-1' state changed to Idle. Detail: 'Disconnected.'.
[16:58:42 DBG ]  Channel state updated to Idle.
[16:58:42 DBG ]  Channel picker updated.
[16:59:57 WRN ]  ##### END LOOP

[16:59:57 WRN ]  ##### START LOOP
[16:59:57 WRN ]  ##### BEFORE Unary call
[16:59:57 DBG ]  Starting gRPC call. Method type: 'Unary', URI: 'http://10.124.143.49:82/SomeName.SomeAnotherName.SomeService.V1.GrpcSomeServiceInterface/UnaryCall'.
[16:59:57 VRB ]  Pick started.
[16:59:57 VRB ]  Subchannel id '2-1' connection requested.
[16:59:57 DBG ]  Subchannel id '2-1' state changed to Connecting. Detail: 'Connection requested.'.
[16:59:57 VRB ]  Subchannel id '2-1' executing state changed registration '2-1-1'.
[16:59:57 VRB ]  Processing subchannel id '2-1' state changed to Connecting. Detail: 'Connection requested.'.
[16:59:57 DBG ]  Channel state updated to Connecting.
[16:59:57 DBG ]  Channel picker updated.
[16:59:57 DBG ]  Picked subchannel id '2-1' doesn't have a current address.
[16:59:57 VRB ]  Pick started.
[16:59:57 DBG ]  Picked queued.
[16:59:57 VRB ]  Waiting for a new picker.
[16:59:57 VRB ]  Subchannel id '2-1' connecting to transport.
[16:59:57 WRN ]  ##### TryConnectAsync context.CancellationToken.IsCancellationRequested = False
[16:59:57 VRB ]  Subchannel id '2-1' connecting socket to 10.124.143.49:82.
[16:59:58 DBG ]  Subchannel id '2-1' connected to socket 10.124.143.49:82.
[16:59:58 DBG ]  Subchannel id '2-1' state changed to Ready. Detail: 'Successfully connected to socket.'.
[16:59:58 VRB ]  Subchannel id '2-1' executing state changed registration '2-1-1'.
[16:59:58 VRB ]  Processing subchannel id '2-1' state changed to Ready. Detail: 'Successfully connected to socket.'.
[16:59:58 DBG ]  Channel state updated to Ready.
[16:59:58 DBG ]  Channel picker updated.
[16:59:58 VRB ]  Pick started.
[16:59:58 DBG ]  Successfully picked subchannel id '2-1' with address 10.124.143.49:82. Transport status: InitialSocket
[16:59:58 VRB ]  Sending request http://10.124.143.49:82/SomeName.SomeAnotherName.SomeService.V1.GrpcSomeServiceInterface/UnaryCall.
[16:59:58 VRB ]  Starting connect callback for 10.124.143.49:82.
[16:59:58 VRB ]  Subchannel id '2-1' creating stream for 10.124.143.49:82.
[16:59:58 VRB ]  Subchannel id '2-1' checking socket 10.124.143.49:82.
[16:59:58 VRB ]  Subchannel id '2-1' created stream for 10.124.143.49:82 with 0 buffered bytes. Transport has 1 active streams.
[16:59:58 DBG ]  Sending message.
[16:59:58 VRB ]  Serialized 'SomeName.SomeAnotherName.SomeService.V1.GrpcSomeRequest' to 16 byte message.
[16:59:58 VRB ]  Message sent.
[16:59:58 VRB ]  Response headers received.
[16:59:58 DBG ]  Reading message.
[16:59:58 VRB ]  Deserializing 16 byte message to 'SomeName.SomeAnotherName.SomeService.V1.GrpcSomeResponse'.
[16:59:58 VRB ]  Received message.
[16:59:58 DBG ]  Finished gRPC call.
[16:59:58 WRN ]  ##### Response: Random message
[16:59:58 WRN ]  ##### BEFORE Delay
[17:01:13 VRB ]  Subchannel id '2-1' disposing stream for 10.124.143.49:82. Transport has 0 active streams.
[17:01:13 DBG ]  Subchannel id '2-1' state changed to Idle. Detail: 'Disconnected.'.
[17:01:13 VRB ]  Subchannel id '2-1' executing state changed registration '2-1-1'.
[17:01:13 VRB ]  Processing subchannel id '2-1' state changed to Idle. Detail: 'Disconnected.'.
[17:01:13 DBG ]  Channel state updated to Idle.
[17:01:13 DBG ]  Channel picker updated.
[17:02:28 WRN ]  ##### END LOOP

[17:02:28 WRN ]  ##### START LOOP
[17:02:28 WRN ]  ##### BEFORE Unary call
[17:02:28 DBG ]  Starting gRPC call. Method type: 'Unary', URI: 'http://10.124.143.49:82/SomeName.SomeAnotherName.SomeService.V1.GrpcSomeServiceInterface/UnaryCall'.
[17:02:28 VRB ]  Pick started.
[17:02:28 VRB ]  Subchannel id '2-1' connection requested.
[17:02:28 DBG ]  Subchannel id '2-1' state changed to Connecting. Detail: 'Connection requested.'.
[17:02:28 VRB ]  Subchannel id '2-1' executing state changed registration '2-1-1'.
[17:02:28 VRB ]  Processing subchannel id '2-1' state changed to Connecting. Detail: 'Connection requested.'.
[17:02:28 DBG ]  Channel state updated to Connecting.
[17:02:28 DBG ]  Channel picker updated.
[17:02:28 DBG ]  Picked subchannel id '2-1' doesn't have a current address.
[17:02:28 VRB ]  Pick started.
[17:02:28 DBG ]  Picked queued.
[17:02:28 VRB ]  Waiting for a new picker.
[17:02:28 VRB ]  Subchannel id '2-1' connecting to transport.
[17:02:28 WRN ]  ##### TryConnectAsync context.CancellationToken.IsCancellationRequested = False
[17:02:28 VRB ]  Subchannel id '2-1' connecting socket to 10.124.143.49:82.
[17:02:28 DBG ]  Subchannel id '2-1' connected to socket 10.124.143.49:82.
[17:02:28 DBG ]  Subchannel id '2-1' state changed to Ready. Detail: 'Successfully connected to socket.'.
[17:02:28 VRB ]  Subchannel id '2-1' executing state changed registration '2-1-1'.
[17:02:28 VRB ]  Processing subchannel id '2-1' state changed to Ready. Detail: 'Successfully connected to socket.'.
[17:02:28 DBG ]  Channel state updated to Ready.
[17:02:28 DBG ]  Channel picker updated.
[17:02:28 VRB ]  Pick started.
[17:02:28 DBG ]  Successfully picked subchannel id '2-1' with address 10.124.143.49:82. Transport status: InitialSocket
[17:02:28 VRB ]  Sending request http://10.124.143.49:82/SomeName.SomeAnotherName.SomeService.V1.GrpcSomeServiceInterface/UnaryCall.
[17:02:28 VRB ]  Starting connect callback for 10.124.143.49:82.
[17:02:28 VRB ]  Subchannel id '2-1' creating stream for 10.124.143.49:82.
[17:02:28 VRB ]  Subchannel id '2-1' checking socket 10.124.143.49:82.
[17:02:28 VRB ]  Subchannel id '2-1' created stream for 10.124.143.49:82 with 0 buffered bytes. Transport has 1 active streams.
[17:02:28 DBG ]  Sending message.
[17:02:28 VRB ]  Serialized 'SomeName.SomeAnotherName.SomeService.V1.GrpcSomeRequest' to 16 byte message.
[17:02:28 VRB ]  Message sent.
[17:02:28 VRB ]  Response headers received.
[17:02:28 DBG ]  Reading message.
[17:02:28 VRB ]  Deserializing 16 byte message to 'SomeName.SomeAnotherName.SomeService.V1.GrpcSomeResponse'.
[17:02:28 VRB ]  Received message.
[17:02:28 DBG ]  Finished gRPC call.
[17:02:28 WRN ]  ##### Response: Random message
[17:02:28 WRN ]  ##### BEFORE Delay

... And so on...
... And so on...
... And so on...

We found no workaround yet, so It would be great if you take a look at this issue.

Best regards, Alexander

kolonist avatar Apr 27 '24 14:04 kolonist

Thanks for the detailed report. I can reproduce the error by adding the code you suggested. However, that doesn't answer the question of how the subchannel got in that state. I haven't reproduced it through normal usage.

I have a theory about what is causing this. PR: https://github.com/grpc/grpc-dotnet/pull/2422

JamesNK avatar Apr 29 '24 02:04 JamesNK

Hi, @JamesNK

Thank you for such quick solution. We've tested it in our environment and the problem is no longer reproduced.

Please could you tell us when are you gonna to build release version of grpc-dotnet with this fix?

kolonist avatar May 06 '24 11:05 kolonist

https://www.nuget.org/packages/Grpc.Net.Client/2.63.0-pre1 is on NuGet with this change. A non-preview version will come in about 2 weeks.

JamesNK avatar May 08 '24 23:05 JamesNK