grpc-dotnet icon indicating copy to clipboard operation
grpc-dotnet copied to clipboard

Benchmark scenario "dotnet_protobuf_async_unary_ping_pong_1MB" is highly flaky

Open jtattermusch opened this issue 3 years ago • 21 comments

Also see internal bug b/253283712.

dotnet_protobuf_async_unary_ping_pong_1MB is run with grpc_e2e_performance_gke] on both 30-core machine and 8-core machines

Since https://source.cloud.google.com/results/invocations/38c6078b-4413-488b-81a9-22a71226dcc8/targets (the job was broken before) there are multiple failures of this test.

It failed typically with driver process hang and the loadtest eventually timed out.

jtattermusch avatar Oct 26 '22 10:10 jtattermusch

Example failed run: https://source.cloud.google.com/results/invocations/f6d2453b-c74c-476e-8c6b-c2b6531ae33b/targets;collapsed=/github%2Fgrpc%2Frunner%2Fworkers-c2-30core-ci/tests

successful run:

D1018 08:11:33.292954351       9 driver.cc:521]              Client 0 gets 1 channels
I1018 08:11:33.879114916       9 driver.cc:544]              Initiating
I1018 08:11:33.885091246       9 driver.cc:567]              Warming up
I1018 08:11:38.885290136       9 driver.cc:573]              Starting
I1018 08:11:38.891221564       9 driver.cc:607]              Running
I1018 08:12:08.885337794       9 driver.cc:239]              Finishing clients
I1018 08:12:08.885679444       9 driver.cc:256]              Receiving final status from clients
I1018 08:12:08.888682203       9 driver.cc:262]              Received final status from client 0
I1018 08:12:08.907244870       9 driver.cc:286]              Shutdown clients
I1018 08:12:08.907493879       9 driver.cc:305]              Finishing servers
I1018 08:12:08.907720066       9 driver.cc:321]              Receiving final status from servers
I1018 08:12:08.909088613       9 driver.cc:327]              Received final status from server 0
I1018 08:12:08.945486815       9 driver.cc:341]              Shutdown servers
D1018 08:12:08.950043841       9 init.cc:190]                grpc_shutdown starts clean-up now
I1018 08:12:08.951097124       9 report.cc:81]               QPS: 136.3
I1018 08:12:08.953008313       9 report.cc:91]               QPS: 136.3 (inf/server core)
I1018 08:12:08.953115592       9 report.cc:96]               Latencies (50/90/95/99/99.9%-ile): 6996.1/8588.7/9543.6/13016.8/18177.1 us
I1018 08:12:08.953197476       9 report.cc:106]              Server system time: 115.56%
I1018 08:12:08.953277983       9 report.cc:108]              Server user time:   199.40%
I1018 08:12:08.953338102       9 report.cc:110]              Client system time: 92.14%
I1018 08:12:08.953390366       9 report.cc:112]              Client user time:   143.64%
I1018 08:12:08.953431814       9 report.cc:117]              Server CPU usage: 0.00%
I1018 08:12:08.953489314       9 report.cc:122]              Client Polls per Request: 0.00
I1018 08:12:08.953541262       9 report.cc:124]              Server Polls per Request: 0.00
I1018 08:12:08.953584860       9 report.cc:129]              Server Queries/CPU-sec: 43.25
I1018 08:12:08.953642403       9 report.cc:131]              Client Queries/CPU-sec: 57.81
I1018 08:12:08.953730476       9 test_config.cc:175]         TestEnvironment ends

Failed run:

D1012 12:12:37.623225112       9 driver.cc:521]              Client 0 gets 1 channels
I1012 12:12:38.195802727       9 driver.cc:544]              Initiating
I1012 12:12:38.201175484       9 driver.cc:567]              Warming up
I1012 12:12:43.201442879       9 driver.cc:573]              Starting
I1012 12:12:43.206375476       9 driver.cc:607]              Running
I1012 12:13:13.201356345       9 driver.cc:239]              Finishing clients
I1012 12:13:13.201622844       9 driver.cc:256]              Receiving final status from clients
I1012 12:13:13.204457381       9 driver.cc:262]              Received final status from client 0
*** SIGTERM received at time=1665577656 on cpu 0 ***

From the logs it basically looks like the client is busy and doesn't react to driver's requests?

jtattermusch avatar Oct 26 '22 10:10 jtattermusch

client worker log:

dbug: Microsoft.Extensions.Hosting.Internal.Host[1]
      Hosting starting
warn: Microsoft.AspNetCore.Server.Kestrel[0]
      Overriding address(es) 'http://+:80'. Binding to endpoints defined via IConfiguration and/or UseKestrel() instead.
info: Microsoft.Hosting.Lifetime[14]
      Now listening on: http://[::]:10000
dbug: Microsoft.AspNetCore.Hosting.Diagnostics[13]
      Loaded hosting startup assembly QpsWorker
info: Microsoft.Hosting.Lifetime[0]
      Application started. Press Ctrl+C to shut down.
info: Microsoft.Hosting.Lifetime[0]
      Hosting environment: Production
info: Microsoft.Hosting.Lifetime[0]
      Content root path: /src/workspace
dbug: Microsoft.Extensions.Hosting.Internal.Host[2]
      Hosting started
dbug: Microsoft.AspNetCore.Server.Kestrel.Connections[39]
      Connection id "0HMLC9C6P3E8B" accepted.
dbug: Microsoft.AspNetCore.Server.Kestrel.Connections[1]
      Connection id "0HMLC9C6P3E8B" started.
info: Microsoft.AspNetCore.Hosting.Diagnostics[1]
      Request starting HTTP/2 POST http://10.16.73.5:10000/grpc.testing.WorkerService/RunClient application/grpc -
dbug: Microsoft.AspNetCore.HostFiltering.HostFilteringMiddleware[0]
      Wildcard detected, all requests with hosts will be allowed.
dbug: Microsoft.AspNetCore.Routing.Matching.DfaMatcher[1001]
      3 candidate(s) found for the request path '/grpc.testing.WorkerService/RunClient'
dbug: Microsoft.AspNetCore.Routing.Matching.DfaMatcher[1005]
      Endpoint 'gRPC - /grpc.testing.WorkerService/RunClient' with route pattern '/grpc.testing.WorkerService/RunClient' is valid for the request path '/grpc.testing.WorkerService/RunClient'
dbug: Microsoft.AspNetCore.Routing.Matching.DfaMatcher[1005]
      Endpoint 'gRPC - Unimplemented method for grpc.testing.WorkerService' with route pattern 'grpc.testing.WorkerService/{unimplementedMethod}' is valid for the request path '/grpc.testing.WorkerService/RunClient'
dbug: Microsoft.AspNetCore.Routing.Matching.DfaMatcher[1005]
      Endpoint 'gRPC - Unimplemented service' with route pattern '{unimplementedService}/{unimplementedMethod}' is valid for the request path '/grpc.testing.WorkerService/RunClient'
dbug: Microsoft.AspNetCore.Routing.EndpointRoutingMiddleware[1]
      Request matched endpoint 'gRPC - /grpc.testing.WorkerService/RunClient'
info: Microsoft.AspNetCore.Routing.EndpointMiddleware[0]
      Executing endpoint 'gRPC - /grpc.testing.WorkerService/RunClient'
dbug: Grpc.AspNetCore.Server.ServerCallHandler[10]
      Reading message.
dbug: Microsoft.AspNetCore.Server.Kestrel[25]
      Connection id "0HMLC9C6P3E8B", Request id "0HMLC9C6P3E8B:00000001": started reading request body.
dbug: QpsWorker.Infrastructure.ClientRunner[0]
      ClientConfig: { "serverTargets": [ "10.16.28.21:10010" ], "clientType": "ASYNC_CLIENT", "securityParams": { "useTestCa": true, "serverHostOverride": "foo.test.google.fr" }, "outstandingRpcsPerChannel": 1, "clientChannels": 1, "asyncClientThreads": 1, "loadParams": { "closedLoop": { } }, "payloadConfig": { "simpleParams": { "reqSize": 1048576, "respSize": 1048576 } }, "histogramParams": { "resolution": 0.01, "maxPossible": 60000000000 }, "channelArgs": [ { "name": "grpc.optimization_target", "strValue": "latency" } ] }
warn: QpsWorker.Infrastructure.ClientRunner[0]
      ClientConfig.AsyncClientThreads is not supported for C#. Ignoring the value
info: QpsWorker.Infrastructure.ClientRunner[0]
      [ClientRunnerImpl.GetStats] GC collection counts: gen0 0, gen1 0, gen2 0, (histogram reset count:0, seconds since reset: 0.1129903)
dbug: Grpc.AspNetCore.Server.ServerCallHandler[15]
      Sending message.
dbug: Grpc.AspNetCore.Server.ServerCallHandler[10]
      Reading message.
info: QpsWorker.Infrastructure.ClientRunner[0]
      [ClientRunnerImpl.GetStats] GC collection counts: gen0 0, gen1 0, gen2 0, (histogram reset count:1, seconds since reset: 0.1413)
dbug: Grpc.AspNetCore.Server.ServerCallHandler[15]
      Sending message.
dbug: Grpc.AspNetCore.Server.ServerCallHandler[10]
      Reading message.
info: QpsWorker.Infrastructure.ClientRunner[0]
      [ClientRunnerImpl.GetStats] GC collection counts: gen0 7, gen1 7, gen2 7, (histogram reset count:2, seconds since reset: 5.0049085)
dbug: Grpc.AspNetCore.Server.ServerCallHandler[15]
      Sending message.
dbug: Grpc.AspNetCore.Server.ServerCallHandler[10]
      Reading message.
info: QpsWorker.Infrastructure.ClientRunner[0]
      [ClientRunnerImpl.GetStats] GC collection counts: gen0 26, gen1 26, gen2 26, (histogram reset count:3, seconds since reset: 29.9982098)
dbug: Grpc.AspNetCore.Server.ServerCallHandler[15]
      Sending message.
dbug: Grpc.AspNetCore.Server.ServerCallHandler[10]
      Reading message.
dbug: Microsoft.AspNetCore.Server.Kestrel[26]
      Connection id "0HMLC9C6P3E8B", Request id "0HMLC9C6P3E8B:00000001": done reading request body.
info: QpsWorker.Services.WorkerServiceImpl[0]
      Exiting RunClient.
info: Microsoft.Hosting.Lifetime[0]
      Application is shutting down...
dbug: Microsoft.Extensions.Hosting.Internal.Host[3]
      Hosting stopping
dbug: Microsoft.AspNetCore.Server.Kestrel.Http2[36]
      Connection id "0HMLC9C6P3E8B" is closing.
dbug: Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets[6]
      Connection id "0HMLC9C6P3E8B" received FIN.
dbug: Microsoft.AspNetCore.Server.Kestrel.Http2[48]
      Connection id "0HMLC9C6P3E8B" is closed. The last processed stream ID was 1.
dbug: Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets[7]
      Connection id "0HMLC9C6P3E8B" sending FIN because: "The client closed the connection."

server worker log:

dbug: Microsoft.Extensions.Hosting.Internal.Host[1]
      Hosting starting
warn: Microsoft.AspNetCore.Server.Kestrel[0]
      Overriding address(es) 'http://+:80'. Binding to endpoints defined via IConfiguration and/or UseKestrel() instead.
info: Microsoft.Hosting.Lifetime[14]
      Now listening on: http://[::]:10000
dbug: Microsoft.AspNetCore.Hosting.Diagnostics[13]
      Loaded hosting startup assembly QpsWorker
info: Microsoft.Hosting.Lifetime[0]
      Application started. Press Ctrl+C to shut down.
info: Microsoft.Hosting.Lifetime[0]
      Hosting environment: Production
info: Microsoft.Hosting.Lifetime[0]
      Content root path: /src/workspace
dbug: Microsoft.Extensions.Hosting.Internal.Host[2]
      Hosting started
dbug: Microsoft.AspNetCore.Server.Kestrel.Connections[39]
      Connection id "0HMLC9C6ISOVV" accepted.
dbug: Microsoft.AspNetCore.Server.Kestrel.Connections[1]
      Connection id "0HMLC9C6ISOVV" started.
info: Microsoft.AspNetCore.Hosting.Diagnostics[1]
      Request starting HTTP/2 POST http://10.16.28.21:10000/grpc.testing.WorkerService/RunServer application/grpc -
dbug: Microsoft.AspNetCore.HostFiltering.HostFilteringMiddleware[0]
      Wildcard detected, all requests with hosts will be allowed.
dbug: Microsoft.AspNetCore.Routing.Matching.DfaMatcher[1001]
      3 candidate(s) found for the request path '/grpc.testing.WorkerService/RunServer'
dbug: Microsoft.AspNetCore.Routing.Matching.DfaMatcher[1005]
      Endpoint 'gRPC - /grpc.testing.WorkerService/RunServer' with route pattern '/grpc.testing.WorkerService/RunServer' is valid for the request path '/grpc.testing.WorkerService/RunServer'
dbug: Microsoft.AspNetCore.Routing.Matching.DfaMatcher[1005]
      Endpoint 'gRPC - Unimplemented method for grpc.testing.WorkerService' with route pattern 'grpc.testing.WorkerService/{unimplementedMethod}' is valid for the request path '/grpc.testing.WorkerService/RunServer'
dbug: Microsoft.AspNetCore.Routing.Matching.DfaMatcher[1005]
      Endpoint 'gRPC - Unimplemented service' with route pattern '{unimplementedService}/{unimplementedMethod}' is valid for the request path '/grpc.testing.WorkerService/RunServer'
dbug: Microsoft.AspNetCore.Routing.EndpointRoutingMiddleware[1]
      Request matched endpoint 'gRPC - /grpc.testing.WorkerService/RunServer'
info: Microsoft.AspNetCore.Routing.EndpointMiddleware[0]
      Executing endpoint 'gRPC - /grpc.testing.WorkerService/RunServer'
dbug: Grpc.AspNetCore.Server.ServerCallHandler[10]
      Reading message.
dbug: Microsoft.AspNetCore.Server.Kestrel[25]
      Connection id "0HMLC9C6ISOVV", Request id "0HMLC9C6ISOVV:00000001": started reading request body.
dbug: QpsWorker.Infrastructure.ServerRunner[0]
      ServerConfig: { "serverType": "ASYNC_SERVER", "securityParams": { "useTestCa": true, "serverHostOverride": "foo.test.google.fr" }, "port": 10010, "channelArgs": [ { "name": "grpc.optimization_target", "strValue": "latency" } ] }
info: QpsWorker.Infrastructure.ServerRunner[0]
      Server started.
info: QpsWorker.Infrastructure.ServerRunner[0]
      [ServerRunner.GetStats] GC collection counts: gen0 0, gen1 0, gen2 0, (seconds since last reset 0.0051085)
dbug: Grpc.AspNetCore.Server.ServerCallHandler[15]
      Sending message.
dbug: Grpc.AspNetCore.Server.ServerCallHandler[10]
      Reading message.
info: QpsWorker.Infrastructure.ServerRunner[0]
      [ServerRunner.GetStats] GC collection counts: gen0 20, gen1 20, gen2 20, (seconds since last reset 5.6174403)
dbug: Grpc.AspNetCore.Server.ServerCallHandler[15]
      Sending message.
dbug: Grpc.AspNetCore.Server.ServerCallHandler[10]
      Reading message.
info: Microsoft.Hosting.Lifetime[0]
      Application is shutting down...
dbug: Microsoft.Extensions.Hosting.Internal.Host[3]
      Hosting stopping
dbug: Microsoft.AspNetCore.Server.Kestrel.Http2[36]
      Connection id "0HMLC9C6ISOVV" is closing.
dbug: Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets[6]
      Connection id "0HMLC9C6ISOVV" received FIN.
dbug: Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets[7]
      Connection id "0HMLC9C6ISOVV" sending FIN because: "The client closed the connection."
dbug: Microsoft.AspNetCore.Server.Kestrel.Http2[48]
      Connection id "0HMLC9C6ISOVV" is closed. The last processed stream ID was 1.
info: Grpc.AspNetCore.Server.ServerCallHandler[14]
      Error reading message.
      System.IO.IOException: The request stream was aborted.
       ---> Microsoft.AspNetCore.Connections.ConnectionAbortedException: The HTTP/2 connection faulted.
         --- End of inner exception stack trace ---
         at System.IO.Pipelines.Pipe.GetReadAsyncResult()
         at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http2.Http2MessageBody.ReadAsync(CancellationToken cancellationToken)
         at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1.StateMachineBox`1.System.Threading.Tasks.Sources.IValueTaskSource<TResult>.GetResult(Int16 token)
         at Grpc.AspNetCore.Server.Internal.PipeExtensions.ReadStreamMessageAsync[T](PipeReader input, HttpContextServerCallContext serverCallContext, Func`2 deserializer, CancellationToken cancellationToken) in /pre/src/Grpc.AspNetCore.Server/Internal/PipeExtensions.cs:line 298
info: QpsWorker.Services.WorkerServiceImpl[0]
      Exiting RunServer.

jtattermusch avatar Oct 26 '22 10:10 jtattermusch

For comparison, this is the log for serverside worker for a successful run: One thing that stands out is that the server doesn't seem to receive a Mark at the end of the benchmark run "seconds since last reset 30" which should happen and the end of the benchmark run.

dbug: Microsoft.Extensions.Hosting.Internal.Host[1]
      Hosting starting
warn: Microsoft.AspNetCore.Server.Kestrel[0]
      Overriding address(es) 'http://+:80'. Binding to endpoints defined via IConfiguration and/or UseKestrel() instead.
info: Microsoft.Hosting.Lifetime[14]
      Now listening on: http://[::]:10000
dbug: Microsoft.AspNetCore.Hosting.Diagnostics[13]
      Loaded hosting startup assembly QpsWorker
info: Microsoft.Hosting.Lifetime[0]
      Application started. Press Ctrl+C to shut down.
info: Microsoft.Hosting.Lifetime[0]
      Hosting environment: Production
info: Microsoft.Hosting.Lifetime[0]
      Content root path: /src/workspace
dbug: Microsoft.Extensions.Hosting.Internal.Host[2]
      Hosting started
dbug: Microsoft.AspNetCore.Server.Kestrel.Connections[39]
      Connection id "0HMLAVDVG0GK4" accepted.
dbug: Microsoft.AspNetCore.Server.Kestrel.Connections[1]
      Connection id "0HMLAVDVG0GK4" started.
info: Microsoft.AspNetCore.Hosting.Diagnostics[1]
      Request starting HTTP/2 POST http://10.16.76.179:10000/grpc.testing.WorkerService/RunServer application/grpc -
dbug: Microsoft.AspNetCore.HostFiltering.HostFilteringMiddleware[0]
      Wildcard detected, all requests with hosts will be allowed.
dbug: Microsoft.AspNetCore.Routing.Matching.DfaMatcher[1001]
      3 candidate(s) found for the request path '/grpc.testing.WorkerService/RunServer'
dbug: Microsoft.AspNetCore.Routing.Matching.DfaMatcher[1005]
      Endpoint 'gRPC - /grpc.testing.WorkerService/RunServer' with route pattern '/grpc.testing.WorkerService/RunServer' is valid for the request path '/grpc.testing.WorkerService/RunServer'
dbug: Microsoft.AspNetCore.Routing.Matching.DfaMatcher[1005]
      Endpoint 'gRPC - Unimplemented method for grpc.testing.WorkerService' with route pattern 'grpc.testing.WorkerService/{unimplementedMethod}' is valid for the request path '/grpc.testing.WorkerService/RunServer'
dbug: Microsoft.AspNetCore.Routing.Matching.DfaMatcher[1005]
      Endpoint 'gRPC - Unimplemented service' with route pattern '{unimplementedService}/{unimplementedMethod}' is valid for the request path '/grpc.testing.WorkerService/RunServer'
dbug: Microsoft.AspNetCore.Routing.EndpointRoutingMiddleware[1]
      Request matched endpoint 'gRPC - /grpc.testing.WorkerService/RunServer'
info: Microsoft.AspNetCore.Routing.EndpointMiddleware[0]
      Executing endpoint 'gRPC - /grpc.testing.WorkerService/RunServer'
dbug: Grpc.AspNetCore.Server.ServerCallHandler[10]
      Reading message.
dbug: Microsoft.AspNetCore.Server.Kestrel[25]
      Connection id "0HMLAVDVG0GK4", Request id "0HMLAVDVG0GK4:00000001": started reading request body.
dbug: QpsWorker.Infrastructure.ServerRunner[0]
      ServerConfig: { "serverType": "ASYNC_SERVER", "securityParams": { "useTestCa": true, "serverHostOverride": "foo.test.google.fr" }, "port": 10010, "channelArgs": [ { "name": "grpc.optimization_target", "strValue": "latency" } ] }
info: QpsWorker.Infrastructure.ServerRunner[0]
      Server started.
info: QpsWorker.Infrastructure.ServerRunner[0]
      [ServerRunner.GetStats] GC collection counts: gen0 0, gen1 0, gen2 0, (seconds since last reset 0.0057898)
dbug: Grpc.AspNetCore.Server.ServerCallHandler[15]
      Sending message.
dbug: Grpc.AspNetCore.Server.ServerCallHandler[10]
      Reading message.
info: QpsWorker.Infrastructure.ServerRunner[0]
      [ServerRunner.GetStats] GC collection counts: gen0 19, gen1 19, gen2 19, (seconds since last reset 5.6876219)
dbug: Grpc.AspNetCore.Server.ServerCallHandler[15]
      Sending message.
dbug: Grpc.AspNetCore.Server.ServerCallHandler[10]
      Reading message.
info: QpsWorker.Infrastructure.ServerRunner[0]
      [ServerRunner.GetStats] GC collection counts: gen0 155, gen1 155, gen2 155, (seconds since last reset 30.0118665)
dbug: Grpc.AspNetCore.Server.ServerCallHandler[15]
      Sending message.
dbug: Grpc.AspNetCore.Server.ServerCallHandler[10]
      Reading message.
dbug: Microsoft.AspNetCore.Server.Kestrel[26]
      Connection id "0HMLAVDVG0GK4", Request id "0HMLAVDVG0GK4:00000001": done reading request body.
info: QpsWorker.Services.WorkerServiceImpl[0]
      Exiting RunServer.
info: Microsoft.AspNetCore.Routing.EndpointMiddleware[1]
      Executed endpoint 'gRPC - /grpc.testing.WorkerService/RunServer'
dbug: Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets[6]
      Connection id "0HMLAVDVG0GK4" received FIN.
dbug: Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets[7]
      Connection id "0HMLAVDVG0GK4" sending FIN because: "The client closed the connection."
dbug: Microsoft.AspNetCore.Server.Kestrel.Http2[48]
      Connection id "0HMLAVDVG0GK4" is closed. The last processed stream ID was 1.
dbug: Microsoft.AspNetCore.Server.Kestrel.Connections[39]
      Connection id "0HMLAVDVG0GK6" accepted.
dbug: Microsoft.AspNetCore.Server.Kestrel.Connections[1]
      Connection id "0HMLAVDVG0GK6" started.
info: Microsoft.AspNetCore.Hosting.Diagnostics[2]
      Request finished HTTP/2 POST http://10.16.76.179:10000/grpc.testing.WorkerService/RunServer application/grpc - - 200 - application/grpc 36274.3843ms
info: Microsoft.AspNetCore.Hosting.Diagnostics[1]
      Request starting HTTP/2 POST http://10.16.76.179:10000/grpc.testing.WorkerService/QuitWorker application/grpc -
dbug: Microsoft.AspNetCore.Server.Kestrel.Http2[62]
      The connection queue processing loop for 0HMLAVDVG0GK4 completed.
dbug: Microsoft.AspNetCore.Server.Kestrel.Connections[2]
      Connection id "0HMLAVDVG0GK4" stopped.
dbug: Microsoft.AspNetCore.Routing.Matching.DfaMatcher[1001]
      3 candidate(s) found for the request path '/grpc.testing.WorkerService/QuitWorker'
dbug: Microsoft.AspNetCore.Routing.Matching.DfaMatcher[1005]
      Endpoint 'gRPC - /grpc.testing.WorkerService/QuitWorker' with route pattern '/grpc.testing.WorkerService/QuitWorker' is valid for the request path '/grpc.testing.WorkerService/QuitWorker'
dbug: Microsoft.AspNetCore.Routing.Matching.DfaMatcher[1005]
      Endpoint 'gRPC - Unimplemented method for grpc.testing.WorkerService' with route pattern 'grpc.testing.WorkerService/{unimplementedMethod}' is valid for the request path '/grpc.testing.WorkerService/QuitWorker'
dbug: Microsoft.AspNetCore.Routing.Matching.DfaMatcher[1005]
      Endpoint 'gRPC - Unimplemented service' with route pattern '{unimplementedService}/{unimplementedMethod}' is valid for the request path '/grpc.testing.WorkerService/QuitWorker'
dbug: Microsoft.AspNetCore.Routing.EndpointRoutingMiddleware[1]
      Request matched endpoint 'gRPC - /grpc.testing.WorkerService/QuitWorker'
info: Microsoft.AspNetCore.Routing.EndpointMiddleware[0]
      Executing endpoint 'gRPC - /grpc.testing.WorkerService/QuitWorker'
dbug: Grpc.AspNetCore.Server.ServerCallHandler[10]
      Reading message.
dbug: Microsoft.AspNetCore.Server.Kestrel[25]
      Connection id "0HMLAVDVG0GK6", Request id "0HMLAVDVG0GK6:00000001": started reading request body.
dbug: Microsoft.AspNetCore.Server.Kestrel[26]
      Connection id "0HMLAVDVG0GK6", Request id "0HMLAVDVG0GK6:00000001": done reading request body.
info: Microsoft.Hosting.Lifetime[0]
      Application is shutting down...
dbug: Microsoft.Extensions.Hosting.Internal.Host[3]
      Hosting stopping
dbug: Grpc.AspNetCore.Server.ServerCallHandler[15]
      Sending message.
info: Microsoft.AspNetCore.Routing.EndpointMiddleware[1]
      Executed endpoint 'gRPC - /grpc.testing.WorkerService/QuitWorker'
dbug: Microsoft.AspNetCore.Server.Kestrel.Http2[36]
      Connection id "0HMLAVDVG0GK6" is closing.
info: Microsoft.AspNetCore.Hosting.Diagnostics[2]
      Request finished HTTP/2 POST http://10.16.76.179:10000/grpc.testing.WorkerService/QuitWorker application/grpc - - 200 - application/grpc 3.0134ms
dbug: Microsoft.AspNetCore.Server.Kestrel.Http2[48]
      Connection id "0HMLAVDVG0GK6" is closed. The last processed stream ID was 1.
dbug: Microsoft.AspNetCore.Server.Kestrel.Http2[62]
      The connection queue processing loop for 0HMLAVDVG0GK6 completed.
dbug: Microsoft.AspNetCore.Server.Kestrel.Connections[2]
      Connection id "0HMLAVDVG0GK6" stopped.
dbug: Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets[7]
      Connection id "0HMLAVDVG0GK6" sending FIN because: "The Socket transport's send loop completed gracefully."
dbug: Microsoft.Extensions.Hosting.Internal.Host[4]
      Hosting stopped

jtattermusch avatar Oct 26 '22 10:10 jtattermusch

Unfortunately the default dotnet logs don't have timestamps in them which makes them less useful - it would be worthwhile to include the timestamps in the logs by default (C core logs timestamps by default).

jtattermusch avatar Oct 26 '22 10:10 jtattermusch

@JamesNK any ideas?

jtattermusch avatar Oct 26 '22 10:10 jtattermusch

logs of clientside worker in a successful run for comparison:

dbug: Microsoft.Extensions.Hosting.Internal.Host[1]
      Hosting starting
warn: Microsoft.AspNetCore.Server.Kestrel[0]
      Overriding address(es) 'http://+:80'. Binding to endpoints defined via IConfiguration and/or UseKestrel() instead.
info: Microsoft.Hosting.Lifetime[14]
      Now listening on: http://[::]:10000
dbug: Microsoft.AspNetCore.Hosting.Diagnostics[13]
      Loaded hosting startup assembly QpsWorker
info: Microsoft.Hosting.Lifetime[0]
      Application started. Press Ctrl+C to shut down.
info: Microsoft.Hosting.Lifetime[0]
      Hosting environment: Production
info: Microsoft.Hosting.Lifetime[0]
      Content root path: /src/workspace
dbug: Microsoft.Extensions.Hosting.Internal.Host[2]
      Hosting started
dbug: Microsoft.AspNetCore.Server.Kestrel.Connections[39]
      Connection id "0HMLAVDVMQ5HN" accepted.
dbug: Microsoft.AspNetCore.Server.Kestrel.Connections[1]
      Connection id "0HMLAVDVMQ5HN" started.
info: Microsoft.AspNetCore.Hosting.Diagnostics[1]
      Request starting HTTP/2 POST http://10.16.73.177:10000/grpc.testing.WorkerService/RunClient application/grpc -
dbug: Microsoft.AspNetCore.HostFiltering.HostFilteringMiddleware[0]
      Wildcard detected, all requests with hosts will be allowed.
dbug: Microsoft.AspNetCore.Routing.Matching.DfaMatcher[1001]
      3 candidate(s) found for the request path '/grpc.testing.WorkerService/RunClient'
dbug: Microsoft.AspNetCore.Routing.Matching.DfaMatcher[1005]
      Endpoint 'gRPC - /grpc.testing.WorkerService/RunClient' with route pattern '/grpc.testing.WorkerService/RunClient' is valid for the request path '/grpc.testing.WorkerService/RunClient'
dbug: Microsoft.AspNetCore.Routing.Matching.DfaMatcher[1005]
      Endpoint 'gRPC - Unimplemented method for grpc.testing.WorkerService' with route pattern 'grpc.testing.WorkerService/{unimplementedMethod}' is valid for the request path '/grpc.testing.WorkerService/RunClient'
dbug: Microsoft.AspNetCore.Routing.Matching.DfaMatcher[1005]
      Endpoint 'gRPC - Unimplemented service' with route pattern '{unimplementedService}/{unimplementedMethod}' is valid for the request path '/grpc.testing.WorkerService/RunClient'
dbug: Microsoft.AspNetCore.Routing.EndpointRoutingMiddleware[1]
      Request matched endpoint 'gRPC - /grpc.testing.WorkerService/RunClient'
info: Microsoft.AspNetCore.Routing.EndpointMiddleware[0]
      Executing endpoint 'gRPC - /grpc.testing.WorkerService/RunClient'
dbug: Grpc.AspNetCore.Server.ServerCallHandler[10]
      Reading message.
dbug: Microsoft.AspNetCore.Server.Kestrel[25]
      Connection id "0HMLAVDVMQ5HN", Request id "0HMLAVDVMQ5HN:00000001": started reading request body.
dbug: QpsWorker.Infrastructure.ClientRunner[0]
      ClientConfig: { "serverTargets": [ "10.16.76.179:10010" ], "clientType": "ASYNC_CLIENT", "securityParams": { "useTestCa": true, "serverHostOverride": "foo.test.google.fr" }, "outstandingRpcsPerChannel": 1, "clientChannels": 1, "asyncClientThreads": 1, "loadParams": { "closedLoop": { } }, "payloadConfig": { "simpleParams": { "reqSize": 1048576, "respSize": 1048576 } }, "histogramParams": { "resolution": 0.01, "maxPossible": 60000000000 }, "channelArgs": [ { "name": "grpc.optimization_target", "strValue": "latency" } ] }
warn: QpsWorker.Infrastructure.ClientRunner[0]
      ClientConfig.AsyncClientThreads is not supported for C#. Ignoring the value
info: QpsWorker.Infrastructure.ClientRunner[0]
      [ClientRunnerImpl.GetStats] GC collection counts: gen0 0, gen1 0, gen2 0, (histogram reset count:0, seconds since reset: 0.1246714)
dbug: Grpc.AspNetCore.Server.ServerCallHandler[15]
      Sending message.
dbug: Grpc.AspNetCore.Server.ServerCallHandler[10]
      Reading message.
info: QpsWorker.Infrastructure.ClientRunner[0]
      [ClientRunnerImpl.GetStats] GC collection counts: gen0 0, gen1 0, gen2 0, (histogram reset count:1, seconds since reset: 0.1582041)
dbug: Grpc.AspNetCore.Server.ServerCallHandler[15]
      Sending message.
dbug: Grpc.AspNetCore.Server.ServerCallHandler[10]
      Reading message.
info: QpsWorker.Infrastructure.ClientRunner[0]
      [ClientRunnerImpl.GetStats] GC collection counts: gen0 7, gen1 7, gen2 7, (histogram reset count:2, seconds since reset: 5.004849)
dbug: Grpc.AspNetCore.Server.ServerCallHandler[15]
      Sending message.
dbug: Grpc.AspNetCore.Server.ServerCallHandler[10]
      Reading message.
info: QpsWorker.Infrastructure.ClientRunner[0]
      [ClientRunnerImpl.GetStats] GC collection counts: gen0 55, gen1 55, gen2 55, (histogram reset count:3, seconds since reset: 29.9983231)
dbug: Grpc.AspNetCore.Server.ServerCallHandler[15]
      Sending message.
dbug: Grpc.AspNetCore.Server.ServerCallHandler[10]
      Reading message.
dbug: Microsoft.AspNetCore.Server.Kestrel[26]
      Connection id "0HMLAVDVMQ5HN", Request id "0HMLAVDVMQ5HN:00000001": done reading request body.
info: QpsWorker.Services.WorkerServiceImpl[0]
      Exiting RunClient.
info: Microsoft.AspNetCore.Routing.EndpointMiddleware[1]
      Executed endpoint 'gRPC - /grpc.testing.WorkerService/RunClient'
info: Microsoft.AspNetCore.Hosting.Diagnostics[2]
      Request finished HTTP/2 POST http://10.16.73.177:10000/grpc.testing.WorkerService/RunClient application/grpc - - 200 - application/grpc 35518.7021ms
dbug: Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets[6]
      Connection id "0HMLAVDVMQ5HN" received FIN.
dbug: Microsoft.AspNetCore.Server.Kestrel.Http2[62]
      The connection queue processing loop for 0HMLAVDVMQ5HN completed.
dbug: Microsoft.AspNetCore.Server.Kestrel.Http2[48]
      Connection id "0HMLAVDVMQ5HN" is closed. The last processed stream ID was 1.
dbug: Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets[7]
      Connection id "0HMLAVDVMQ5HN" sending FIN because: "The client closed the connection."
dbug: Microsoft.AspNetCore.Server.Kestrel.Connections[2]
      Connection id "0HMLAVDVMQ5HN" stopped.
dbug: Microsoft.AspNetCore.Server.Kestrel.Connections[39]
      Connection id "0HMLAVDVMQ5HO" accepted.
dbug: Microsoft.AspNetCore.Server.Kestrel.Connections[1]
      Connection id "0HMLAVDVMQ5HO" started.
info: Microsoft.AspNetCore.Hosting.Diagnostics[1]
      Request starting HTTP/2 POST http://10.16.73.177:10000/grpc.testing.WorkerService/QuitWorker application/grpc -
dbug: Microsoft.AspNetCore.Routing.Matching.DfaMatcher[1001]
      3 candidate(s) found for the request path '/grpc.testing.WorkerService/QuitWorker'
dbug: Microsoft.AspNetCore.Routing.Matching.DfaMatcher[1005]
      Endpoint 'gRPC - /grpc.testing.WorkerService/QuitWorker' with route pattern '/grpc.testing.WorkerService/QuitWorker' is valid for the request path '/grpc.testing.WorkerService/QuitWorker'
dbug: Microsoft.AspNetCore.Routing.Matching.DfaMatcher[1005]
      Endpoint 'gRPC - Unimplemented method for grpc.testing.WorkerService' with route pattern 'grpc.testing.WorkerService/{unimplementedMethod}' is valid for the request path '/grpc.testing.WorkerService/QuitWorker'
dbug: Microsoft.AspNetCore.Routing.Matching.DfaMatcher[1005]
      Endpoint 'gRPC - Unimplemented service' with route pattern '{unimplementedService}/{unimplementedMethod}' is valid for the request path '/grpc.testing.WorkerService/QuitWorker'
dbug: Microsoft.AspNetCore.Routing.EndpointRoutingMiddleware[1]
      Request matched endpoint 'gRPC - /grpc.testing.WorkerService/QuitWorker'
info: Microsoft.AspNetCore.Routing.EndpointMiddleware[0]
      Executing endpoint 'gRPC - /grpc.testing.WorkerService/QuitWorker'
dbug: Grpc.AspNetCore.Server.ServerCallHandler[10]
      Reading message.
dbug: Microsoft.AspNetCore.Server.Kestrel[25]
      Connection id "0HMLAVDVMQ5HO", Request id "0HMLAVDVMQ5HO:00000001": started reading request body.
dbug: Microsoft.AspNetCore.Server.Kestrel[26]
      Connection id "0HMLAVDVMQ5HO", Request id "0HMLAVDVMQ5HO:00000001": done reading request body.
info: Microsoft.Hosting.Lifetime[0]
      Application is shutting down...
dbug: Grpc.AspNetCore.Server.ServerCallHandler[15]
      Sending message.
info: Microsoft.AspNetCore.Routing.EndpointMiddleware[1]
      Executed endpoint 'gRPC - /grpc.testing.WorkerService/QuitWorker'
info: Microsoft.AspNetCore.Hosting.Diagnostics[2]
      Request finished HTTP/2 POST http://10.16.73.177:10000/grpc.testing.WorkerService/QuitWorker application/grpc - - 200 - application/grpc 8.5102ms
dbug: Microsoft.Extensions.Hosting.Internal.Host[3]
      Hosting stopping
dbug: Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets[6]
      Connection id "0HMLAVDVMQ5HO" received FIN.
dbug: Microsoft.AspNetCore.Server.Kestrel.Http2[48]
      Connection id "0HMLAVDVMQ5HO" is closed. The last processed stream ID was 1.
dbug: Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets[7]
      Connection id "0HMLAVDVMQ5HO" sending FIN because: "The client closed the connection."
dbug: Microsoft.AspNetCore.Server.Kestrel.Http2[62]
      The connection queue processing loop for 0HMLAVDVMQ5HO completed.
dbug: Microsoft.AspNetCore.Server.Kestrel.Connections[2]
      Connection id "0HMLAVDVMQ5HO" stopped.
dbug: Microsoft.Extensions.Hosting.Internal.Host[4]
      Hosting stopped

jtattermusch avatar Oct 26 '22 10:10 jtattermusch

AFAICT, when the flake happens, the RunClient request handling on the client doesn't finish for some reason: failing run (client)

dbug: Microsoft.AspNetCore.Server.Kestrel[26]
      Connection id "0HMMT3ENE9BNB", Request id "0HMMT3ENE9BNB:00000001": done reading request body.
info: QpsWorker.Services.WorkerServiceImpl[0]
      Exiting RunClient.

successful run (client)

dbug: Microsoft.AspNetCore.Server.Kestrel[26]
      Connection id "0HMMT3BMQV01D", Request id "0HMMT3BMQV01D:00000001": done reading request body.
info: QpsWorker.Services.WorkerServiceImpl[0]
      Exiting RunClient.
info: Microsoft.AspNetCore.Routing.EndpointMiddleware[1]
      Executed endpoint 'gRPC - /grpc.testing.WorkerService/RunClient'
info: Microsoft.AspNetCore.Hosting.Diagnostics[2]
      Request finished HTTP/2 POST http://10.16.21.2:10000/grpc.testing.WorkerService/RunClient application/grpc - - 200 - application/grpc 35467.9014ms

Thanks to that, the driver never sees the RunClient stream being closed and IMHO gets stuck here: https://github.com/grpc/grpc/blob/feda5031fa300388f72ad4e3d4e4510771df2e19/test/cpp/qps/driver.cc#L276

Since the client still manages to print "Exiting RunClient" here, perhaps it gets stuck here (just before returning from the handler method): https://github.com/grpc/grpc-dotnet/blob/456098bcfeb9c796c19ea023b3ab3703181f1089/perf/benchmarkapps/QpsWorker/Services/WorkerServiceImpl.cs#L99

jtattermusch avatar Dec 13 '22 15:12 jtattermusch

I did some experimenting and it seems that that the await client.UnaryCallAsync(...) suddenly gets stuck for no obvious reason (after completing ~1400 calls just fine) and that prevents the clientRunner.StopAsync() from ever finishing.

https://github.com/grpc/grpc-dotnet/blob/1a9cf64a720e3423d900df266dfe10a315917b52/perf/benchmarkapps/QpsWorker/Infrastructure/ClientRunner.cs#L191

https://github.com/grpc/grpc-dotnet/blob/1a9cf64a720e3423d900df266dfe10a315917b52/perf/benchmarkapps/QpsWorker/Infrastructure/ClientRunner.cs#L227

I don't know why a 1MB unary call suddenly gets stuck, but it's quite suspicious. I tried adding a long deadline (15sec) to the unary call and that makes the problem no longer reproduce. Is this possibly a .NET 7 bug? I don't think we were seeing problems like this when the worker was running on .NET6.

@JamesNK could you please take a look?

jtattermusch avatar Dec 14 '22 15:12 jtattermusch

I also tried to reproduce on a revision just before https://github.com/grpc/grpc-dotnet/pull/1803 was merged and I wasn't able to reproduce. That really seems to indicate that there might be a .NET7 bug that makes the 1MB unary call get stuck?

jtattermusch avatar Dec 14 '22 15:12 jtattermusch

How can I reproduce this locally? I attempted to run it locally using instructions at https://github.com/grpc/grpc-dotnet/tree/948be08b088fbf449982b731c8e1a7cf8abb1965/perf/benchmarkapps#qpsworker and changed the client args to use streaming and 1MB payloads, but I haven't seen it freeze.

JamesNK avatar Jan 17 '23 01:01 JamesNK

Ok, I am able to reproduce outside of kubernetes locally on my linux machine. Here are the repro instructions:

scenarios.json file

Prepare

# build the "qps_json_driver" from grpc/grpc with bazel (building it with cmake is also fine, but bazel seems to be the easiest option.
tools/bazel build //test/cpp/qps:qps_json_driver
# build the qps worker with the same command as in https://github.com/grpc/test-infra/blob/cd5473c9853127768a9fd11b6da9547ee0bfafcc/containers/pre_built_workers/dotnet/Dockerfile#L39 
dotnet publish -c Release -o qps_worker perf/benchmarkapps/QpsWorker/

Run qps workers locally

# Run 2 instances of dotnet qps_worker, each in a different terminal
qps_worker/QpsWorker -- --driver_port 5050
qps_worker/QpsWorker -- --driver_port 5051

Run qps driver to orchestrate running the benchmark.

# run qps_json_driver against the two instances of local dotnet qps workers (first one will be server, the other will be client)
QPS_WORKERS=localhost:5050,localhost:5051 bazel-bin/test/cpp/qps/qps_json_driver --scenarios_file=scenarios.json

I'm able to reproduce the problem (driver gets stuck after printing "Received final status from client 0" due to client getting stuck in "Exiting RunClient." as described in https://github.com/grpc/grpc-dotnet/issues/1926#issuecomment-1348801413) in about 20% of runs. (you can just keep running the qps_json_driver command and it's going to get stuck eventually). In the benchmarks, the qps_worker is started fresh for each scenario run, but that seems to end up with the same result.

jtattermusch avatar Jan 27 '23 10:01 jtattermusch

Additional tip: If you find it difficult to build the qps_json_driver binary locally, you can build & run it under docker with --network=host:

# use grpc's "bazel" development docker image
docker run --rm --network=host -it us-docker.pkg.dev/grpc-testing/testing-images-public/bazel:2345e87c22697f699d33a64d305a0177ce77e5f5@sha256:b42f10625321296c7f646e1e82023d0b46e4f9b6d2205cff2b49352c5dc1ce9f

# clone grpc/grpc, then build qps_json_driver with bazel and run it.
# thanks to --network=host is should be able to reach the qps worker instances running on the same machine

jtattermusch avatar Jan 27 '23 10:01 jtattermusch

According to my experiments, it seems irrelevevant whether the profile guided optimizations from https://github.com/grpc/test-infra/blob/cd5473c9853127768a9fd11b6da9547ee0bfafcc/containers/pre_built_workers/dotnet/Dockerfile#L47-L49 are used or not (scenario gets stuck eventually with or without them).

jtattermusch avatar Jan 27 '23 10:01 jtattermusch

I got the QPS JSON driver working from docker. Unfortunatly I still can't reproduce this. I think it ran it over 20 times and didn't get a hang.

Console output: console.txt

JamesNK avatar Feb 04 '23 15:02 JamesNK

Ideas for next steps:

  • You could run wireshark on your computer to see the state of network traffic when the hang happens. If the server has completely written the response then that indicates something is happening in the client.
  • You could try running the QPS worker client and server apps from different .NET versions. i.e. 5050 is net7, 5051 is net6. That might also indicate whether it is the client or the server that is the problem
  • It's possible to set a log level for the client and server that are started. They might have some information. Most of the client HTTP work is logged via a different mechanism but there is a helper type that writes it to the other logging: https://github.com/grpc/grpc-dotnet/blob/674758e4d3c6b9ac9aa0e95e0e5a0e13d77d8383/test/Shared/HttpEventSourceListener.cs

JamesNK avatar Feb 04 '23 15:02 JamesNK

FYI I'm managed to reproduce the hanging on both Linux and Windows.

I run the qps driver in a loop. On my Linux VM I can get it to fail after between 2 and 50 iterations. On Windows laptop failed after 94 iterations.

I'll investigate further.

tonydnewell avatar Mar 23 '23 13:03 tonydnewell

Further observations:

I ran combinations of client and server with .NET 6 and .NET 7

Both .NET 7 - hangs occur Both .NET 6 - could not reproduce (tried 400x) Client .NET 7 and Server .NET 6 - could not reproduce Client .NET 6 and Server .NET 7 - hangs occur

So it appears it is a server side problem with .NET 7

My findings are the same as @jtattermusch in that the client is hanging in await client.UnaryCallAsync(request); in RunUnaryAsync. I still need to determine what is happening on the server.

Wireshark traces are not useful as TLS being used. I may try without TLS.

tonydnewell avatar Apr 01 '23 09:04 tonydnewell

@JamesNK @jtattermusch Tested with newer and older .NET versions

Latest .NET 7 (7.0.202) still fails

.NET 7 RC 1 (7.0.100-rc.1.22431.12) - seems OK, can't get to fail .NET 7 RC 2 (7.0.100-rc.2.22477.23) - fails

tonydnewell avatar Apr 03 '23 13:04 tonydnewell

I've managed to capture a network trace.

What should happen is that the client sends 1MB of data to the server and the server sends it straight back unchanged - this is via a UnaryCall.

When the hang occurs the client sends the data, the server responds with a HTTP2 header of status: 200 OK and starts send the response data, but all goes quiet on the network before the server has sent back all of the data:

server-hang

I would be expecting a lot more DATA[377] packets to be sent by the server, which is the case for successful responses earlier in the test.

tonydnewell avatar Apr 05 '23 13:04 tonydnewell

I've captured logs from the driver, server and client. I added extra logging and enabled client and server trace logging. See attached zip file.

The attached logs are a subset of those captured showing a successful run (TEST RUN 113) and a failing run (TEST RUN 114). I've deleted some of the duplicated data and annotated some of the logs.

Explaining what the processes are doing to interpret the logs:

Each run consists of the Driver process starts and stops client and server runners by calling the RunClient and RunServer services in the Worker processes. Stats are returned (streamed) by those services.

The client runner (started by RunClient) loops calling the UnaryCall services on the server - the call sends 1 MB of data and the server returns the same 1 MB of data. The client stops when told do so by the driver (when the request stream on the RunClient call is closed). It then waits for the current UnaryCall to finish.

When the server hangs the logs just show the server no longer sending back the data (same as the network trace showed):

14:56:17.3455 [SR] dbug: Microsoft.AspNetCore.Server.Kestrel[26]
      Connection id "0HMPV93DF7RI1", Request id "0HMPV93DF7RI1:00000101": done reading request body.
14:56.17.3455 [BS] dbug: BenchmarkServiceImpl[0]
      BenchmarkService UnaryCall entered
14:56.17.3544 [BS] dbug: BenchmarkServiceImpl[0]
      BenchmarkService UnaryCall returning
14:56:17.3545 [SR] dbug: Grpc.AspNetCore.Server.ServerCallHandler[15]
      Sending message.
14:56:17.3549 [SR] info: Microsoft.AspNetCore.Routing.EndpointMiddleware[1]
      Executed endpoint 'gRPC - /grpc.testing.BenchmarkService/UnaryCall'
14:56:17.3549 [SR] info: Microsoft.AspNetCore.Hosting.Diagnostics[2]
      Request finished HTTP/2 POST http://localhost:30000/grpc.testing.BenchmarkService/UnaryCall application/grpc - - 200 - application/grpc 15.2206ms

**** HERE MISSING SEND THE DATA BACK TO THE CLIENT

The logs show that the UnaryCall is returning (see time 14:56.17.3544 ) which is logged by a line I added before the return in: https://github.com/grpc/grpc-dotnet/blob/6bf44820a2ef6af5845830728cd4b3d1a8053ed4/perf/benchmarkapps/Shared/BenchmarkServiceImpl.cs#L30

The client is left waiting for the response from the server.

@JamesNK Any further Kestrel or gRPC logging I can add? As noted above this is only reproducible on .NET 7 RC 2 (7.0.100-rc.2.22477.23) and newer.

Captured logs - logs.zip

tonydnewell avatar Apr 18 '23 13:04 tonydnewell

These are the changes that happened in aspnetcore between R1 and R2: https://github.com/dotnet/aspnetcore/compare/v7.0.0-rc.1.22427.2...v7.0.0-rc.2.22476.2

RC2 changes are pretty conservative, so it's surprising that the problem would show up there.

This is the only change that looks relevant to me: dotnet/aspnetcore@v7.0.0-rc.1.22427.2...v7.0.0-rc.2.22476.2#diff-6fffe3b4a8

@davidfowl Could this change have introduced a bug that would cause socket writes to hang for some reason?

JamesNK avatar Apr 26 '23 03:04 JamesNK