grpc-node icon indicating copy to clipboard operation
grpc-node copied to clipboard

"Error: 14 UNAVAILABLE: Stream refused by server" caused by disconnect by server without grpc client knowing

Open thegleb opened this issue 3 years ago • 3 comments

Continuation of https://github.com/grpc/grpc-node/issues/1969

I pushed a sample client/server pair to this repo: https://github.com/thegleb/grpc-debug.

It appears the condition is triggered for us when the server responds with an extremely long error message. From there it seems like the http2 client potentially responds with a GOAWAY and error code 9, after which the connection is closed on the server side but the grpc client is never notified, leading to a pattern of 1) DEADLINE_EXCEEDED error and 2) UNAVAILABLE: Stream refused by server for any additional requests queued on the channel.

The expected output of the default test case looks like this.

Actual (default test case, broken behavior)
Server logs (exception is 100,000 characters long)

2681 [nioEventLoopGroup-3-1] DEBUG io.grpc.netty.NettyServerHandler  - [id: 0xb71ae346, L:/0:0:0:0:0:0:0:1:50051 - R:/0:0:0:0:0:0:0:1:51853] OUTBOUND HEADERS: streamId=1 headers=GrpcHttp2OutboundHeaders[:status: 200, content-type: application/grpc, grpc-status: 2, grpc-message: Aliquam euismod id odio quis commodo. 
<snip long message here, about 100,000 characters>
2703 [nioEventLoopGroup-3-1] DEBUG io.grpc.netty.NettyServerHandler  - [id: 0xb71ae346, L:/0:0:0:0:0:0:0:1:50051 - R:/0:0:0:0:0:0:0:1:51853] INBOUND GO_AWAY: lastStreamId=0 errorCode=9 length=0 bytes=
Client logs (long exception with 10 second timeout)

2022-01-27T02:16:56.648Z | call_stream | Starting stream on subchannel ::1:50051 with headers
		grpc-timeout: 9968m
		grpc-accept-encoding: identity,deflate,gzip
		accept-encoding: identity,gzip
		:authority: localhost:50051
		user-agent: grpc-node-js/1.2.9
		content-type: application/grpc
		:method: POST
		:path: /routeguide.RouteGuide/GetFeatureBad
		te: trailers

2022-01-27T02:16:56.648Z | call_stream | [0] attachHttp2Stream from subchannel ::1:50051
2022-01-27T02:16:56.648Z | subchannel_refcount | ::1:50051 callRefcount 0 -> 1
2022-01-27T02:16:56.649Z | call_stream | [0] sending data chunk of length 22 (deferred)
2022-01-27T02:16:56.649Z | call_stream | [0] calling end() on HTTP/2 stream
2022-01-27T02:16:56.650Z | subchannel | 127.0.0.1:50051 CONNECTING -> READY
2022-01-27T02:17:06.614Z | call_stream | [0] cancelWithStatus code: 4 details: "Deadline exceeded"
2022-01-27T02:17:06.614Z | call_stream | [0] ended with status: code=4 details="Deadline exceeded"
2022-01-27T02:17:06.615Z | subchannel_refcount | ::1:50051 callRefcount 1 -> 0
2022-01-27T02:17:06.615Z | call_stream | [0] close http2 stream with code 8
Error: 4 DEADLINE_EXCEEDED: Deadline exceeded
Expected (behavior with shorter exception lengths)
Server logs (exception is 90,000 characters or less)

3328 [nioEventLoopGroup-3-1] DEBUG io.grpc.netty.NettyServerHandler  - [id: 0x28b182aa, L:/0:0:0:0:0:0:0:1:50051 - R:/0:0:0:0:0:0:0:1:52227] OUTBOUND HEADERS: streamId=1 headers=GrpcHttp2OutboundHeaders[:status: 200, content-type: application/grpc, grpc-status: 2, grpc-message: Aliquam euismod id odio quis commodo. 
<snip long message here, 90,000 characters>
3388 [nioEventLoopGroup-3-1] DEBUG io.grpc.netty.NettyServerHandler  - [id: 0x28b182aa, L:/0:0:0:0:0:0:0:1:50051 - R:/0:0:0:0:0:0:0:1:52227] INBOUND RST_STREAM: streamId=1 errorCode=0
Client logs (normal length exception)

2022-01-27T02:29:46.217Z | call_stream | Starting stream on subchannel ::1:50051 with headers
		grpc-timeout: 9965m
		grpc-accept-encoding: identity,deflate,gzip
		accept-encoding: identity,gzip
		:authority: localhost:50051
		user-agent: grpc-node-js/1.2.9
		content-type: application/grpc
		:method: POST
		:path: /routeguide.RouteGuide/GetFeatureBad
		te: trailers

2022-01-27T02:29:46.218Z | call_stream | [0] attachHttp2Stream from subchannel ::1:50051
2022-01-27T02:29:46.218Z | subchannel_refcount | ::1:50051 callRefcount 0 -> 1
2022-01-27T02:29:46.218Z | call_stream | [0] sending data chunk of length 22 (deferred)
2022-01-27T02:29:46.218Z | call_stream | [0] calling end() on HTTP/2 stream
2022-01-27T02:29:46.220Z | subchannel | 127.0.0.1:50051 CONNECTING -> READY
2022-01-27T02:29:46.402Z | call_stream | [0] Received server headers:
		:status: 200
		content-type: application/grpc
		grpc-status: 2
		grpc-message: Aliquam euismod id odio quis commodo. Mauris quis dolor mollis, tristique erat nec, accumsan urna. <snip>

2022-01-27T02:29:46.407Z | call_stream | [0] Received server trailers:
		:status: 200
		content-type: application/grpc
		grpc-status: 2
		grpc-message: Aliquam euismod id odio quis commodo. Mauris quis dolor mollis, tristique erat nec, accumsan urna. <snip>

2022-01-27T02:29:46.411Z | call_stream | [0] received status code 2 from server
2022-01-27T02:29:46.412Z | call_stream | [0] received status details string "Aliquam euismod id odio quis commodo. Mauris quis dolor mollis, tristique erat nec, accumsan urna. <snip>" from server
2022-01-27T02:29:46.414Z | call_stream | [0] ended with status: code=2 details="Aliquam euismod id odio quis commodo. Mauris quis dolor mollis, tristique erat nec, accumsan urna. <snip>"
2022-01-27T02:29:46.416Z | subchannel_refcount | ::1:50051 callRefcount 1 -> 0
2022-01-27T02:29:46.417Z | call_stream | [0] close http2 stream with code 8
Error: 2 UNKNOWN: Aliquam euismod id odio quis commodo. Mauris quis dolor mollis, tristique erat nec, accumsan urna. <snip>
    at Object.callErrorFromStatus (/grpc-debug/node/node_modules/@grpc/grpc-js/build/src/call.js:31:26)
    at Object.onReceiveStatus (/grpc-debug/node/node_modules/@grpc/grpc-js/build/src/client.js:176:52)
    at Object.onReceiveStatus (/grpc-debug/node/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:336:141)
    at Object.onReceiveStatus (/grpc-debug/node/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:299:181)
    at /grpc-debug/node/node_modules/@grpc/grpc-js/build/src/call-stream.js:145:78
    at processTicksAndRejections (internal/process/task_queues.js:77:11) {
  code: 2,
  details: 'Aliquam euismod id odio quis commodo. Mauris quis dolor mollis, tristique erat nec, accumsan urna. <snip>',
  metadata: Metadata {
    internalRepr: Map(1) { 'content-type' => [Array] },
    options: {}
  }
}
Error: 2 UNKNOWN: Aliquam euismod id odio quis commodo. Mauris quis dolor mollis, tristique erat nec, accumsan urna. <snip>
    at Object.callErrorFromStatus (/grpc-debug/node/node_modules/@grpc/grpc-js/build/src/call.js:31:26)
    at Object.onReceiveStatus (/grpc-debug/node/node_modules/@grpc/grpc-js/build/src/client.js:176:52)
    at Object.onReceiveStatus (/grpc-debug/node/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:336:141)
    at Object.onReceiveStatus (/grpc-debug/node/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:299:181)
    at /grpc-debug/node/node_modules/@grpc/grpc-js/build/src/call-stream.js:145:78
    at processTicksAndRejections (internal/process/task_queues.js:77:11) {
  code: 2,
  details: 'Aliquam euismod id odio quis commodo. Mauris quis dolor mollis, tristique erat nec, accumsan urna. <snip>',
  metadata: Metadata {
    internalRepr: Map(1) { 'content-type' => [Array] },
    options: {}
  }
}
2022-01-27T02:29:46.441Z | call_stream | [0] HTTP/2 stream closed with code 8

As far as I can tell this is likely happening outside grpc as the client does not appear to be notified, which is why I suspect http2 library involvement. Meanwhile, the server gets the GOAWAY signal and presumably closes the channel, making it unavailable and causing all the behaviors outlined in https://github.com/grpc/grpc-node/issues/1969.

Does this behavior make sense? What might trigger that GO_AWAY error code 9 response?

thegleb avatar Jan 27 '22 02:01 thegleb

This is good information. Hopefully, I can use it to reproduce this bug with just Node and without gRPC, so that I can file a Node bug.

Code 9 is COMPRESSION_ERROR, with the longer description "The endpoint is unable to maintain the header compression context for the connection." The status information is sent as trailing headers, so it makes sense that a very long status could trigger that error.

murgatroid99 avatar Jan 27 '22 17:01 murgatroid99

Hi, is there possibilities that this issue is fixed ? Tried to push a huge load into system and I get exact same behavior too.

nptri avatar Dec 01 '23 07:12 nptri

@nptri This issue is almost two years old, and there have been substantial relevant changes to grpc-js in the meantime. If you are still experiencing the same problem on the newest version of grpc-js, it would be best to file a separate issue.

murgatroid99 avatar Dec 01 '23 14:12 murgatroid99