connect-es icon indicating copy to clipboard operation
connect-es copied to clipboard

RST_STREAM Handling

Open mpataki opened this issue 1 year ago • 10 comments

Describe the bug

When a grpc server is abruptly closed while a streaming RPC is open, client may see a RST_STREAM http/2 frame. No grpc level status, trailer, etc. are present in the response.

I'm using a grpc transport and a PromiseClient, which represents a stream as an async iterable. This scenario results in the client terminating the iterable without error, as though the stream was closed normally/successfully.

The GRPC HTTP/2 protocol states the following:

RPC runtime implementations should interpret RST_STREAM as immediate full-closure of the stream and should propagate an error up to the calling application layer.

In my particular case, which is using a fairly vanilla Istio/Envoy proxy, the HTTP/2 code returned is NO_ERROR(0). I see no grpc status header.

I believe the expected client behaviour would be to throw an error from the iterable/stream so the application can handle it (possibly reconnecting, etc).

To Reproduce

I'm testing server restarts in a k8s deployment with Istio/Envoy in front of the services. Abruptly terminating pods, Envoy is responding with the RST_STREAM frame.

Environment:

  • @connectrpc/connect version: 1.4.0
  • @connectrpc/connect-node version: 1.4.0
  • Node.js version: 20.12.2

mpataki avatar Jun 15 '24 13:06 mpataki

For some more context, this is how grpcurl handles this case:

Response trailers received:
(empty)
Sent 1 request and received 1 response
ERROR:
  Code: Internal
  Message: stream terminated by RST_STREAM with error code: NO_ERROR

And this is how postman handles it: Pasted image 20240615075230

mpataki avatar Jun 15 '24 14:06 mpataki

Hey! Thank you reporting this, we are looking in to this. Node seems to always report a NO_ERROR code even when an RST_STREAM is not sent from the server.

srikrsna-buf avatar Jun 20 '24 12:06 srikrsna-buf

Ok! Thanks @srikrsna-buf. I'll stay posted. Please let me know if I can help at all

mpataki avatar Jun 20 '24 13:06 mpataki

Hey @srikrsna-buf, curious how your initial look at this panned out? Does it seem readily solvable or should we expect to work around this issue for the foreseeable future?

Just considering how we should proceed on our end 😃. I've got a pretty rock-n-roll patch in place as a stop gap but I'll sort something else if you think a connect-es fix is not in our future.

mpataki avatar Jul 02 '24 13:07 mpataki

One solution I was exploring is waiting for the trailers event to fire and if that fires, then a no error code is a no-op. But I need to test this. Once you receive data, you are not supposed to get a no error code.

I'm curious what is the patch you have?

srikrsna-buf avatar Jul 02 '24 13:07 srikrsna-buf

I'm treating any stream closure as an error, and attempting a reconnect. This allows us to recover from this case but introduces need for bespoke handling of cases where the stream was legitimately closed without error by our backend.

mpataki avatar Jul 04 '24 13:07 mpataki

I'm treating any stream closure as an error, and attempting a reconnect. This allows us to recover from this case but introduces need for bespoke handling of cases where the stream was legitimately closed without error by our backend.

I have a similar fix implemented in #1113. It checks to see if we received any response and throws an error. If you are using Envoy one problem this fix could result in is throwing an error when it is not. grpc-js suffers from a similar problem: https://github.com/grpc/grpc-node/issues/2569

srikrsna-buf avatar Jul 11 '24 17:07 srikrsna-buf

Nice, this makes sense to me fwiw @srikrsna-buf 🙏 thank-you!

mpataki avatar Jul 11 '24 23:07 mpataki

@srikrsna-buf checking back in on this. Anything blocking us?

mpataki avatar Aug 30 '24 13:08 mpataki

I'll pick this up next week, since we don't have a way to tell if we received an code, we will need to do it at the protocol level.

srikrsna-buf avatar Aug 30 '24 13:08 srikrsna-buf