tonic
tonic copied to clipboard
Client must not expect to hear back from server when establishing bidirectional stream
Currently it does expect server to write something back. This causes a hang.
Illustrated example: https://github.com/hyperium/tonic/blob/master/interop/src/client.rs#L155
Move this line after full_duplex_call
and observe the test fail with timeout.
This issue also affects one of my crates and causes tests to get stuck and fail. Would love to see this get fixed soon. I would like to take a look at this but I'm not sure where to start.
Do either of you know if this only happens with grpc-go
or if happens with other h2 servers?
Do either of you know if this only happens with
grpc-go
or if happens with other h2 servers?
Not sure, as our server is only implemented with Go currently. I will try testing it with the Python examples on https://github.com/grpc/grpc/tree/master/examples/python today.
I have made a repo showcasing the bug happening with the Python examples I linked: https://github.com/yusdacra/tonic-bug
@LucioFranco
Thanks, I do not have time this week to dig into it but will get to this ASAP.
I've done some digging but unsure about the best way to fix this. The hang is initiated from Grpc::streaming
which eventually ends up in Reconnect::call
where it tries to setup the connection to the server. So if the server never responds then it hangs.
I suppose we could change Grpc::streaming
to not block until the connection has been established by connecting in a background task instead and sending the connection back in a oneshot channel. I guess that would mean blocking for that task to complete in Stream::message
instead and propagate any connection errors there.
@LucioFranco Do you think makes sense? If so I would like to take a stab at implementing it 😊
The Java gRPC server implementation also appears to behave this way.
Is there a workaround for this? What does the tonic server send back to the client which makes it work for rust and not for other implementations?
.NET server implementation also triggers this issue.
I can workaround the issue, as I control both ends, by issuing a await responseStream.WriteAsync("ping");
before the usual while (await requestStream.MoveNext()) { ... }
server-side, and discarding the first response right after stream construction client-side.
+1 seeing this issue and unable to use a streaming client as the server is not in my control (seems same issue https://github.com/hyperium/tonic/discussions/1233)
tried comparing python/c++ implementations, my hunch was that the other clients send an 'initial metadata' payload which triggers the server side to always respond, couldn't find similar thing within tonic implementation.
e.g. https://github.com/grpc/grpc/blob/fc159a690158ed089b19d3eb9f76e8399e3207ca/src/python/grpcio_tests/tests/unit/_cython/cygrpc_test.py#L361-L362
@jordy25519 do you have a reproduction I can look at?
I don't have one on hand. the relevant generated client code is:
hangs here: self.inner.server_streaming(req, path, codec).await
pub async fn stream_foo(
&mut self,
request: impl tonic::IntoRequest<super::StreamFooRequest>,
) -> std::result::Result<
tonic::Response<tonic::codec::Streaming<super::StreamFooResponse>>,
tonic::Status,
> {
self.inner
.ready()
.await
.map_err(|e| {
tonic::Status::new(
tonic::Code::Unknown,
format!("Service was not ready: {}", e.into()),
)
})?;
let codec = tonic::codec::ProstCodec::default();
let path = http::uri::PathAndQuery::from_static(
"/crate_foo_rpc.CrateFooRpc/StreamFoo",
);
let mut req = request.into_request();
req.extensions_mut()
.insert(
GrpcMethod::new(
"crate_foo_rpc.CrateFooRpc",
"StreamFoo",
),
);
/// ---HANGS HERE---
self.inner.server_streaming(req, path, codec).await
}
Additionally, I observe the hang resolves after ~2-5minutes I assume due to a close/timeout message from the server which unblocks the client.
like https://github.com/hyperium/tonic/issues/515#issuecomment-778686311 I traced the hang to the Reconnect code but didn't dig much further
had another go at debugging this and in my particular case the server never responds with a http2 Header
frame if it has no message data to stream.
http2 handshake/connection is ok but blocks after sending streaming request and waiting for response in h2
: https://github.com/hyperium/h2/blob/da38b1c49c39ccf7e2e0dca51ebf8505d6905597/src/proto/streams/recv.rs#L318
(this condition is never satisfied until first Header frame sent along with first Data frame, technically they could be sent separately but seems like grpc servers are not doing this by default)
not sure what action tonic can take without knowing/receiving the response headers from the server.
zooming out, as a user I'd expect to get the Streaming
handle/instance back whether there's immediate data available or not and be able to await on that for the messages rather than block on setup i.e. decouple the setup await from the messages await
possibly better to solve on server side but also feel like this should be a 'it just works' thing