tonic icon indicating copy to clipboard operation
tonic copied to clipboard

Client must not expect to hear back from server when establishing bidirectional stream

Open vorot93 opened this issue 4 years ago • 18 comments

Currently it does expect server to write something back. This causes a hang.

Illustrated example: https://github.com/hyperium/tonic/blob/master/interop/src/client.rs#L155

Move this line after full_duplex_call and observe the test fail with timeout.

vorot93 avatar Dec 28 '20 22:12 vorot93

This issue also affects one of my crates and causes tests to get stuck and fail. Would love to see this get fixed soon. I would like to take a look at this but I'm not sure where to start.

yusdacra avatar Jan 13 '21 16:01 yusdacra

Do either of you know if this only happens with grpc-go or if happens with other h2 servers?

LucioFranco avatar Jan 14 '21 21:01 LucioFranco

Do either of you know if this only happens with grpc-go or if happens with other h2 servers?

Not sure, as our server is only implemented with Go currently. I will try testing it with the Python examples on https://github.com/grpc/grpc/tree/master/examples/python today.

yusdacra avatar Jan 20 '21 19:01 yusdacra

I have made a repo showcasing the bug happening with the Python examples I linked: https://github.com/yusdacra/tonic-bug

@LucioFranco

yusdacra avatar Jan 20 '21 21:01 yusdacra

Thanks, I do not have time this week to dig into it but will get to this ASAP.

LucioFranco avatar Jan 20 '21 21:01 LucioFranco

I've done some digging but unsure about the best way to fix this. The hang is initiated from Grpc::streaming which eventually ends up in Reconnect::call where it tries to setup the connection to the server. So if the server never responds then it hangs.

I suppose we could change Grpc::streaming to not block until the connection has been established by connecting in a background task instead and sending the connection back in a oneshot channel. I guess that would mean blocking for that task to complete in Stream::message instead and propagate any connection errors there.

@LucioFranco Do you think makes sense? If so I would like to take a stab at implementing it 😊

davidpdrsn avatar Feb 13 '21 22:02 davidpdrsn

The Java gRPC server implementation also appears to behave this way.

sfackler avatar Jul 19 '22 20:07 sfackler

Is there a workaround for this? What does the tonic server send back to the client which makes it work for rust and not for other implementations?

behos avatar Jan 05 '23 11:01 behos

.NET server implementation also triggers this issue.

I can workaround the issue, as I control both ends, by issuing a await responseStream.WriteAsync("ping"); before the usual while (await requestStream.MoveNext()) { ... } server-side, and discarding the first response right after stream construction client-side.

r-ml avatar Apr 30 '23 16:04 r-ml

+1 seeing this issue and unable to use a streaming client as the server is not in my control (seems same issue https://github.com/hyperium/tonic/discussions/1233)

tried comparing python/c++ implementations, my hunch was that the other clients send an 'initial metadata' payload which triggers the server side to always respond, couldn't find similar thing within tonic implementation.

e.g. https://github.com/grpc/grpc/blob/fc159a690158ed089b19d3eb9f76e8399e3207ca/src/python/grpcio_tests/tests/unit/_cython/cygrpc_test.py#L361-L362

jordy25519 avatar Sep 20 '23 00:09 jordy25519

@jordy25519 do you have a reproduction I can look at?

LucioFranco avatar Sep 20 '23 15:09 LucioFranco

I don't have one on hand. the relevant generated client code is: hangs here: self.inner.server_streaming(req, path, codec).await

pub async fn stream_foo(
    &mut self,
    request: impl tonic::IntoRequest<super::StreamFooRequest>,
) -> std::result::Result<
    tonic::Response<tonic::codec::Streaming<super::StreamFooResponse>>,
    tonic::Status,
> {
    self.inner
        .ready()
        .await
        .map_err(|e| {
            tonic::Status::new(
                tonic::Code::Unknown,
                format!("Service was not ready: {}", e.into()),
            )
        })?;
    let codec = tonic::codec::ProstCodec::default();
    let path = http::uri::PathAndQuery::from_static(
        "/crate_foo_rpc.CrateFooRpc/StreamFoo",
    );
    let mut req = request.into_request();
    req.extensions_mut()
        .insert(
            GrpcMethod::new(
                "crate_foo_rpc.CrateFooRpc",
                "StreamFoo",
            ),
        );
    /// ---HANGS HERE---
    self.inner.server_streaming(req, path, codec).await
}

Additionally, I observe the hang resolves after ~2-5minutes I assume due to a close/timeout message from the server which unblocks the client.

like https://github.com/hyperium/tonic/issues/515#issuecomment-778686311 I traced the hang to the Reconnect code but didn't dig much further

jordy25519 avatar Sep 20 '23 22:09 jordy25519

had another go at debugging this and in my particular case the server never responds with a http2 Header frame if it has no message data to stream.

http2 handshake/connection is ok but blocks after sending streaming request and waiting for response in h2: https://github.com/hyperium/h2/blob/da38b1c49c39ccf7e2e0dca51ebf8505d6905597/src/proto/streams/recv.rs#L318 (this condition is never satisfied until first Header frame sent along with first Data frame, technically they could be sent separately but seems like grpc servers are not doing this by default) not sure what action tonic can take without knowing/receiving the response headers from the server.

zooming out, as a user I'd expect to get the Streaming handle/instance back whether there's immediate data available or not and be able to await on that for the messages rather than block on setup i.e. decouple the setup await from the messages await

possibly better to solve on server side but also feel like this should be a 'it just works' thing

jordy25519 avatar Sep 26 '23 22:09 jordy25519