
GoAway is not handled properly by the gRPC client in reconnect.rs

Open DimanNe opened this issue 10 months ago • 0 comments

Bug Report

Version

v0.12.3

Platform

Linux

High-level problem

Even with retries implemented manually on the client side, the gRPC client appears to keep using the same underlying TCP connection, which is NOT accepting new HTTP/2 streams anymore.

How it happens

The server decides to shut down and sends GoAway to the client. hyper-1.6.0/src/proto/h2/client.rs interprets it as Ready(Ok):

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        loop {
            match ready!(self.h2_tx.poll_ready(cx)) {
                Ok(()) => (),
                Err(err) => {
                    self.ping.ensure_not_timed_out()?;
                    return if err.reason() == Some(::h2::Reason::NO_ERROR) {
                        trace!("connection gracefully shutdown");
                        Poll::Ready(Ok(Dispatched::Shutdown))

Then tonic-0.12.3/src/transport/channel/service/reconnect.rs assumes that everything is fine, and its state machine does not initiate a reconnect:

    fn poll_ready(&mut self, cx: &mut Context<'_>) -> Poll<Result<(), Self::Error>> {
        let mut state;

        if self.error.is_some() {
            return Poll::Ready(Ok(()));
        }

        loop {
            match self.state {
                State::Idle => {
                    trace!("poll_ready; idle");
                    match self.mk_service.poll_ready(cx) { ... }
                    let fut = self.mk_service.make_service(self.target.clone());
                    self.state = State::Connecting(fut);
                    continue;
                }
                State::Connecting(ref mut f) => {
                    trace!("poll_ready; connecting");
                    match Pin::new(f).poll(cx) { ... }
                }
                State::Connected(ref mut inner) => {
                    trace!("poll_ready; connected");

                    self.has_been_connected = true;

                    match inner.poll_ready(cx) {
                        Poll::Ready(Ok(())) => {
                            trace!("poll_ready; ready");
                            return Poll::Ready(Ok(()));
                        }
                        Poll::Pending => {
                            trace!("poll_ready; not ready");
                            return Poll::Pending;
                        }
                        Poll::Ready(Err(_)) => {
                            trace!("poll_ready; error");
                            state = State::Idle;
                        }
                    }
                }
            }

            self.state = state;
        }

        self.state = state;
        Poll::Ready(Ok(()))
    }

A subsequent call to fn call(&mut self, request: Request) -> Self::Future returns a future that resolves to:

Internal Error: Status { code: Internal, message: "h2 protocol error: http2 error", source: Some(tonic::transport::Error(Transport, hyper::Error(Http2, Error { kind: GoAway(b"", NO_ERROR, Remote) }))) }

Then everything repeats, because the client-side retry mechanism issues the call again.
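The failure sequence above can be reproduced with a small standalone model. This is only a sketch using the standard library: Conn, State, Reconnect, CallError, retry_after_goaway and the surface_shutdown flag are hypothetical names made up for illustration, not tonic or hyper types. It assumes that readiness reports success even after a GoAway, as described above, and compares that with a variant where a drained connection is treated as a readiness error.

```rust
/// Simplified stand-in for an HTTP/2 connection handle (not hyper's real type).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Conn {
    Healthy,
    /// The peer sent GOAWAY(NO_ERROR): no new streams are accepted.
    GoneAway,
}

#[derive(Debug, PartialEq)]
enum State {
    Idle,
    Connected(Conn),
}

#[derive(Debug, PartialEq)]
enum CallError {
    GoAway,
}

struct Reconnect {
    state: State,
    /// Number of times a fresh connection was established.
    connects: u32,
    /// Assumption for the sketch: when true, a drained connection is
    /// treated like a readiness error and triggers a reconnect.
    surface_shutdown: bool,
}

impl Reconnect {
    fn new(surface_shutdown: bool) -> Self {
        Reconnect { state: State::Idle, connects: 0, surface_shutdown }
    }

    /// Same shape as the quoted poll_ready: only a readiness error
    /// moves the state machine back to Idle.
    fn ready(&mut self) {
        loop {
            match self.state {
                State::Idle => {
                    self.connects += 1;
                    self.state = State::Connected(Conn::Healthy);
                }
                State::Connected(Conn::GoneAway) if self.surface_shutdown => {
                    self.state = State::Idle;
                }
                State::Connected(_) => return,
            }
        }
    }

    fn call(&mut self) -> Result<(), CallError> {
        self.ready();
        match self.state {
            State::Connected(Conn::Healthy) => Ok(()),
            _ => Err(CallError::GoAway),
        }
    }

    fn server_sends_goaway(&mut self) {
        if let State::Connected(_) = self.state {
            self.state = State::Connected(Conn::GoneAway);
        }
    }
}

/// Makes one successful call, receives a GOAWAY, then retries `attempts`
/// times. Returns (successful retries, total connections made).
fn retry_after_goaway(surface_shutdown: bool, attempts: u32) -> (u32, u32) {
    let mut client = Reconnect::new(surface_shutdown);
    client.call().expect("first call on a fresh connection succeeds");
    client.server_sends_goaway();
    let ok = (0..attempts).filter(|_| client.call().is_ok()).count() as u32;
    (ok, client.connects)
}
```

In this model, retry_after_goaway(false, 3) never reconnects and every retry fails, while retry_after_goaway(true, 3) reconnects once and every retry succeeds.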

It looks like tonic does not know about Dispatched::Shutdown.
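A possible client-side workaround, not verified against tonic 0.12.3, is for the manual retry loop to recognise this failure and build a fresh channel instead of reusing the old one. The sketch below uses only the standard library and shows the detection step: walking the source() chain of the returned error to find the innermost GoAway. GoAwayError, Wrapped, find_in_chain and should_rebuild_channel are hypothetical names. In a real client the downcast target would presumably be the h2 error type whose reason() the hyper excerpt above checks against NO_ERROR.

```rust
use std::error::Error;
use std::fmt;

/// Walks `err` and its `source()` chain and returns the first error of type `T`.
fn find_in_chain<'a, T: Error + 'static>(err: &'a (dyn Error + 'static)) -> Option<&'a T> {
    let mut current = Some(err);
    while let Some(e) = current {
        if let Some(found) = e.downcast_ref::<T>() {
            return Some(found);
        }
        current = e.source();
    }
    None
}

/// Hypothetical stand-in for the innermost HTTP/2 error in the chain.
#[derive(Debug)]
struct GoAwayError {
    /// True when the peer closed with NO_ERROR (graceful shutdown).
    graceful: bool,
}

impl fmt::Display for GoAwayError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "go away (graceful: {})", self.graceful)
    }
}

impl Error for GoAwayError {}

/// Hypothetical wrapper standing in for the outer status and transport layers.
#[derive(Debug)]
struct Wrapped(Box<dyn Error + 'static>);

impl fmt::Display for Wrapped {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "wrapped error")
    }
}

impl Error for Wrapped {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(self.0.as_ref())
    }
}

/// True when the failure was a graceful GoAway, meaning the retry loop
/// should create a new channel rather than reuse the drained one.
fn should_rebuild_channel(err: &(dyn Error + 'static)) -> bool {
    find_in_chain::<GoAwayError>(err).map_or(false, |e| e.graceful)
}
```

A retry loop could call should_rebuild_channel on each failed attempt and reconnect only when it returns true, leaving other errors to the normal retry path.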

DimanNe, Feb 15 '25 19:02