connect-go icon indicating copy to clipboard operation
connect-go copied to clipboard

ServerStreamForClient.Close hangs forever

Open amenzhinsky opened this issue 1 year ago • 5 comments

Describe the bug

A client service needs to forcibly stop a server stream, but the Close method hangs indefinitely, preventing the connection-handling function from ever returning.

goroutine 82 [sync.Cond.Wait, 42 minutes]:
sync.runtime_notifyListWait(0xc046ba84c8, 0x65)
	runtime/sema.go:569 +0x159
sync.(*Cond).Wait(0x1a?)
	sync/cond.go:70 +0x85
golang.org/x/net/http2.(*pipe).Read(0xc046ba84b0, {0xc047066000, 0x2000, 0x2000})
	golang.org/x/[email protected]/http2/pipe.go:76 +0xdf
golang.org/x/net/http2.transportResponseBody.Read({0x1?}, {0xc047066000?, 0x1?, 0x0?})
	golang.org/x/[email protected]/http2/transport.go:2637 +0x65
io.(*LimitedReader).Read(0xc0435c5d40, {0xc047066000?, 0x10?, 0x1?})
	io/io.go:479 +0x43
io.discard.ReadFrom({}, {0x12b2d80, 0xc0435c5d40})
	io/io.go:666 +0x6d
io.copyBuffer({0x12b4620, 0x1b89ae0}, {0x12b2d80, 0xc0435c5d40}, {0x0, 0x0, 0x0})
	io/io.go:415 +0x151
io.Copy(...)
	io/io.go:388
connectrpc.com/connect.discard({0x7d06ab099df0, 0xc046ba8480})
	connectrpc.com/[email protected]/protocol.go:296 +0xc5
connectrpc.com/connect.(*duplexHTTPCall).CloseRead(0xc0435c0d20)
	connectrpc.com/[email protected]/duplex_http_call.go:241 +0x6d
connectrpc.com/connect.(*connectStreamingClientConn).CloseResponse(0xc046f9a770?)
	connectrpc.com/[email protected]/protocol_connect.go:647 +0x17
connectrpc.com/connect.(*errorTranslatingClientConn).CloseResponse(0xc0435c4c18)
	connectrpc.com/[email protected]/protocol.go:218 +0x23
connectrpc.com/otelconnect.(*streamingClientInterceptor).CloseResponse(0xc0439fff50)
	connectrpc.com/[email protected]/payloadinterceptor.go:38 +0x23
connectrpc.com/connect.(*ServerStreamForClient[...]).Close(...)
	connectrpc.com/[email protected]/client_stream.go:170
[CUT]

To Reproduce

And in example_test.go:

package bugreport

func TestThatReproducesBug(t *testing.T) {
	stream, err := client.ServerStreamMethod(ctx, connect.NewRequest(...))
	if err != nil {
		t.Fatal(err)
	}
	defer stream.Close()

	for stream.Receive() {
		// trigger `defer`
		return
	}
}

Environment (please complete the following information):

  • connect-go version or commit: v1.16.2
  • go version: go1.22.4
  • your complete go.mod:
connectrpc.com/connect v1.16.2

amenzhinsky avatar Oct 15 '24 10:10 amenzhinsky

There are two improvements we could make here:

  1. The documentation could be improved that the Close method of a server stream (and CloseResponse method of a bidi stream) are graceful operations. They will not abort the RPC and attempt to consume the remainder of the server's response before returning. They are generally used to release any remaining resources associated with an RPC after the server's response has been fully consumed. If you want to abort the RPC before the server's response has been received, one must cancel the context.Context that was used to create the RPC. Cancelling an RPC this way will trigger a cancellation error in the server.
  2. We could improve the Close method to have better time and network I/O bounds. So if it takes longer than N duration or consumes more than X bytes, it would skip the graceful close and cancel the RPC. We'd need to agree on reasonable values of N and X and also decide whether to expose client options to customize these values.

jhump avatar Oct 15 '24 13:10 jhump

@jhump Hello, thank you for the fast answer! Can it be implemented by creating a new context with cancel inside rpc method call and cancel it on stream.Close?

anaxita avatar Oct 15 '24 14:10 anaxita

Can it be implemented by creating a new context with cancel inside rpc method call and cancel it on stream.Close?

Yes.

jhump avatar Oct 15 '24 14:10 jhump

Can it be implemented by creating a new context with cancel inside rpc method call and cancel it on stream.Close?

Yes.

To be sure, I told about implementing that on connect lib side, and you?

anaxita avatar Oct 15 '24 14:10 anaxita

It is the same either way. But in the connect lib side:

  1. Care must be taken to ensure that the cancel function is always called at the end of the RPC (even when successful), to avoid any resource leaks.
  2. The lib should first attempt to gracefully close by consuming the response up to some time limit and up to some limit of bytes-read, and only cancel the RPC as a last resort to avoid hanging.

jhump avatar Oct 15 '24 14:10 jhump

Created a PR #791 to discuss a third solution. Changing the behavior of the client closures to be non-blocking. This may not be possible if the current behavior is relied upon but I think this optimization can still be implemented by clients calling receive until the stream is drained.

emcfarlane avatar Oct 28 '24 15:10 emcfarlane

I have an example where cancelling the context does not seem to have any effect (when http2 is enabled): #801

Incidentally, #791 would fix it.

oliverpool avatar Dec 04 '24 09:12 oliverpool