opamp-go icon indicating copy to clipboard operation
opamp-go copied to clipboard

[server] Method of handling websocket connections leaks goroutines

Open crobert-1 opened this issue 10 months ago • 0 comments

Context

Hello, this was found as a result of investigation for https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/30931. Enabling goleak in the server directory of opamp-go shows the TestServerCallsHTTPMiddlewareOverWebsocket test is leaking. Here's the call stack for the goleak failure:

Stack trace

=== RUN   TestServerCallsHTTPMiddlewareOverWebsocket
--- PASS: TestServerCallsHTTPMiddlewareOverWebsocket (0.01s)
PASS
goleak: Errors on successful test run: found unexpected goroutines:
[Goroutine 23 in state IO wait, with internal/poll.runtime_pollWait on top of the stack:
internal/poll.runtime_pollWait(0x4888de10, 0x72)
	/Users/crobert/go/pkg/mod/golang.org/[email protected]/src/runtime/netpoll.go:345 +0x85
internal/poll.(*pollDesc).wait(0xc0001e2780?, 0xc000236000?, 0x0)
	/Users/crobert/go/pkg/mod/golang.org/[email protected]/src/internal/poll/fd_poll_runtime.go:84 +0x27
internal/poll.(*pollDesc).waitRead(...)
	/Users/crobert/go/pkg/mod/golang.org/[email protected]/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc0001e2780, {0xc000236000, 0x1000, 0x1000})
	/Users/crobert/go/pkg/mod/golang.org/[email protected]/src/internal/poll/fd_unix.go:164 +0x27a
net.(*netFD).Read(0xc0001e2780, {0xc000236000?, 0x0?, 0x0?})
	/Users/crobert/go/pkg/mod/golang.org/[email protected]/src/net/fd_posix.go:55 +0x25
net.(*conn).Read(0xc0001940c0, {0xc000236000?, 0x10?, 0x10?})
	/Users/crobert/go/pkg/mod/golang.org/[email protected]/src/net/net.go:179 +0x45
bufio.(*Reader).fill(0xc0001a8300)
	/Users/crobert/go/pkg/mod/golang.org/[email protected]/src/bufio/bufio.go:110 +0x103
bufio.(*Reader).Peek(0xc0001a8300, 0x2)
	/Users/crobert/go/pkg/mod/golang.org/[email protected]/src/bufio/bufio.go:148 +0x53
github.com/gorilla/websocket.(*Conn).read(0xc0001dc2c0, 0x0?)
	/Users/crobert/go/pkg/mod/github.com/gorilla/[email protected]/conn.go:378 +0x26
github.com/gorilla/websocket.(*Conn).advanceFrame(0xc0001dc2c0)
	/Users/crobert/go/pkg/mod/github.com/gorilla/[email protected]/conn.go:824 +0x6d
github.com/gorilla/websocket.(*Conn).NextReader(0xc0001dc2c0)
	/Users/crobert/go/pkg/mod/github.com/gorilla/[email protected]/conn.go:1034 +0x13e
github.com/gorilla/websocket.(*Conn).ReadMessage(0x193d0a0?)
	/Users/crobert/go/pkg/mod/github.com/gorilla/[email protected]/conn.go:1120 +0x13
github.com/open-telemetry/opamp-go/server.(*server).handleWSConnection(0xc0001f00c0, {0x197e350, 0xc00028e0a0}, 0xc0001dc2c0, {0x197d918, 0xc0002b2018})
	/Users/crobert/dev/opamp-go/server/serverimpl.go:240 +0x1a5
created by github.com/open-telemetry/opamp-go/server.(*server).httpHandler in goroutine 22
	/Users/crobert/dev/opamp-go/server/serverimpl.go:210 +0x350
]

This shows the test itself is passing, but goleak detects leaked goroutines.

Analysis

I found the culprit to be this section: https://github.com/open-telemetry/opamp-go/blob/c7fc5850ffdf65e97a39b080d73f1f23346bf52a/server/serverimpl.go#L233-L234 As shown by the comment, this line will block until the connection can read a message or until it's closed. However, the close call is in the same goroutine, within a defer statement. This means the websocket will never be closed, and the goroutine is leaked.

The solution would be to have the wsConn.Close method be called when the reqCtx is cancelled, from what I can tell. This will allow the read call to stop blocking and properly finish running. Note that I don't have much opamp experience, so the end solution may be quite different than my suggestion.

crobert-1 avatar Apr 17 '24 00:04 crobert-1