opamp-go
opamp-go copied to clipboard
[server] Method of handling websocket connections leaks goroutines
Context
Hello, this was found as a result of investigation for https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/30931. Enabling goleak
in the server
directory of opamp-go
shows the TestServerCallsHTTPMiddlewareOverWebsocket
test is leaking. Here's the call stack for the goleak
failure:
Stack trace
=== RUN TestServerCallsHTTPMiddlewareOverWebsocket
--- PASS: TestServerCallsHTTPMiddlewareOverWebsocket (0.01s)
PASS
goleak: Errors on successful test run: found unexpected goroutines:
[Goroutine 23 in state IO wait, with internal/poll.runtime_pollWait on top of the stack:
internal/poll.runtime_pollWait(0x4888de10, 0x72)
/Users/crobert/go/pkg/mod/golang.org/[email protected]/src/runtime/netpoll.go:345 +0x85
internal/poll.(*pollDesc).wait(0xc0001e2780?, 0xc000236000?, 0x0)
/Users/crobert/go/pkg/mod/golang.org/[email protected]/src/internal/poll/fd_poll_runtime.go:84 +0x27
internal/poll.(*pollDesc).waitRead(...)
/Users/crobert/go/pkg/mod/golang.org/[email protected]/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc0001e2780, {0xc000236000, 0x1000, 0x1000})
/Users/crobert/go/pkg/mod/golang.org/[email protected]/src/internal/poll/fd_unix.go:164 +0x27a
net.(*netFD).Read(0xc0001e2780, {0xc000236000?, 0x0?, 0x0?})
/Users/crobert/go/pkg/mod/golang.org/[email protected]/src/net/fd_posix.go:55 +0x25
net.(*conn).Read(0xc0001940c0, {0xc000236000?, 0x10?, 0x10?})
/Users/crobert/go/pkg/mod/golang.org/[email protected]/src/net/net.go:179 +0x45
bufio.(*Reader).fill(0xc0001a8300)
/Users/crobert/go/pkg/mod/golang.org/[email protected]/src/bufio/bufio.go:110 +0x103
bufio.(*Reader).Peek(0xc0001a8300, 0x2)
/Users/crobert/go/pkg/mod/golang.org/[email protected]/src/bufio/bufio.go:148 +0x53
github.com/gorilla/websocket.(*Conn).read(0xc0001dc2c0, 0x0?)
/Users/crobert/go/pkg/mod/github.com/gorilla/[email protected]/conn.go:378 +0x26
github.com/gorilla/websocket.(*Conn).advanceFrame(0xc0001dc2c0)
/Users/crobert/go/pkg/mod/github.com/gorilla/[email protected]/conn.go:824 +0x6d
github.com/gorilla/websocket.(*Conn).NextReader(0xc0001dc2c0)
/Users/crobert/go/pkg/mod/github.com/gorilla/[email protected]/conn.go:1034 +0x13e
github.com/gorilla/websocket.(*Conn).ReadMessage(0x193d0a0?)
/Users/crobert/go/pkg/mod/github.com/gorilla/[email protected]/conn.go:1120 +0x13
github.com/open-telemetry/opamp-go/server.(*server).handleWSConnection(0xc0001f00c0, {0x197e350, 0xc00028e0a0}, 0xc0001dc2c0, {0x197d918, 0xc0002b2018})
/Users/crobert/dev/opamp-go/server/serverimpl.go:240 +0x1a5
created by github.com/open-telemetry/opamp-go/server.(*server).httpHandler in goroutine 22
/Users/crobert/dev/opamp-go/server/serverimpl.go:210 +0x350
]
This shows the test itself is passing, but goleak
detects leaked goroutines.
Analysis
I found the culprit to be this section:
https://github.com/open-telemetry/opamp-go/blob/c7fc5850ffdf65e97a39b080d73f1f23346bf52a/server/serverimpl.go#L233-L234
As shown by the comment, this line will block until the connection can read a message or until it's closed. However, the close call is in the same goroutine, within a defer
statement. This means the websocket will never be closed, and the goroutine is leaked.
The solution would be to have the wsConn.Close
method be called when the reqCtx
is cancelled, from what I can tell. This will allow the read call to stop blocking and properly finish running. Note that I don't have much opamp experience, so the end solution may be quite different than my suggestion.