go-ethereum
go-ethereum copied to clipboard
potential go routine leak in rpc package
System information
OS & Version: Windows 11 & Ubuntu 18.04 Commit hash : develop
When I run the TestClientCancelWebsocket with goleak, there are some go routine leaks, as mentioned below. Actually I think it blocks at this function.
func (h *handler) startCallProc(fn func(*callProc)) {
h.callWG.Add(1)
var wg sync.WaitGroup
go func() {
ctx, cancel := context.WithCancel(h.rootCtx)
defer h.callWG.Done()
defer cancel()
fn(&callProc{ctx: ctx})
}()
}
In TestClientCancelWebsocket
, the fn
is to test block (test_block
) so I think it may block forever and the h.callWG.Done
will never be called. Although it is caused by a client test, it causes leaks in the server part. I wonder if it is normal for this fn
call without a timeout.
Expected behaviour
Pass
Actual behaviour
Goroutine 35 in state select, with github.com/ethereum/go-ethereum/rpc.(*Client).dispatch on top of the stack:
goroutine 35 [select]:
github.com/ethereum/go-ethereum/rpc.(*Client).dispatch(0xc0001b2000, {0x95b808?, 0xc0001a0000})
/data/rpc/client.go:587 +0x2cc
created by github.com/ethereum/go-ethereum/rpc.initClient
/data/rpc/client.go:251 +0x311
Goroutine 141 in state select, with github.com/ethereum/go-ethereum/rpc.(*Client).Close on top of the stack:
goroutine 141 [select]:
github.com/ethereum/go-ethereum/rpc.(*Client).Close(...)
/data/rpc/client.go:284
github.com/ethereum/go-ethereum/rpc.(*Server).ServeCodec(0xc000159a10, {0x95b808?, 0xc000270bd0}, 0xc000232700?)
/data/rpc/server.go:89 +0x1b8
github.com/ethereum/go-ethereum/rpc.(*Server).WebsocketHandler.func1({0x95a170?, 0xc00026e620?}, 0xc000232700)
/data/rpc/websocket.go:64 +0xef
net/http.HandlerFunc.ServeHTTP(0x0?, {0x95a170?, 0xc00026e620?}, 0x46c6ee?)
/usr/local/go/src/net/http/server.go:2109 +0x2f
net/http.serverHandler.ServeHTTP({0xc000272fc0?}, {0x95a170, 0xc00026e620}, 0xc000232700)
/usr/local/go/src/net/http/server.go:2947 +0x30c
net/http.(*conn).serve(0xc0000015e0, {0x95a6c0, 0xc000272240})
/usr/local/go/src/net/http/server.go:1991 +0x607
created by net/http.(*Server).Serve
/usr/local/go/src/net/http/server.go:3102 +0x4db
Goroutine 13 in state chan receive, with github.com/ethereum/go-ethereum/rpc.(*Server).ServeCodec on top of the stack:
goroutine 13 [chan receive]:
github.com/ethereum/go-ethereum/rpc.(*Server).ServeCodec(0xc000159a10, {0x95b808?, 0xc0002706c0}, 0xc000232400?)
/data/rpc/server.go:88 +0x165
github.com/ethereum/go-ethereum/rpc.(*Server).WebsocketHandler.func1({0x95a170?, 0xc00026e2a0?}, 0xc000232400)
/data/rpc/websocket.go:64 +0xef
net/http.HandlerFunc.ServeHTTP(0x0?, {0x95a170?, 0xc00026e2a0?}, 0x46c6ee?)
/usr/local/go/src/net/http/server.go:2109 +0x2f
net/http.serverHandler.ServeHTTP({0xc000272840?}, {0x95a170, 0xc00026e2a0}, 0xc000232400)
/usr/local/go/src/net/http/server.go:2947 +0x30c
net/http.(*conn).serve(0xc000001400, {0x95a6c0, 0xc000272240})
/usr/local/go/src/net/http/server.go:1991 +0x607
created by net/http.(*Server).Serve
/usr/local/go/src/net/http/server.go:3102 +0x4db
Goroutine 144 in state semacquire, with sync.runtime_Semacquire on top of the stack:
goroutine 144 [semacquire]:
sync.runtime_Semacquire(0xc000138ac0?)
/usr/local/go/src/runtime/sema.go:62 +0x25
sync.(*WaitGroup).Wait(0x18?)
/usr/local/go/src/sync/waitgroup.go:139 +0x52
github.com/ethereum/go-ethereum/rpc.(*handler).startCallProc(0xc000270c60, 0xc000012e58)
/data/rpc/handler.go:232 +0xd4
github.com/ethereum/go-ethereum/rpc.(*handler).handleMsg(0xc000270c60, 0xc00025bb20)
/data/rpc/handler.go:138 +0x85
github.com/ethereum/go-ethereum/rpc.(*Client).dispatch(0xc000230900, {0x95b808?, 0xc000270bd0})
/data/rpc/client.go:596 +0x411
created by github.com/ethereum/go-ethereum/rpc.initClient
/data/rpc/client.go:251 +0x311
Steps to reproduce the behaviour
run TestClientCancelWebsocket with goleak
func TestClientCancelWebsocket(t *testing.T) {
opts := []goleak.Option{
// TODO: figure the reason and shorten this list
goleak.IgnoreTopFunction("github.com/ethereum/go-ethereum/metrics.(*meterArbiter).tick"),
goleak.IgnoreTopFunction("github.com/ethereum/go-ethereum/core.(*txSenderCacher).cache"),
goleak.IgnoreTopFunction("github.com/ethereum/go-ethereum/consensus/ethash.(*remoteSealer).loop"),
goleak.IgnoreTopFunction("github.com/ethereum/go-ethereum/core/txpool.(*TxPool).loop"),
goleak.IgnoreTopFunction("github.com/ethereum/go-ethereum/core.(*BlockChain).updateFutureBlocks"),
goleak.IgnoreTopFunction("github.com/ethereum/go-ethereum/core/txpool.(*TxPool).scheduleReorgLoop"),
goleak.IgnoreTopFunction("github.com/ethereum/go-ethereum/rpc.(*testService).Block"),
}
defer goleak.VerifyNone(t, opts...)
defer time.Sleep(3*time.Second)
testClientCancel("ws", t)
}
Backtrace
see above
Could repro, looks like the following diff kinda fixes it. Maybe we should add a timeout to the rpc server/rpc client context.
diff --git a/rpc/handler.go b/rpc/handler.go
index f3052e7eb..f921e30ac 100644
--- a/rpc/handler.go
+++ b/rpc/handler.go
@@ -219,7 +219,7 @@ func (h *handler) cancelServerSubscriptions(err error) {
func (h *handler) startCallProc(fn func(*callProc)) {
h.callWG.Add(1)
go func() {
- ctx, cancel := context.WithCancel(h.rootCtx)
+ ctx, cancel := context.WithTimeout(h.rootCtx, time.Second)
defer h.callWG.Done()
defer cancel()
fn(&callProc{ctx: ctx})
*Edit To be clear I'm not proposing this as a fix, it should only show where the issue is
i try to test it but i need to know "goleak" in which branch
i try to test it but i need to know "goleak" in which branch
It's a third party leak checker, include by go.uber.org/goleak
http is also leaking.
func TestHTTPClientGoroutineLeak(t *testing.T) {
defer goleak.VerifyNone(t)
ctx := context.Background()
// connect to local node
client, err := DialContext(ctx, "http://localhost:8545")
if err != nil {
t.Fatal("can't dial", err)
}
defer client.Close()
if err := client.CallContext(ctx, nil, "eth_chainId"); err != nil {
t.Fatal(err)
}
}
// Once run above test, go routine leak detected.
--- FAIL: TestHTTPClientGoroutineLeak (0.45s)
/Users/tak/Documents/angoya/_sandbox/go-ethereum/rpc/client_test.go:70: found unexpected goroutines:
[Goroutine 25 in state IO wait, with internal/poll.runtime_pollWait on top of the stack:
...
Then I noticed that missing to close idling connections
diff --git a/rpc/http.go b/rpc/http.go
index bbabe15ba..4f45d8d46 100644
--- a/rpc/http.go
+++ b/rpc/http.go
@@ -73,6 +73,7 @@ func (hc *httpConn) readBatch() ([]*jsonrpcMessage, bool, error) {
func (hc *httpConn) close() {
hc.closeOnce.Do(func() { close(hc.closeCh) })
+ hc.client.CloseIdleConnections()
}
And missing httpConn.close()
diff --git a/rpc/client.go b/rpc/client.go
index a509cb2e0..e508ed304 100644
--- a/rpc/client.go
+++ b/rpc/client.go
@@ -280,6 +280,7 @@ func (c *Client) SupportedModules() (map[string]string, error) {
// Close closes the client, aborting any in-flight requests.
func (c *Client) Close() {
if c.isHTTP {
+ c.writeConn.(*httpConn).close()
return
}
Those two changes solve go routine leak.