k6 icon indicating copy to clipboard operation
k6 copied to clipboard

client.asyncInvoke leads to: panic: send on closed channel

Open gm42 opened this issue 1 year ago • 3 comments

Brief summary

While using client.asyncInvoke I managed to trigger this panic:

panic: send on closed channel
Run             [======================================] setup()
goroutine 105931 [running]:----------------------------]
go.k6.io/k6/metrics.PushIfNotDone(...)-----------------]
	go.k6.io/k6/metrics/sample.go:135
go.k6.io/k6/lib/netext/grpcext.statsHandler.HandleRPC({0xc0027b2008?}, {0x1bb81d0, 0xc030f8d0b0}, {0x1bb3018, 0xc012619d60})
	go.k6.io/k6/lib/netext/grpcext/conn.go:254 +0x350
google.golang.org/grpc.(*csAttempt).finish(0xc0197f3d40, {0x1baa740, 0xc03cc89af8})
	google.golang.org/[email protected]/stream.go:1172 +0x47c
google.golang.org/grpc.(*clientStream).retryLocked(0xc029cc2240, 0x0?, {0x1baa740?, 0xc03cc89af8?})
	google.golang.org/[email protected]/stream.go:721 +0x5c
google.golang.org/grpc.(*clientStream).withRetry(0xc029cc2240, 0xc013fefb98, 0xc013fefbe0)
	google.golang.org/[email protected]/stream.go:791 +0x28c
google.golang.org/grpc.(*clientStream).RecvMsg(0xc029cc2240, {0x1769800?, 0xc019c4ce00?})
	google.golang.org/[email protected]/stream.go:927 +0x110
google.golang.org/grpc.invoke({0x1bb81d0?, 0xc030f8c840?}, {0xc0193bb4e0?, 0x20?}, {0x1769800, 0xc019c4cd80}, {0x1769800, 0xc019c4ce00}, 0xc0222eba00?, {0xc0222eba00, ...})
	google.golang.org/[email protected]/call.go:73 +0xcb
google.golang.org/grpc.(*ClientConn).Invoke(0xc00060cc08, {0x1bb81d0?, 0xc030f8c840?}, {0xc0193bb4e0?, 0x131?}, {0x1769800?, 0xc019c4cd80?}, {0x1769800?, 0xc019c4ce00?}, {0xc0222eba00, ...})
	google.golang.org/[email protected]/call.go:37 +0x23f
go.k6.io/k6/lib/netext/grpcext.(*Conn).Invoke(0xc000a5ce20, {0x1bb8208?, 0xc0005e54a0?}, {{0xc0193bb4e0, 0x1d}, {0x1bc7bf0, 0xc00293a588}, 0x1bf08eb000, 0xc024084448, {0xc019e4ec00, ...}, ...}, ...)
	go.k6.io/k6/lib/netext/grpcext/conn.go:141 +0x696
go.k6.io/k6/js/modules/k6/grpc.(*Client).AsyncInvoke.func1()
	go.k6.io/k6/js/modules/k6/grpc/client.go:306 +0xae
created by go.k6.io/k6/js/modules/k6/grpc.(*Client).AsyncInvoke in goroutine 1
	go.k6.io/k6/js/modules/k6/grpc/client.go:305 +0x27c

k6 version

0.51.0

OS

Linux

Docker version and image (if applicable)

Docker image tag 0.51.0

Steps to reproduce the problem

To reproduce this problem you need the gRPC endpoint to crash (OOM), which then causes a code 14 gRPC status:

error reading from server: EOF

Afterwards the panic may happen, but it's not 100% reproducible. Perhaps a test with -race will find the bug?

Expected behaviour

No panics when using client.asyncInvoke

Actual behaviour

A panic happened

gm42 avatar May 18 '24 19:05 gm42

Hey @gm42,

How often have you been able to reproduce it? I tried multiple times, simulating an OOM scenario in the server side, with no luck.

That said, have you tried to run client.close as part of the teardown function and see if it sill panics? If you're still able to reproduce it, I'd suggest to try that, as arguably one could say that the source of the panic is due to the lack of cleanup, although we could explore a mechanism to prevent that from happening at all, of course.

Thanks! 🙇🏻

joanlopez avatar May 27 '24 13:05 joanlopez

Any news @gm42? I spent some more time trying to reproduce this by reproducing a few more corner cases, as well as intentionally faking the 14 UNAVAILABLE gRPC status code, with no luck.

Would you be able to share how the script looks like, by striping sensitive details, if any, please?

joanlopez avatar Jun 03 '24 13:06 joanlopez

Considering that:

  • this seems to be happening because there's some "remaining" code running even after calling close(samples)
  • the AsyncInvoke code specifically looks pretty safe in terms of leaking something like that (as it adequately initializes a promise and registers & uses the callback)

I guessed: aha, it looks like this is probably "reproducible" with Go code that escapes the "main thread" without actually using promises/callbacks adequately (think of any goroutine that calls metrics.PushIfNotDone, with a reference to the channel and a non-cancelled context; which I may happen by mistake, and would probably be easy to force intentionally? 🤔

Thus, I've been taking a look at code in google.golang.org/grpc, referred from that stack trace, looking for any code that could produce this situation, and I haven't spot any so far.

However, I still think this is "doable" (intentionally for instance). So, do you think we should try to prevent that from k6 surface? Or it is just fine to assume that could happen, and treat each case as a bug fix from the extension code (gRPC in this case)?

In other words, if we assume this is caused by a "bug" in the gRPC code; do you think we should prevent that from happening at all in k6 anyway? Or just consider that this should be fixed in gRPC? cc/ @olegbespalov @mstoykov 🙇🏻

joanlopez avatar Jun 03 '24 15:06 joanlopez