Deadlock in a single iteration test
Test runs: 2568554, 2571988, 2591015, and 2840522.
Any idea why we don't use vu.Context() here?
func (m *RootModule) NewModuleInstance(vu k6modules.VU) k6modules.Instance {
    // ...
    browserRegistry: newBrowserRegistry(
        context.Background(), // why isn't this vu.Context()?
        ...
When I looked into this stack trace (from test run 2591015), it's clear that the Session's and the event handler's loops get stuck. The root cause is probably something else, but they still look suspicious to me.
stderr contents:
Logger format: RAW
SIGQUIT: quit
PC=0x40720e m=0 sigcode=0
goroutine 1 [chan receive, 38 minutes]:
runtime.gopark(0xc000bf0830?, 0x4105e5?, 0x80?, 0x44?, 0x40?)
runtime/proc.go:398 +0xce fp=0xc000bf07c8 sp=0xc000bf07a8 pc=0x43e1ee
runtime.chanrecv(0xc00117af00, 0xc000bf0a28, 0x1)
runtime/chan.go:583 +0x3cd fp=0xc000bf0840 sp=0xc000bf07c8 pc=0x409bcd
runtime.chanrecv1(0x1b97aa0?, 0xc000c3f8f0?)
runtime/chan.go:442 +0x12 fp=0xc000bf0868 sp=0xc000bf0840 pc=0x4097d2
go.k6.io/k6/execution.(*Scheduler).Run(0xc000884000, {0x1b97ad8?, 0xc0002de230}, {0x1b97aa0, 0xc0004fc6c0}, 0xc00218e360)
go.k6.io/k6/execution/scheduler.go:508 +0xd9d fp=0xc000bf0e48 sp=0xc000bf0868 pc=0xa4e7dd
go.k6.io/k6/cmd.(*cmdRun).run(0xc000120570, 0xc0004c2a00, {0xc0004d0300?, 0x1, 0x62})
go.k6.io/k6/cmd/run.go:395 +0x1671 fp=0xc000bf1c48 sp=0xc000bf0e48 pc=0x143ed91
[...]
github.com/grafana/xk6-browser/common.(*Session).readLoop(0xc00c39b0e0)
github.com/grafana/[email protected]/common/session.go:83 +0x99 fp=0xc005dfd7c8 sp=0xc005dfd6b8 pc=0x1243e19
github.com/grafana/xk6-browser/common.NewSession.func1()
github.com/grafana/[email protected]/common/session.go:47 +0x25 fp=0xc005dfd7e0 sp=0xc005dfd7c8 pc=0x1243b25
runtime.goexit()
runtime/asm_amd64.s:1650 +0x1 fp=0xc005dfd7e8 sp=0xc005dfd7e0 pc=0x471421
created by github.com/grafana/xk6-browser/common.NewSession in goroutine 279771
github.com/grafana/xk6-browser@…/common/session.go:47 +0x365
goroutine 272270 [select, 6 minutes]:
runtime.gopark(0xc011aa7788?, 0x2?, 0x50?, 0x76?, 0xc011aa777c?)
runtime/proc.go:398 +0xce fp=0xc011aa7628 sp=0xc011aa7608 pc=0x43e1ee
runtime.selectgo(0xc011aa7788, 0xc011aa7778, 0xc026c116e0?, 0x0, 0xc001f00860?, 0x1)
runtime/select.go:327 +0x725 fp=0xc011aa7748 sp=0xc011aa7628 pc=0x44e625
github.com/grafana/xk6-browser/common.(*BaseEventEmitter).syncAll(0xc0021badc0, {0x1b97ad8, 0xc000d51cc0})
github.com/grafana/[email protected]/common/event_emitter.go:126 +0xa5 fp=0xc011aa77b8 sp=0xc011aa7748 pc=0x11ed0a5
github.com/grafana/xk6-browser/common.NewBaseEventEmitter.func1()
github.com/grafana/[email protected]/common/event_emitter.go:116 +0x28 fp=0xc011aa77e0 sp=0xc011aa77b8 pc=0x11ecfc8
runtime.goexit()
runtime/asm_amd64.s:1650 +0x1 fp=0xc011aa77e8 sp=0xc011aa77e0 pc=0x471421
created by github.com/grafana/xk6-browser/common.NewBaseEventEmitter in goroutine 271907
github.com/grafana/xk6-browser@…/common/event_emitter.go:116 +0x199
rax    0xfffffffffffffffc
rbx    0x4
rcx    0x40720e
rdx    0x80
rdi    0x4
rsi    0x7ffff87436d8
rbp    0x7ffff8743650
rsp    0x7ffff8743608
r8     0x0
r9     0x0
r10    0x31
r11    0x246
r12    0x7ffff87436e0
r13    0xc00006b744
r14    0x29be0a0
r15    0x0
rip    0x40720e
rflags 0x246
cs     0x33
fs     0x0
gs     0x0
Even if we use the VU context, why k6-core would cancel that context is another question, though... But it might still be better than getting nothing out of the errors. At least we could log something when the context is canceled due to a timeout, etc.
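For illustration, a minimal sketch of that logging, assuming the loops were given a cancelable context; the context.WithTimeout below is a hypothetical stand-in for whatever vu.Context() would carry:

package main

import (
	"context"
	"log"
	"time"
)

func main() {
	// Hypothetical stand-in for vu.Context(): a context with a deadline.
	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()

	done := make(chan struct{})
	go func() {
		defer close(done)
		<-ctx.Done()
		// ctx.Err() distinguishes explicit cancellation from a deadline, so
		// we get a log line instead of a silently parked goroutine.
		log.Printf("context finished: %v", ctx.Err())
	}()
	<-done
}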
These components (the session and the event loop) are suspicious because there are no other goroutines left but them, so there is probably a bug in them. Both are blocked waiting to receive, but nobody sends anything on the channels they listen to. Maybe we can build some circuitry into the module so that when all subcomponents finish, we can stop running these two.
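To make that concrete, here's a minimal, self-contained sketch (not the actual xk6-browser code) of the pattern the dumps show: a loop that selects on a work channel nobody sends to anymore can only exit via its context, and a background context never fires:

package main

import (
	"context"
	"fmt"
	"time"
)

// readLoop mimics the shape of the loops in the dumps (Session.readLoop,
// BaseEventEmitter.syncAll): block in a select until a message arrives or
// the context ends.
func readLoop(ctx context.Context, in <-chan string) {
	for {
		select {
		case msg := <-in:
			fmt.Println("got:", msg)
		case <-ctx.Done():
			fmt.Println("read loop exiting:", ctx.Err())
			return
		}
	}
}

func main() {
	in := make(chan string)

	// With context.Background(), Done() never fires, so once all senders are
	// gone this goroutine parks forever, like the long-parked goroutines above.
	go readLoop(context.Background(), in)

	// With a cancelable context, the same loop can be shut down.
	ctx, cancel := context.WithCancel(context.Background())
	go readLoop(ctx, in)
	cancel()

	time.Sleep(100 * time.Millisecond)
}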
Any idea why we don't use vu.Context() here?
Here's the reason why we're using a background context.
I still need to take a deep dive into your findings to align properly, though. Thanks for investigating @inancgumus!
These components (the session and the event loop) are suspicious because there are no other goroutines left but them, so there is probably a bug in them. Both are blocked waiting to receive, but nobody sends anything on the channels they listen to. Maybe we can build some circuitry into the module so that when all subcomponents finish, we can stop running these two.
Yep, I agree with the idea of building something in that can force close any remaining goroutines. Maybe, if a goroutine isn't listening to a context, it could register itself with a global goroutineRegister that can be iterated over when the exitEvent is received, which would call a new forceClose method.
The forceClose method could close a forceClosed channel which these goroutines would need to select on. I prefer adding a new method and channel over reusing existing channels so that it's more explicit that the goroutine is being forced to close, which will also help us debug the issue and find a more long-term solution to the problem.
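A rough, runnable sketch of that idea; goroutineRegister, forceClose, and forceClosed are the names proposed in this comment, not existing xk6-browser APIs:

package main

import (
	"fmt"
	"sync"
	"time"
)

// goroutineRegister hands out a shared forceClosed channel to goroutines
// that aren't listening to a context.
type goroutineRegister struct {
	once        sync.Once
	forceClosed chan struct{}
}

func newGoroutineRegister() *goroutineRegister {
	return &goroutineRegister{forceClosed: make(chan struct{})}
}

// done returns the channel registered goroutines must select on.
func (r *goroutineRegister) done() <-chan struct{} { return r.forceClosed }

// forceClose is what the exitEvent handler would call: closing the channel
// wakes every registered goroutine at once.
func (r *goroutineRegister) forceClose() {
	r.once.Do(func() { close(r.forceClosed) })
}

func main() {
	reg := newGoroutineRegister()
	var wg sync.WaitGroup

	// A stand-in for Session.readLoop: it selects on forceClosed in addition
	// to its work channel, so it can always be shut down.
	in := make(chan string)
	wg.Add(1)
	go func() {
		defer wg.Done()
		for {
			select {
			case msg := <-in:
				fmt.Println("got:", msg)
			case <-reg.done():
				fmt.Println("force closed, exiting read loop")
				return
			}
		}
	}()

	time.Sleep(10 * time.Millisecond)
	reg.forceClose() // e.g. when the exitEvent is received
	wg.Wait()
}

Closing a channel broadcasts to every goroutine selecting on it at once, which is why a dedicated forceClosed channel maps well onto "iterate over the register and force close".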
I found a stack trace which might be useful in helping to diagnose the issue, or at least as a symptom for linking failures to this issue:
goroutine 0 gp=0x29f8360 m=0 mp=0x29f8d40 [idle]:
runtime.futex(0x29f8e80, 0x80, 0x0, 0x0, 0x0, 0x0)
runtime/sys_linux_amd64.s:557 +0x21 fp=0x7fff3ff27120 sp=0x7fff3ff27118 pc=0x47a6a1
runtime.futexsleep(0x7fff3ff27198?, 0x410110?, 0x13ff271c8?)
runtime/os_linux.go:69 +0x30 fp=0x7fff3ff27170 sp=0x7fff3ff27120 pc=0x43a5d0
runtime.notesleep(0x29f8e80)
runtime/lock_futex.go:170 +0x87 fp=0x7fff3ff271a8 sp=0x7fff3ff27170 pc=0x4102a7
runtime.mPark(...)
runtime/proc.go:1761
runtime.stopm()
runtime/proc.go:2782 +0x8c fp=0x7fff3ff271d8 sp=0x7fff3ff271a8 pc=0x44514c
runtime.findRunnable()
runtime/proc.go:3512 +0xd5f fp=0x7fff3ff27350 sp=0x7fff3ff271d8 pc=0x446cbf
runtime.schedule()
runtime/proc.go:3868 +0xb1 fp=0x7fff3ff27388 sp=0x7fff3ff27350 pc=0x447d91
runtime.park_m(0xc00058c1c0)
runtime/proc.go:4036 +0x1ec fp=0x7fff3ff273e0 sp=0x7fff3ff27388 pc=0x44836c
runtime.mcall()
runtime/asm_amd64.s:458 +0x4e fp=0x7fff3ff273f8 sp=0x7fff3ff273e0 pc=0x4769ce
goroutine 1 gp=0xc0000061c0 m=nil [chan receive, 38 minutes]:
runtime.gopark(0x29f8d40?, 0xc0013a49a0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:402 +0xce fp=0xc0005aa990 sp=0xc0005aa970 pc=0x440b4e
runtime.chanrecv(0xc001e29d40, 0xc000efcb80, 0x1)
runtime/chan.go:583 +0x3bf fp=0xc0005aaa08 sp=0xc0005aa990 pc=0x40a71f
runtime.chanrecv1(0x1bb81d0?, 0xc0008a9230?)
runtime/chan.go:442 +0x12 fp=0xc0005aaa30 sp=0xc0005aaa08 pc=0x40a332
go.k6.io/k6/execution.(*Scheduler).Run(0xc0003f8000, {0x1bb8208, 0xc00009b400}, {0x1bb81d0, 0xc0003c2ba0}, 0xc00232b4a0)
go.k6.io/k6/execution/scheduler.go:508 +0xd3e fp=0xc0005aafc8 sp=0xc0005aaa30 pc=0xa5cd5e
go.k6.io/k6/cmd.(*cmdRun).run(0xc000050820, 0xc0001c78c8, {0xc0001f0008, 0x1, 0x32})
go.k6.io/k6/cmd/run.go:395 +0x15f1 fp=0xc0005abc50 sp=0xc0005aafc8 pc=0x1451931
go.k6.io/k6/cmd.(*cmdRun).run-fm(0xc0001e8800?, {0xc0001f0008?, 0x4?, 0x17ec151?})
<autogenerated>:1 +0x34 fp=0xc0005abc88 sp=0xc0005abc50 pc=0x145c534
github.com/spf13/cobra.(*Command).execute(0xc0001c78c8, {0xc0
... omitting 1312447 bytes ...
runtime/proc.go:402 +0xce fp=0xc009823598 sp=0xc009823578 pc=0x440b4e
runtime.selectgo(0xc009823790, 0xc009823700, 0x180b369?, 0x0, 0xc009823780?, 0x1)
runtime/select.go:327 +0x725 fp=0xc0098236b8 sp=0xc009823598 pc=0x4528e5
github.com/grafana/xk6-browser/common.(*Session).readLoop(0xc01087f040)
github.com/grafana/[email protected]/common/session.go:83 +0x99 fp=0xc0098237c8 sp=0xc0098236b8 pc=0x128dcd9
github.com/grafana/xk6-browser/common.NewSession.gowrap1()
github.com/grafana/[email protected]/common/session.go:47 +0x25 fp=0xc0098237e0 sp=0xc0098237c8 pc=0x128d9e5
runtime.goexit({})
runtime/asm_amd64.s:1695 +0x1 fp=0xc0098237e8 sp=0xc0098237e0 pc=0x478881
created by github.com/grafana/xk6-browser/common.NewSession in goroutine 6444541
github.com/grafana/[email protected]/common/session.go:47 +0x35d
goroutine 6447058 gp=0xc00576e1c0 m=nil [select, 9 minutes]:
runtime.gopark(0xc006d78f90?, 0x2?, 0x90?, 0xf4?, 0xc006d78f04?)
runtime/proc.go:402 +0xce fp=0xc006d78d98 sp=0xc006d78d78 pc=0x440b4e
runtime.selectgo(0xc006d78f90, 0xc006d78f00, 0x0?, 0x0, 0x0?, 0x1)
runtime/select.go:327 +0x725 fp=0xc006d78eb8 sp=0xc006d78d98 pc=0x4528e5
github.com/grafana/xk6-browser/common.(*Session).readLoop(0xc0052214a0)
github.com/grafana/[email protected]/common/session.go:83 +0x99 fp=0xc006d78fc8 sp=0xc006d78eb8 pc=0x128dcd9
github.com/grafana/xk6-browser/common.NewSession.gowrap1()
github.com/grafana/[email protected]/common/session.go:47 +0x25 fp=0xc006d78fe0 sp=0xc006d78fc8 pc=0x128d9e5
runtime.goexit({})
runtime/asm_amd64.s:1695 +0x1 fp=0xc006d78fe8 sp=0xc006d78fe0 pc=0x478881
created by github.com/grafana/xk6-browser/common.NewSession in goroutine 6444541
github.com/grafana/[email protected]/common/session.go:47 +0x35d
Happened again: test run 2840522.