chromedp
chromedp copied to clipboard
allocator gets stuck randomly
We've been using chromedp successfully on a variety of platforms and devices for almost a year now. It works wonderfully.
Recently we've noticed rare instances where chromedp gets stuck trying to launch chrome for some reason. Here's one example:
goroutine 41317 [select, 13499 minutes]:
github.com/chromedp/chromedp.(*ExecAllocator).Allocate(0xc001a56050, 0x54347a0, 0xc0011847e0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
github.com/chromedp/[email protected]/allocate.go:189 +0x96b
github.com/chromedp/chromedp.Run(0x54347a0, 0xc0011847e0, 0xc0008e22f0, 0x2, 0x2, 0xc000ac25a0, 0x0)
github.com/chromedp/[email protected]/chromedp.go:217 +0x209
goroutine 41311 [chan receive, 13499 minutes]:
github.com/chromedp/chromedp.(*ExecAllocator).Allocate.func2(0x54347a0, 0xc0011847e0, 0xc001852420, 0xc001852401, 0xc000044af0, 0x49, 0xc0013ce160, 0xc001a56050)
github.com/chromedp/[email protected]/allocate.go:199 +0x4c
created by github.com/chromedp/chromedp.(*ExecAllocator).Allocate
github.com/chromedp/[email protected]/allocate.go:198 +0xa2e
goroutine 41309 [chan receive, 13499 minutes]:
github.com/chromedp/chromedp.NewContext.func1(0x54347a0, 0xc0011847e0, 0xc0013ce160)
github.com/chromedp/[email protected]/chromedp.go:128 +0x78
created by github.com/chromedp/chromedp.NewContext
github.com/chromedp/[email protected]/chromedp.go:127 +0x22a
This particular instance was on macOS, but we've seen the same behavior on Windows as well.
At the very least it seems like some sort of timeout would be helpful here:
https://github.com/chromedp/chromedp/blob/aa5cc8729d9451c668fec01ca0a7e7b4cf8b74b5/allocate.go#L189-L193
Hm. Have you been able to reproduce this reliably in any way?
I just found a repro on one of my devices:
goroutine 104 [select, 1 minutes]:
github.com/chromedp/chromedp.(*ExecAllocator).Allocate(0xbce7c510, 0xba8ec008, 0xbca040a0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
github.com/chromedp/[email protected]/allocate.go:190 +0x790
github.com/chromedp/chromedp.Run(0xba8ec008, 0xbca040a0, 0xbcf3f3bc, 0x2, 0x2, 0x1, 0xbce74b10)
github.com/chromedp/[email protected]/chromedp.go:228 +0x188
goroutine 132 [chan receive, 1 minutes]:
github.com/chromedp/chromedp.NewContext.func1(0xba8ec008, 0xbca040a0, 0xbccf6a20)
github.com/chromedp/[email protected]/chromedp.go:130 +0x50
created by github.com/chromedp/chromedp.NewContext
github.com/chromedp/[email protected]/chromedp.go:129 +0x21c
goroutine 134 [chan receive, 1 minutes]:
github.com/chromedp/chromedp.(*ExecAllocator).Allocate.func2(0xba8ec008, 0xbca040a0, 0xbcb113f0, 0xbcb11301, 0xbc83b500, 0x53, 0xbccf6a20, 0xbce7c510)
github.com/chromedp/[email protected]/allocate.go:200 +0x38
created by github.com/chromedp/chromedp.(*ExecAllocator).Allocate
github.com/chromedp/[email protected]/allocate.go:199 +0x824
What data can I gather? I'm pretty sure the underlying chrome binary is SEGVing on launch, so perhaps pointing the allocator at a shell script that exits unexpectedly will reproduce in the tests.
Spent a ton of time trying to reproduce this today. I kept trying to add a test I expected would fail, but it wasn't failing even on the same setup.
Finally I discovered a subtle bug in my app's calling code:
err := chromedp.Run(ctx)
if err, ok := err.(*exec.Error); ok && err.Err == exec.ErrNotFound {
return fmt.Errorf("chrome was not found on your system")
} else if err != nil {
return err
}
err = chromedp.Run(ctx, ....)
Apparently the if statement reusing err overwrites the value in the !ok case, causing the else to fail and then chromedp.Run is called again and hangs forever.
The bug can be fixed in my app by renaming err in the if statement, however I think chromedp should also complain loudly if the context is reused like this? Here's a failing test for that:
diff --git a/allocate_test.go b/allocate_test.go
index 3610c0f..15b4795 100644
--- a/allocate_test.go
+++ b/allocate_test.go
@@ -104,6 +104,29 @@ func TestExecAllocatorKillBrowser(t *testing.T) {
}
}
+func TestExecAllocatorFailReuse(t *testing.T) {
+ t.Parallel()
+
+ // To ensure we don't actually fire any Chrome processes.
+ allocCtx, _ := NewExecAllocator(context.Background(),
+ ExecPath("testdata/chromecrash.sh"))
+
+ ctx, cancel := NewContext(allocCtx)
+ defer cancel()
+ if err := Run(ctx); err == nil {
+ t.Fatalf("expected err but got nil")
+ } else {
+ t.Logf("err1: %v", err)
+ }
+
+ // Try to use expired context again
+ if err := Run(ctx); err == nil {
+ t.Fatalf("expected err but got nil")
+ } else {
+ t.Logf("err2: %v", err)
+ }
+}
+
func TestSkipNewContext(t *testing.T) {
t.Parallel()
diff --git a/testdata/chromecrash.sh b/testdata/chromecrash.sh
new file mode 100755
index 0000000..7f7e33d
--- /dev/null
+++ b/testdata/chromecrash.sh
@@ -0,0 +1,2 @@
+#!/bin/sh
+exit 11
which reproduces the hang readily on my mac:
$ go test -v ./ -run TestExecAllocatorFailReuse
=== RUN TestExecAllocatorFailReuse
=== PAUSE TestExecAllocatorFailReuse
=== CONT TestExecAllocatorFailReuse
TestExecAllocatorFailReuse: allocate_test.go:119: err1: chrome failed to start:
^\SIGQUIT: quit
PC=0x7fff70108882 m=0 sigcode=0
goroutine 0 [idle]:
runtime.pthread_cond_wait(0x1a612c8, 0x1a61288, 0x0)
/usr/local/Cellar/go/1.14.3/libexec/src/runtime/sys_darwin.go:390 +0x39
runtime.semasleep(0xffffffffffffffff, 0x7ffeefbff570)
/usr/local/Cellar/go/1.14.3/libexec/src/runtime/os_darwin.go:63 +0x85
runtime.notesleep(0x1a61088)
/usr/local/Cellar/go/1.14.3/libexec/src/runtime/lock_sema.go:173 +0xe0
runtime.stopm()
/usr/local/Cellar/go/1.14.3/libexec/src/runtime/proc.go:1828 +0xc0
runtime.findrunnable(0xc000044000, 0x0)
/usr/local/Cellar/go/1.14.3/libexec/src/runtime/proc.go:2360 +0xa0d
runtime.schedule()
/usr/local/Cellar/go/1.14.3/libexec/src/runtime/proc.go:2520 +0x2fc
runtime.park_m(0xc0001a2180)
/usr/local/Cellar/go/1.14.3/libexec/src/runtime/proc.go:2690 +0x9d
runtime.mcall(0x10658c6)
/usr/local/Cellar/go/1.14.3/libexec/src/runtime/asm_amd64.s:318 +0x5b
goroutine 1 [chan receive]:
testing.tRunner.func1(0xc0000c1200)
/usr/local/Cellar/go/1.14.3/libexec/src/testing/testing.go:956 +0x238
testing.tRunner(0xc0000c1200, 0xc00006fd78)
/usr/local/Cellar/go/1.14.3/libexec/src/testing/testing.go:995 +0x10b
testing.runTests(0xc00000e6e0, 0x1a5a180, 0x5d, 0x5d, 0x0)
/usr/local/Cellar/go/1.14.3/libexec/src/testing/testing.go:1282 +0x2a7
testing.(*M).Run(0xc000190000, 0x0)
/usr/local/Cellar/go/1.14.3/libexec/src/testing/testing.go:1199 +0x15f
github.com/chromedp/chromedp.TestMain(0xc000190000)
/Users/tmm1/fancybits/chromedp/chromedp_test.go:82 +0x138
main.main()
_testmain.go:252 +0x135
goroutine 6 [select]:
github.com/chromedp/chromedp.(*ExecAllocator).Allocate(0xc0000322d0, 0x16a2200, 0xc00019e090, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
/Users/tmm1/fancybits/chromedp/allocate.go:190 +0x96a
github.com/chromedp/chromedp.Run(0x16a2200, 0xc00019e090, 0x0, 0x0, 0x0, 0x1, 0xc0000738f0)
/Users/tmm1/fancybits/chromedp/chromedp.go:228 +0x209
github.com/chromedp/chromedp.TestExecAllocatorFailReuse(0xc0000c1320)
/Users/tmm1/fancybits/chromedp/allocate_test.go:123 +0x1c2
testing.tRunner(0xc0000c1320, 0x15fa428)
/usr/local/Cellar/go/1.14.3/libexec/src/testing/testing.go:991 +0xdc
created by testing.(*T).Run
/usr/local/Cellar/go/1.14.3/libexec/src/testing/testing.go:1042 +0x357
goroutine 7 [chan receive]:
testing.runTests.func1.1(0xc0000c1200)
/usr/local/Cellar/go/1.14.3/libexec/src/testing/testing.go:1289 +0x3b
created by testing.runTests.func1
/usr/local/Cellar/go/1.14.3/libexec/src/testing/testing.go:1289 +0xac
goroutine 8 [chan receive]:
github.com/chromedp/chromedp.NewContext.func1(0x16a2200, 0xc00019e090, 0xc0000b04d0)
/Users/tmm1/fancybits/chromedp/chromedp.go:130 +0x78
created by github.com/chromedp/chromedp.NewContext
/Users/tmm1/fancybits/chromedp/chromedp.go:129 +0x22a
goroutine 9 [select]:
os/exec.(*Cmd).Start.func2(0xc0000ce420)
/usr/local/Cellar/go/1.14.3/libexec/src/os/exec/exec.go:449 +0xc4
created by os/exec.(*Cmd).Start
/usr/local/Cellar/go/1.14.3/libexec/src/os/exec/exec.go:448 +0x6d0
goroutine 10 [chan receive]:
github.com/chromedp/chromedp.(*ExecAllocator).Allocate.func2(0x16a2200, 0xc00019e090, 0xc0000ce420, 0xc0000ce401, 0xc0000907e0, 0x60, 0xc0000b04d0, 0xc0000322d0)
/Users/tmm1/fancybits/chromedp/allocate.go:200 +0x4c
created by github.com/chromedp/chromedp.(*ExecAllocator).Allocate
/Users/tmm1/fancybits/chromedp/allocate.go:199 +0xa2d
goroutine 12 [select]:
os/exec.(*Cmd).Start.func2(0xc0000ce580)
/usr/local/Cellar/go/1.14.3/libexec/src/os/exec/exec.go:449 +0xc4
created by os/exec.(*Cmd).Start
/usr/local/Cellar/go/1.14.3/libexec/src/os/exec/exec.go:448 +0x6d0
rax 0x104
rbx 0x2
rcx 0x7ffeefbff398
rdx 0x200
rdi 0x1a612c8
rsi 0x30100000400
rbp 0x7ffeefbff430
rsp 0x7ffeefbff398
r8 0x0
r9 0xa0
r10 0x0
r11 0x202
r12 0x1a612c8
r13 0x16
r14 0x30100000400
r15 0x7365dc0
rip 0x7fff70108882
rflags 0x203
cs 0x7
fs 0x0
gs 0x0
FAIL github.com/chromedp/chromedp 1.787s
@mvdan Any thoughts here? I can put together a PR if we agree on approach.
Agreed, please send a PR. Two thoughts:
- Think about whether the fix would be racy; for example, in the sequence of
Run(ctx); cancel(); Run(ctx), since the cancellation work is asynchronous, we don't want it to possibly race with the secondRun. - The test shouldn't use a script, because that's not portable. What you can do is call
os.Args[0]with a special env var likeCHROMEDP_TEST_COMMAND=fail, then inTestMainyou doif os.Getenv("CHROMEDP_TEST_COMMAND") == "fail" { os.Exit(1) }. This is effectively reusing the test binary to run some Go code. It's nice because it's portable, and doesn't require linking another binary.