go icon indicating copy to clipboard operation
go copied to clipboard

runtime: mayMoreStackPreempt failures

Open rsc opened this issue 3 years ago • 61 comments

#!watchflakes
default <- section == "maymorestack=mayMoreStackPreempt" || pkg == "runtime:mayMoreStackPreempt"

Intermittent failures in the new mayMoreStackPreempt. This bug will replace #54778 and #55073.

rsc avatar Sep 20 '22 02:09 rsc

Found new matching flaky dashboard failures.

2022-09-02 19:08 linux-386-longtest go@dbf442b1 runtime.TestVDSO (log)
--- FAIL: TestVDSO (0.12s)
    testenv.go:468: [/workdir/tmp/go-build505441303/testprog.exe SignalInVDSO] exit status: signal: segmentation fault (core dumped)
    crash_test.go:144: output:


        wanted:
        success
2022-09-02 19:08 linux-386-longtest go@b91e3737 runtime.TestVDSO (log)
--- FAIL: TestVDSO (0.07s)
    testenv.go:468: [/workdir/tmp/go-build4244626990/testprog.exe SignalInVDSO] exit status: signal: segmentation fault (core dumped)
    crash_test.go:144: output:


        wanted:
        success
2022-09-02 19:09 linux-386-longtest go@55ca6a20 runtime.TestVDSO (log)
--- FAIL: TestVDSO (0.47s)
    testenv.go:468: [/workdir/tmp/go-build771899026/testprog.exe SignalInVDSO] exit status: signal: segmentation fault (core dumped)
    crash_test.go:144: output:


        wanted:
        success
2022-09-02 19:22 linux-386-longtest go@0fda8b19 runtime.TestVDSO (log)
--- FAIL: TestVDSO (0.94s)
    testenv.go:468: [/workdir/tmp/go-build428612025/testprog.exe SignalInVDSO] exit status: signal: segmentation fault (core dumped)
    crash_test.go:144: output:


        wanted:
        success
2022-09-02 19:22 linux-386-longtest go@0fda8b19 runtime.TestCgoPprofCallback (log)
--- FAIL: TestCgoPprofCallback (0.76s)
    testenv.go:468: [/workdir/tmp/go-build428612025/testprogcgo.exe CgoPprofCallback] exit status: signal: segmentation fault (core dumped)
    crash_cgo_test.go:228: expected "OK\n" got 
2022-09-03 15:45 linux-386-longtest go@f798dc68 runtime.TestVDSO (log)
--- FAIL: TestVDSO (0.90s)
    testenv.go:468: [/workdir/tmp/go-build1281399444/testprog.exe SignalInVDSO] exit status: signal: segmentation fault (core dumped)
    crash_test.go:144: output:


        wanted:
        success
2022-09-03 18:21 linux-386-longtest go@a0f05823 runtime.TestVDSO (log)
--- FAIL: TestVDSO (0.46s)
    testenv.go:468: [/workdir/tmp/go-build2488815781/testprog.exe SignalInVDSO] exit status: signal: segmentation fault (core dumped)
    crash_test.go:144: output:


        wanted:
        success
2022-09-03 18:21 linux-386-longtest go@a0f05823 runtime.TestCgoPprofCallback (log)
--- FAIL: TestCgoPprofCallback (0.50s)
    testenv.go:468: [/workdir/tmp/go-build2488815781/testprogcgo.exe CgoPprofCallback] exit status: signal: segmentation fault (core dumped)
    crash_cgo_test.go:228: expected "OK\n" got 
2022-09-04 04:17 linux-386-longtest go@535fe2b2 runtime.TestVDSO (log)
--- FAIL: TestVDSO (0.50s)
    testenv.go:468: [/workdir/tmp/go-build3332541852/testprog.exe SignalInVDSO] exit status: signal: segmentation fault (core dumped)
    crash_test.go:144: output:


        wanted:
        success
2022-09-05 07:14 linux-386-longtest go@3fbcf05d runtime.TestVDSO (log)
--- FAIL: TestVDSO (0.14s)
    testenv.go:468: [/workdir/tmp/go-build2637521369/testprog.exe SignalInVDSO] exit status: signal: segmentation fault (core dumped)
    crash_test.go:144: output:


        wanted:
        success
2022-09-05 07:17 linux-386-longtest go@4e7e7ae1 runtime.TestVDSO (log)
--- FAIL: TestVDSO (0.21s)
    testenv.go:468: [/workdir/tmp/go-build436015263/testprog.exe SignalInVDSO] exit status: signal: segmentation fault (core dumped)
    crash_test.go:144: output:


        wanted:
        success
2022-09-05 08:08 linux-386-longtest go@4ad55cd9 runtime.TestVDSO (log)
--- FAIL: TestVDSO (0.91s)
    testenv.go:468: [/workdir/tmp/go-build596332148/testprog.exe SignalInVDSO] exit status: signal: segmentation fault (core dumped)
    crash_test.go:144: output:


        wanted:
        success
2022-09-05 08:08 linux-386-longtest go@af7f4176 runtime.TestVDSO (log)
--- FAIL: TestVDSO (0.57s)
    testenv.go:468: [/workdir/tmp/go-build2390128426/testprog.exe SignalInVDSO] exit status: signal: segmentation fault (core dumped)
    crash_test.go:144: output:


        wanted:
        success
2022-09-05 08:28 linux-386-longtest go@bd5595d7 runtime.TestVDSO (log)
--- FAIL: TestVDSO (0.36s)
    testenv.go:468: [/workdir/tmp/go-build202959485/testprog.exe SignalInVDSO] exit status: signal: segmentation fault (core dumped)
    crash_test.go:144: output:


        wanted:
        success
2022-09-05 08:28 linux-386-longtest go@bd5595d7 runtime.TestCgoPprofCallback (log)
--- FAIL: TestCgoPprofCallback (1.14s)
    testenv.go:468: [/workdir/tmp/go-build202959485/testprogcgo.exe CgoPprofCallback] exit status: signal: segmentation fault (core dumped)
    crash_cgo_test.go:228: expected "OK\n" got 
2022-09-05 21:39 linux-386-longtest go@4c1ca42a runtime.TestVDSO (log)
--- FAIL: TestVDSO (0.43s)
    testenv.go:468: [/workdir/tmp/go-build2185756284/testprog.exe SignalInVDSO] exit status: signal: segmentation fault (core dumped)
    crash_test.go:144: output:


        wanted:
        success
2022-09-06 14:44 linux-386-longtest go@a60a3dc5 runtime.TestCgoPprofCallback (log)
--- FAIL: TestCgoPprofCallback (0.37s)
    testenv.go:468: [/workdir/tmp/go-build1980997514/testprogcgo.exe CgoPprofCallback] exit status: signal: segmentation fault (core dumped)
    crash_cgo_test.go:228: expected "OK\n" got 
2022-09-06 15:44 linux-386-longtest go@32f68b5a runtime.TestVDSO (log)
--- FAIL: TestVDSO (0.11s)
    testenv.go:468: [/workdir/tmp/go-build1672150549/testprog.exe SignalInVDSO] exit status: signal: segmentation fault (core dumped)
    crash_test.go:144: output:


        wanted:
        success
2022-09-06 15:44 linux-386-longtest go@32f68b5a runtime.TestCgoPprofCallback (log)
--- FAIL: TestCgoPprofCallback (0.40s)
    testenv.go:468: [/workdir/tmp/go-build1672150549/testprogcgo.exe CgoPprofCallback] exit status: signal: segmentation fault (core dumped)
    crash_cgo_test.go:228: expected "OK\n" got 
2022-09-06 15:48 linux-386-longtest go@07b19bf5 runtime.TestVDSO (log)
--- FAIL: TestVDSO (0.15s)
    testenv.go:468: [/workdir/tmp/go-build2315874385/testprog.exe SignalInVDSO] exit status: signal: segmentation fault (core dumped)
    crash_test.go:144: output:


        wanted:
        success
2022-09-02 19:08 linux-amd64-longtest go@dbf442b1 runtime.TestCgoPprofCallback (log)
--- FAIL: TestCgoPprofCallback (0.87s)
    testenv.go:468: [/workdir/tmp/go-build2891867005/testprogcgo.exe CgoPprofCallback] exit status: signal: segmentation fault (core dumped)
    crash_cgo_test.go:228: expected "OK\n" got 
2022-09-02 19:08 linux-amd64-longtest go@dbf442b1 runtime.TestCgoCCodeSIGPROF (log)
--- FAIL: TestCgoCCodeSIGPROF (0.32s)
    testenv.go:468: [/workdir/tmp/go-build2891867005/testprogcgo.exe CgoCCodeSIGPROF] exit status: signal: segmentation fault (core dumped)
    crash_cgo_test.go:215: expected "OK\n" got 
2022-09-02 19:08 linux-amd64-longtest go@b91e3737 runtime.TestCgoPprofCallback (log)
--- FAIL: TestCgoPprofCallback (3.24s)
    testenv.go:468: [/workdir/tmp/go-build687456613/testprogcgo.exe CgoPprofCallback] exit status: signal: segmentation fault (core dumped)
    crash_cgo_test.go:228: expected "OK\n" got 
2022-09-03 15:45 linux-amd64-longtest go@f798dc68 runtime.TestCgoPprofCallback (log)
--- FAIL: TestCgoPprofCallback (2.81s)
    testenv.go:468: [/workdir/tmp/go-build4078786360/testprogcgo.exe CgoPprofCallback] exit status: signal: segmentation fault (core dumped)
    crash_cgo_test.go:228: expected "OK\n" got 
2022-09-03 18:21 linux-amd64-longtest go@a0f05823 runtime.TestCgoPprofCallback (log)
--- FAIL: TestCgoPprofCallback (0.97s)
    testenv.go:468: [/workdir/tmp/go-build956381195/testprogcgo.exe CgoPprofCallback] exit status: signal: segmentation fault (core dumped)
    crash_cgo_test.go:228: expected "OK\n" got 
2022-09-04 04:17 linux-amd64-longtest go@535fe2b2 runtime.TestCgoPprofCallback (log)
--- FAIL: TestCgoPprofCallback (3.27s)
    testenv.go:468: [/workdir/tmp/go-build2525853216/testprogcgo.exe CgoPprofCallback] exit status: signal: segmentation fault (core dumped)
    crash_cgo_test.go:228: expected "OK\n" got 
2022-09-05 08:12 linux-amd64-longtest go@67e65424 runtime.TestCgoPprofCallback (log)
--- FAIL: TestCgoPprofCallback (0.42s)
    testenv.go:468: [/workdir/tmp/go-build3155816942/testprogcgo.exe CgoPprofCallback] exit status: signal: segmentation fault (core dumped)
    crash_cgo_test.go:228: expected "OK\n" got 
2022-09-05 08:12 linux-amd64-longtest go@67e65424 runtime.TestCgoCCodeSIGPROF (log)
--- FAIL: TestCgoCCodeSIGPROF (0.18s)
    testenv.go:468: [/workdir/tmp/go-build3155816942/testprogcgo.exe CgoCCodeSIGPROF] exit status: signal: segmentation fault (core dumped)
    crash_cgo_test.go:215: expected "OK\n" got 
2022-09-05 21:39 linux-amd64-longtest go@4c1ca42a runtime.TestCgoPprofCallback (log)
--- FAIL: TestCgoPprofCallback (0.67s)
    testenv.go:468: [/workdir/tmp/go-build1237047364/testprogcgo.exe CgoPprofCallback] exit status: signal: segmentation fault (core dumped)
    crash_cgo_test.go:228: expected "OK\n" got 
2022-09-06 11:14 linux-amd64-longtest go@1c504843 runtime.TestCgoPprofCallback (log)
--- FAIL: TestCgoPprofCallback (2.60s)
    testenv.go:468: [/workdir/tmp/go-build4096451727/testprogcgo.exe CgoPprofCallback] exit status: signal: segmentation fault (core dumped)
    crash_cgo_test.go:228: expected "OK\n" got 

watchflakes

gopherbot avatar Sep 20 '22 02:09 gopherbot

These were coming in multiple times per day and have now stopped for almost two weeks. Closing.

rsc avatar Sep 20 '22 02:09 rsc

Found new matching flaky dashboard failures for:

    #!watchflakes
    post <- section == "maymorestack=mayMoreStackPreempt"
2022-08-23 03:09 windows-amd64-longtest go@0a52d806 (log)
XXXBANNERXXX:Test execution environment.
# GOARCH: amd64
# CPU: Intel(R) Xeon(R) CPU @ 2.20GHz
# GOOS: windows
# OS Version: 10.0.14393
fatal error: advapi32.dll not found
runtime: panic before malloc heap initialized
go: error obtaining buildID for go tool compile: exit status 0xc0000005

go tool dist: FAILED: go list -f={{if .Stale}}	STALE {{.ImportPath}}: {{.StaleReason}}{{end}} std: exit status 1

watchflakes

gopherbot avatar Sep 20 '22 03:09 gopherbot

Found new dashboard test flakes for:

    #!watchflakes
    post <- section == "maymorestack=mayMoreStackPreempt"
2022-09-06 15:48 linux-amd64-longtest go@07b19bf5 runtime.TestCgoPprofCallback (log)
--- FAIL: TestCgoPprofCallback (0.58s)
    testenv.go:468: [/workdir/tmp/go-build3380638690/testprogcgo.exe CgoPprofCallback] exit status: signal: segmentation fault (core dumped)
    crash_cgo_test.go:228: expected "OK\n" got 

watchflakes

gopherbot avatar Sep 21 '22 19:09 gopherbot

Found new dashboard test flakes for:

    #!watchflakes
    post <- section == "maymorestack=mayMoreStackPreempt"
2022-10-31 21:00 linux-386-longtest go@ec0b5402 runtime.TestCgoTracebackContextPreemption (log)
--- FAIL: TestCgoTracebackContextPreemption (60.06s)
    testenv.go:454: context deadline exceeded: terminating command: /workdir/tmp/go-build82099082/testprogcgo.exe TracebackContextPreemption
    crash_test.go:59: /workdir/tmp/go-build82099082/testprogcgo.exe TracebackContextPreemption: exit status 2
    crash_cgo_test.go:284: expected "OK\n" got SIGQUIT: quit
        PC=0x80b7eff m=0 sigcode=0

        eax    0x0
        ebx    0x8296d10
        ecx    0x80
        edx    0x0
        edi    0x0
        esi    0x0
        ebp    0x0
        esp    0xffb85348
        eip    0x80b7eff
        eflags 0x286
        cs     0x23
        fs     0x0
        gs     0x63
2022-11-02 18:19 linux-386-longtest go@07a70bca runtime.TestCgoTracebackContextPreemption (log)
--- FAIL: TestCgoTracebackContextPreemption (60.06s)
    exec.go:152: context deadline exceeded: terminating command: /workdir/tmp/go-build3697616094/testprogcgo.exe TracebackContextPreemption
    crash_test.go:59: /workdir/tmp/go-build3697616094/testprogcgo.exe TracebackContextPreemption: exit status 2
    crash_cgo_test.go:284: expected "OK\n" got SIGQUIT: quit
        PC=0x80b7eff m=0 sigcode=0

        eax    0xfffffff5
        ebx    0x8296d10
        ecx    0x80
        edx    0x0
        edi    0x0
        esi    0x0
        ebp    0x0
        esp    0xffb80458
        eip    0x80b7eff
        eflags 0x286
        cs     0x23
        fs     0x0
        gs     0x63
2022-11-03 15:17 linux-386-longtest go@1bfb51f8 runtime.TestCgoTracebackContextPreemption (log)
--- FAIL: TestCgoTracebackContextPreemption (60.06s)
    exec.go:152: context deadline exceeded: terminating command: /workdir/tmp/go-build368093199/testprogcgo.exe TracebackContextPreemption
    crash_test.go:59: /workdir/tmp/go-build368093199/testprogcgo.exe TracebackContextPreemption: exit status 2
    crash_cgo_test.go:284: expected "OK\n" got SIGQUIT: quit
        PC=0x80b7eff m=0 sigcode=0

        eax    0x0
        ebx    0x8296d10
        ecx    0x80
        edx    0x0
        edi    0x0
        esi    0x0
        ebp    0x0
        esp    0xffbb95b4
        eip    0x80b7eff
        eflags 0x286
        cs     0x23
        fs     0x0
        gs     0x63
2022-11-03 15:30 linux-386-longtest go@e81263c7 runtime.TestCgoTracebackContextPreemption (log)
--- FAIL: TestCgoTracebackContextPreemption (60.06s)
    exec.go:152: context deadline exceeded: terminating command: /workdir/tmp/go-build1088517119/testprogcgo.exe TracebackContextPreemption
    crash_test.go:59: /workdir/tmp/go-build1088517119/testprogcgo.exe TracebackContextPreemption: exit status 2
    crash_cgo_test.go:284: expected "OK\n" got SIGQUIT: quit
        PC=0x80b7eff m=0 sigcode=0

        eax    0x1
        ebx    0x9c80310
        ecx    0x81
        edx    0x1
        edi    0x0
        esi    0x0
        ebp    0x0
        esp    0xffdec718
        eip    0x80b7eff
        eflags 0x292
        cs     0x23
        fs     0x0
        gs     0x63
2022-11-03 17:01 linux-386-longtest go@667c53e1 runtime.TestCgoTracebackContextPreemption (log)
--- FAIL: TestCgoTracebackContextPreemption (60.06s)
    exec.go:152: context deadline exceeded: terminating command: /workdir/tmp/go-build1853768809/testprogcgo.exe TracebackContextPreemption
    crash_test.go:59: /workdir/tmp/go-build1853768809/testprogcgo.exe TracebackContextPreemption: exit status 2
    crash_cgo_test.go:284: expected "OK\n" got SIGQUIT: quit
        PC=0x80b7eff m=0 sigcode=0

        eax    0x0
        ebx    0x8296d10
        ecx    0x80
        edx    0x0
        edi    0x0
        esi    0x0
        ebp    0x0
        esp    0xffcd9648
        eip    0x80b7eff
        eflags 0x286
        cs     0x23
        fs     0x0
        gs     0x63
2022-11-03 17:40 linux-386-longtest go@7abc8a2e runtime.TestCgoTracebackContextPreemption (log)
--- FAIL: TestCgoTracebackContextPreemption (60.06s)
    exec.go:152: context deadline exceeded: terminating command: /workdir/tmp/go-build3148955378/testprogcgo.exe TracebackContextPreemption
    crash_test.go:59: /workdir/tmp/go-build3148955378/testprogcgo.exe TracebackContextPreemption: exit status 2
    crash_cgo_test.go:284: expected "OK\n" got SIGQUIT: quit
        PC=0x80b7eff m=0 sigcode=0

        eax    0x0
        ebx    0x8296d10
        ecx    0x80
        edx    0x0
        edi    0x0
        esi    0x0
        ebp    0x0
        esp    0xffa0a878
        eip    0x80b7eff
        eflags 0x286
        cs     0x23
        fs     0x0
        gs     0x63
2022-11-03 18:33 linux-386-longtest go@44cabb80 runtime.TestCgoTracebackContextPreemption (log)
--- FAIL: TestCgoTracebackContextPreemption (60.06s)
    exec.go:152: context deadline exceeded: terminating command: /workdir/tmp/go-build2865322188/testprogcgo.exe TracebackContextPreemption
    crash_test.go:59: /workdir/tmp/go-build2865322188/testprogcgo.exe TracebackContextPreemption: exit status 2
    crash_cgo_test.go:284: expected "OK\n" got SIGQUIT: quit
        PC=0x80b7efd m=0 sigcode=0

        eax    0xf0
        ebx    0x8296d10
        ecx    0x80
        edx    0x0
        edi    0x0
        esi    0x0
        ebp    0x0
        esp    0xfff0a0e8
        eip    0x80b7efd
        eflags 0x286
        cs     0x23
        fs     0x0
        gs     0x63
2022-11-03 19:34 linux-386-longtest go@3511c822 runtime.TestCgoTracebackContextPreemption (log)
--- FAIL: TestCgoTracebackContextPreemption (60.06s)
    exec.go:152: context deadline exceeded: terminating command: /workdir/tmp/go-build1678369203/testprogcgo.exe TracebackContextPreemption
    crash_test.go:59: /workdir/tmp/go-build1678369203/testprogcgo.exe TracebackContextPreemption: exit status 2
    crash_cgo_test.go:284: expected "OK\n" got SIGQUIT: quit
        PC=0x80b7eff m=0 sigcode=0

        eax    0x0
        ebx    0x8296d10
        ecx    0x80
        edx    0x0
        edi    0x0
        esi    0x0
        ebp    0x0
        esp    0xffac9090
        eip    0x80b7eff
        eflags 0x286
        cs     0x23
        fs     0x0
        gs     0x63
2022-11-03 19:59 linux-386-longtest go@d031e9e0 runtime.TestCgoTracebackContextPreemption (log)
--- FAIL: TestCgoTracebackContextPreemption (60.06s)
    exec.go:152: context deadline exceeded: terminating command: /workdir/tmp/go-build3827993796/testprogcgo.exe TracebackContextPreemption
    crash_test.go:59: /workdir/tmp/go-build3827993796/testprogcgo.exe TracebackContextPreemption: exit status 2
    crash_cgo_test.go:284: expected "OK\n" got SIGQUIT: quit
        PC=0x80b7d6f m=0 sigcode=0

        eax    0x1
        ebx    0x9846e50
        ecx    0x81
        edx    0x1
        edi    0x0
        esi    0x0
        ebp    0x0
        esp    0xff9e7c18
        eip    0x80b7d6f
        eflags 0x292
        cs     0x23
        fs     0x0
        gs     0x63

watchflakes

gopherbot avatar Nov 04 '22 17:11 gopherbot

No more failure for a while. Probably fixed by CL https://golang.org/cl/447495 , which is submitted at Nov. 4, so the time seems to match.

cherrymui avatar Jan 13 '23 23:01 cherrymui

Found new dashboard test flakes for:

    #!watchflakes
    post <- section == "maymorestack=mayMoreStackPreempt"
2023-01-25 16:38 linux-arm64-longtest go@1d3088ef runtime.TestCrashDumpsAllThreads (log)
panic: test timed out after 50m0s
running tests:
	TestCrashDumpsAllThreads (49m39s)

runtime.gopark(0x4362a0?, 0xffff64c08d00?, 0x2?, 0x1b?, 0x5?)
	/tmp/workdir/go/src/runtime/proc.go:381 +0x124 fp=0x4000bbfcb0 sp=0x4000bbfc90 pc=0x53e84
runtime.netpollblock(0xffff64c08ce8?, 0x72?, 0x0?)
	/tmp/workdir/go/src/runtime/netpoll.go:527 +0x190 fp=0x4000bbfcf0 sp=0x4000bbfcb0 pc=0x4b9b0
internal/poll.runtime_pollWait(0xffff64c08ce8, 0x72)
	/tmp/workdir/go/src/runtime/netpoll.go:306 +0xd0 fp=0x4000bbfd20 sp=0x4000bbfcf0 pc=0x8da60
internal/poll.(*pollDesc).wait(0x4a1470?, 0x6a05c0?, 0x1)
	/tmp/workdir/go/src/internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x4000bbfd50 sp=0x4000bbfd20 pc=0xc0108
internal/poll.(*pollDesc).waitRead(...)
	/tmp/workdir/go/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0x4000981f80, {0x4000bbfea7, 0x1, 0x1})
	/tmp/workdir/go/src/internal/poll/fd_unix.go:167 +0x200 fp=0x4000bbfdf0 sp=0x4000bbfd50 pc=0xc0df0
os.(*File).read(...)
	/tmp/workdir/go/src/os/file_posix.go:31
os.(*File).Read(0x40026ae180, {0x4000bbfea7?, 0x4000bbff30?, 0x1?})
	/tmp/workdir/go/src/os/file.go:118 +0x58 fp=0x4000bbfe50 sp=0x4000bbfdf0 pc=0xc55c8
runtime_test.TestCrashDumpsAllThreads(0x4000e0fba0)
	/tmp/workdir/go/src/runtime/crash_unix_test.go:117 +0x4ac fp=0x4000bbff60 sp=0x4000bbfe50 pc=0x2c235c
testing.tRunner(0x4000e0fba0, 0x437308)
2023-02-01 21:30 linux-amd64-longtest go@cda461bb runtime.TestCrashDumpsAllThreads (log)
panic: test timed out after 50m0s
running tests:
	TestCrashDumpsAllThreads (49m41s)

runtime.gopark(0x83f3d8?, 0x7f22a44dab20?, 0x2?, 0x1b?, 0x5?)
	/workdir/go/src/runtime/proc.go:381 +0x110 fp=0xc000a17cf8 sp=0xc000a17cd8 pc=0x440f30
runtime.netpollblock(0x7f22a44dab08?, 0x72?, 0x0?)
	/workdir/go/src/runtime/netpoll.go:527 +0x117 fp=0xc000a17d30 sp=0xc000a17cf8 pc=0x4391d7
internal/poll.runtime_pollWait(0x7f22a44dab08, 0x72)
	/workdir/go/src/runtime/netpoll.go:306 +0xa5 fp=0xc000a17d50 sp=0xc000a17d30 pc=0x4778a5
internal/poll.(*pollDesc).wait(0x8a7c60?, 0xaa75c8?, 0x1)
	/workdir/go/src/internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc000a17d78 sp=0xc000a17d50 pc=0x4a8c67
internal/poll.(*pollDesc).waitRead(...)
	/workdir/go/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc0007b96e0, {0xc000a17eaf, 0x1, 0x1})
	/workdir/go/src/internal/poll/fd_unix.go:167 +0x27a fp=0xc000a17e10 sp=0xc000a17d78 pc=0x4a997a
os.(*File).read(...)
	/workdir/go/src/os/file_posix.go:31
os.(*File).Read(0xc00238e190, {0xc000a17eaf?, 0xc000985f38?, 0x1?})
	/workdir/go/src/os/file.go:118 +0x4c fp=0xc000a17e68 sp=0xc000a17e10 pc=0x4ae5cc
runtime_test.TestCrashDumpsAllThreads(0xc000b17a00)
	/workdir/go/src/runtime/crash_unix_test.go:117 +0x579 fp=0xc000a17f70 sp=0xc000a17e68 pc=0x6d0ef9
testing.tRunner(0xc000b17a00, 0x8404c8)

watchflakes

gopherbot avatar Feb 02 '23 19:02 gopherbot

cc @aclements

These timeouts blocked release branch CLs as flakes; we should get them fixed or skipped.

prattmic avatar Feb 14 '23 17:02 prattmic

This reproduces pretty quickly on my linux/amd64 laptop with:

cd runtime
go test -c
GOFLAGS="-gcflags=runtime=-d=maymorestack=runtime.mayMoreStackPreempt -gcflags=runtime/testdata/...=" stress2 ./runtime.test -test.run=CrashDumpsAllThreads -test.timeout=30m

Of course, my usual first step would be to send it a SIGQUIT and see what's happening, but that's exactly what fails in this test. 😛 I haven't debugged yet. gdb may hold answers.

For reference, I initially tried running the testprog directly with the below script before I realized there are various pipes and signals involved in running this test. We might still need to reproduce those, but I'm hoping we can debug this through the test driver.

cd runtime/testdata/testprog
GOFLAGS="-gcflags=runtime=-d=maymorestack=runtime.mayMoreStackPreempt -gcflags=runtime/testdata/...=" go build
# Expects to write to FD 3 and then get a SIGQUIT
GOTRACEBACK=crash GOGC=off GODEBUG=asyncpreemptoff=1 ./testprog CrashDumpsAllThreads

aclements avatar Feb 15 '23 14:02 aclements

All of the Ms are just parked in mPark in normal places in the scheduler, except one thread that's in the infinite loop in crashDumpsAllThreadsLoop. /proc/PID/status and /proc/PID/task/*/status says there are no signals pending, blocked, or ignored by the process or any of its threads. It's a little suspicious that the blocked mask is 0, and in another Go process I have running it's fffffffc3bfa3a00.

aclements avatar Feb 15 '23 16:02 aclements

Ah hah, it's getting stuck before the SIGQUIT is involved. It's actually getting stuck in this loop, before the subprocess even indicates to the parent that it's ready for the SIGQUIT.

aclements avatar Feb 15 '23 17:02 aclements

Amusingly, this means you can send the subprocess a SIGQUIT. I've pasted the interesting part of the traceback below. All other goroutines and threads are uninteresting (this is running with GOTRACEBACK=crash so we see all threads). The test sets GOMAXPROCS to 5 and starts 4 infinite loops with async preemption disabled, and this looks to me like we're failing to steal goroutine 1 onto another P.

SIGQUIT: quit
PC=0x4b615d m=0 sigcode=0

goroutine 33 [running]:
main.crashDumpsAllThreadsLoop(0x0?, 0x0?)
	/home/austin/go.dev/src/runtime/testdata/testprog/crashdump.go:56 +0x1d fp=0xc0001127c0 sp=0xc0001127a8 pc=0x4b615d
main.CrashDumpsAllThreads.func1()
	/home/austin/go.dev/src/runtime/testdata/testprog/crashdump.go:29 +0x25 fp=0xc0001127e0 sp=0xc0001127c0 pc=0x4b6105
runtime.goexit()
	/home/austin/go.dev/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc0001127e8 sp=0xc0001127e0 pc=0x467441
created by main.CrashDumpsAllThreads
	/home/austin/go.dev/src/runtime/testdata/testprog/crashdump.go:29 +0x70

goroutine 1 [runnable]:
runtime.makechan(0x4ca180?, 0x0?)
	/home/austin/go.dev/src/runtime/chan.go:72 +0x1df fp=0xc0000bce58 sp=0xc0000bce50 pc=0x4050bf
main.CrashDumpsAllThreads()
	/home/austin/go.dev/src/runtime/testdata/testprog/crashdump.go:28 +0xe9 fp=0xc0000bcee8 sp=0xc0000bce58 pc=0x4b5f49
main.main()
	/home/austin/go.dev/src/runtime/testdata/testprog/main.go:34 +0x133 fp=0xc0000bcf80 sp=0xc0000bcee8 pc=0x4baeb3
runtime.main()
	/home/austin/go.dev/src/runtime/proc.go:250 +0x207 fp=0xc0000bcfe0 sp=0xc0000bcf80 pc=0x439487
runtime.goexit()
	/home/austin/go.dev/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc0000bcfe8 sp=0xc0000bcfe0 pc=0x467441

aclements avatar Feb 15 '23 17:02 aclements

Faster repro. Apply the following diff

--- a/src/runtime/testdata/testprog/crashdump.go
+++ b/src/runtime/testdata/testprog/crashdump.go
@@ -29,6 +29,8 @@ func CrashDumpsAllThreads() {
                <-c
        }
 
+       return
+
        // Tell our parent that all the goroutines are executing.
        if _, err := os.NewFile(3, "pipe").WriteString("x"); err != nil {
                fmt.Fprintf(os.Stderr, "write to pipe failed: %v\n", err)

Then

cd runtime/testdata/testprog
GOFLAGS="-gcflags=runtime=-d=maymorestack=runtime.mayMoreStackPreempt -gcflags=runtime/testdata/...=" go build
GOTRACEBACK=crash GOGC=off GODEBUG=asyncpreemptoff=1 stress2 ./testprog CrashDumpsAllThreads

aclements avatar Feb 15 '23 18:02 aclements

Here's a version that works without a patch:

cd runtime/testdata/testprog
GOFLAGS="-gcflags=runtime=-d=maymorestack=runtime.mayMoreStackPreempt -gcflags=runtime/testdata/...=" go build
GOTRACEBACK=crash GOGC=off GODEBUG=asyncpreemptoff=1 stress2 -timeouts-fail -max-fails=1 -timeout=10s -pass="write to pipe" -max-runs=50000 ./testprog CrashDumpsAllThreads

aclements avatar Feb 15 '23 18:02 aclements

This isn't new, either. I reproduced in Go 1.18, which was the first release to have mayMoreStack.

aclements avatar Feb 15 '23 19:02 aclements

Found new dashboard test flakes for:

    #!watchflakes
    post <- section == "maymorestack=mayMoreStackPreempt"
2023-02-23 06:07 linux-arm64-longtest go@71c02bed runtime.TestCrashDumpsAllThreads (log)
panic: test timed out after 50m0s
running tests:
	TestCrashDumpsAllThreads (49m40s)

runtime.gopark(0x819328?, 0xffff943a0f20?, 0x2?, 0x1b?, 0x5?)
	/tmp/workdir/go/src/runtime/proc.go:381 +0x120 fp=0x40002a0cb0 sp=0x40002a0c90 pc=0x442f50
runtime.netpollblock(0xffff943a0f08?, 0x72?, 0x0?)
	/tmp/workdir/go/src/runtime/netpoll.go:527 +0x190 fp=0x40002a0cf0 sp=0x40002a0cb0 pc=0x43a960
internal/poll.runtime_pollWait(0xffff943a0f08, 0x72)
	/tmp/workdir/go/src/runtime/netpoll.go:306 +0xd0 fp=0x40002a0d20 sp=0x40002a0cf0 pc=0x47c900
internal/poll.(*pollDesc).wait(0x8830a0?, 0xa997b8?, 0x1)
	/tmp/workdir/go/src/internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x40002a0d50 sp=0x40002a0d20 pc=0x4af8a8
internal/poll.(*pollDesc).waitRead(...)
	/tmp/workdir/go/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0x40007ab320, {0x40002a0ea7, 0x1, 0x1})
	/tmp/workdir/go/src/internal/poll/fd_unix.go:167 +0x200 fp=0x40002a0df0 sp=0x40002a0d50 pc=0x4b0590
os.(*File).read(...)
	/tmp/workdir/go/src/os/file_posix.go:31
os.(*File).Read(0x400141e238, {0x40002a0ea7?, 0x40002a0f30?, 0x1?})
	/tmp/workdir/go/src/os/file.go:118 +0x58 fp=0x40002a0e50 sp=0x40002a0df0 pc=0x4b4f28
runtime_test.TestCrashDumpsAllThreads(0x400016e340)
	/tmp/workdir/go/src/runtime/crash_unix_test.go:117 +0x4ac fp=0x40002a0f60 sp=0x40002a0e50 pc=0x6afbbc
testing.tRunner(0x400016e340, 0x81a410)

watchflakes

gopherbot avatar Feb 24 '23 16:02 gopherbot

Found new dashboard test flakes for:

    #!watchflakes
    post <- section == "maymorestack=mayMoreStackPreempt"
2023-03-13 17:03 linux-arm64-longtest go@e671fe0c runtime.TestCrashDumpsAllThreads (log)
panic: test timed out after 50m0s
running tests:
	TestCrashDumpsAllThreads (49m40s)

runtime.gopark(0x438198?, 0xffff5c2e4e30?, 0x2?, 0x1b?, 0x5?)
	/tmp/workdir/go/src/runtime/proc.go:381 +0x110 fp=0x4000d36cd0 sp=0x4000d36cb0 pc=0x52a00
runtime.netpollblock(0xffff5c2e4e18?, 0x72?, 0x0?)
	/tmp/workdir/go/src/runtime/netpoll.go:527 +0x190 fp=0x4000d36d10 sp=0x4000d36cd0 pc=0x4a770
internal/poll.runtime_pollWait(0xffff5c2e4e18, 0x72)
	/tmp/workdir/go/src/runtime/netpoll.go:306 +0xd0 fp=0x4000d36d40 sp=0x4000d36d10 pc=0x8cc70
internal/poll.(*pollDesc).wait(0x4a2c90?, 0x6a05b8?, 0x1)
	/tmp/workdir/go/src/internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x4000d36d70 sp=0x4000d36d40 pc=0xbfff8
internal/poll.(*pollDesc).waitRead(...)
	/tmp/workdir/go/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0x40006d2360, {0x4000d36ea7, 0x1, 0x1})
	/tmp/workdir/go/src/internal/poll/fd_unix.go:167 +0x200 fp=0x4000d36e10 sp=0x4000d36d70 pc=0xc0ce0
os.(*File).read(...)
	/tmp/workdir/go/src/os/file_posix.go:31
os.(*File).Read(0x4000184278, {0x4000d36ea7?, 0x4000d36f30?, 0x1?})
	/tmp/workdir/go/src/os/file.go:118 +0x70 fp=0x4000d36e50 sp=0x4000d36e10 pc=0xc5680
runtime_test.TestCrashDumpsAllThreads(0x4000b1b380)
	/tmp/workdir/go/src/runtime/crash_unix_test.go:117 +0x478 fp=0x4000d36f60 sp=0x4000d36e50 pc=0x2bed08
testing.tRunner(0x4000b1b380, 0x4392a8)

watchflakes

gopherbot avatar Mar 13 '23 18:03 gopherbot

Found new dashboard test flakes for:

    #!watchflakes
    post <- section == "maymorestack=mayMoreStackPreempt"
2023-04-03 18:58 linux-386-longtest go@4ac638f4 runtime.TestCrashDumpsAllThreads (log)
panic: test timed out after 50m0s
running tests:
	TestCrashDumpsAllThreads (49m37s)

runtime.gopark(0x84684fc, 0xee5b8eac, 0x2, 0x1b, 0x5)
	/workdir/go/src/runtime/proc.go:392 +0x111 fp=0xa3e5e48 sp=0xa3e5e34 pc=0x8089861
runtime.netpollblock(0xee5b8ea0, 0x72, 0x0)
	/workdir/go/src/runtime/netpoll.go:527 +0xef fp=0xa3e5e60 sp=0xa3e5e48 pc=0x808193f
internal/poll.runtime_pollWait(0xee5b8ea0, 0x72)
	/workdir/go/src/runtime/netpoll.go:306 +0x57 fp=0xa3e5e74 sp=0xa3e5e60 pc=0x80c0207
internal/poll.(*pollDesc).wait(0xa12ab14, 0x72, 0x1)
	/workdir/go/src/internal/poll/fd_poll_runtime.go:84 +0x37 fp=0xa3e5e88 sp=0xa3e5e74 pc=0x80f0367
internal/poll.(*pollDesc).waitRead(...)
	/workdir/go/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xa12ab00, {0xa3e5f27, 0x1, 0x1})
	/workdir/go/src/internal/poll/fd_unix.go:167 +0x216 fp=0xa3e5ed8 sp=0xa3e5e88 pc=0x80f0e76
os.(*File).read(...)
	/workdir/go/src/os/file_posix.go:31
os.(*File).Read(0x9d822b8, {0xa3e5f27, 0x1, 0x1})
	/workdir/go/src/os/file.go:118 +0x70 fp=0xa3e5efc sp=0xa3e5ed8 pc=0x80f56b0
runtime_test.TestCrashDumpsAllThreads(0x9f84780)
	/workdir/go/src/runtime/crash_unix_test.go:117 +0x609 fp=0xa3e5f98 sp=0xa3e5efc pc=0x8318b29
testing.tRunner(0x9f84780, 0x8468db8)

watchflakes

gopherbot avatar Apr 03 '23 21:04 gopherbot

@golang/runtime, should TestCrashDumpsAllThreads be skipped in the mayMoreStackPreempt configuration, or is there some other action in progress for this failure?

bcmills avatar Apr 04 '23 14:04 bcmills

https://storage.googleapis.com/go-build-log/7a497521/linux-amd64-longtest_4c66413d.log (a SlowBot run)

bcmills avatar Apr 12 '23 15:04 bcmills

Found new dashboard test flakes for:

    #!watchflakes
    post <- section == "maymorestack=mayMoreStackPreempt"
2023-05-02 12:54 linux-386-longtest go@a9a01ea2 runtime.TestCrashDumpsAllThreads (log)
panic: test timed out after 50m0s
running tests:
	TestCrashDumpsAllThreads (49m36s)

runtime.gopark(0x8477b7c, 0xed2c6d20, 0x2, 0x1b, 0x5)
	/workdir/go/src/runtime/proc.go:398 +0x111 fp=0x9e51e48 sp=0x9e51e34 pc=0x808a681
runtime.netpollblock(0xed2c6d10, 0x72, 0x0)
	/workdir/go/src/runtime/netpoll.go:556 +0xef fp=0x9e51e60 sp=0x9e51e48 pc=0x80825cf
internal/poll.runtime_pollWait(0xed2c6d10, 0x72)
	/workdir/go/src/runtime/netpoll.go:335 +0x57 fp=0x9e51e74 sp=0x9e51e60 pc=0x80c1497
internal/poll.(*pollDesc).wait(0x9ed6498, 0x72, 0x1)
	/workdir/go/src/internal/poll/fd_poll_runtime.go:84 +0x37 fp=0x9e51e88 sp=0x9e51e74 pc=0x80f1457
internal/poll.(*pollDesc).waitRead(...)
	/workdir/go/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0x9ed6480, {0x9e51f27, 0x1, 0x1})
	/workdir/go/src/internal/poll/fd_unix.go:167 +0x216 fp=0x9e51ed8 sp=0x9e51e88 pc=0x80f1f66
os.(*File).read(...)
	/workdir/go/src/os/file_posix.go:31
os.(*File).Read(0x9c2a690, {0x9e51f27, 0x1, 0x1})
	/workdir/go/src/os/file.go:118 +0x70 fp=0x9e51efc sp=0x9e51ed8 pc=0x80f6600
runtime_test.TestCrashDumpsAllThreads(0x9e88c30)
	/workdir/go/src/runtime/crash_unix_test.go:117 +0x609 fp=0x9e51f98 sp=0x9e51efc pc=0x8326029
testing.tRunner(0x9e88c30, 0x8477338)

watchflakes

gopherbot avatar May 02 '23 14:05 gopherbot

The cause of this flake (using Austin's reproducer above) is maymorestack instrumentation of any one (or more) of:

runtime.closechan
runtime.newobject
runtime.mallocgc
runtime.lock2
runtime.makechan
runtime.newproc
runtime.deductAssistCredit

beyond that I don't know. I found these using compiler hash debugging and gossahash (both of these are being renovated/enhanced/replaced in coming weeks).

One theory is that this is the full set of runtime functions called hard by that test, so it is just a general problem.

dr2chase avatar May 03 '23 20:05 dr2chase

Found new dashboard test flakes for:

    #!watchflakes
    post <- section == "maymorestack=mayMoreStackPreempt"
2023-05-11 16:11 linux-386-longtest go@2e7d864f runtime.TestCrashDumpsAllThreads (log)
panic: test timed out after 50m0s
running tests:
	TestCrashDumpsAllThreads (49m38s)

runtime.gopark(0x84855e0, 0xee670bf0, 0x2, 0x1b, 0x5)
	/workdir/go/src/runtime/proc.go:387 +0x111 fp=0xa858e48 sp=0xa858e34 pc=0x808b591
runtime.netpollblock(0xee670be0, 0x72, 0x0)
	/workdir/go/src/runtime/netpoll.go:556 +0xef fp=0xa858e60 sp=0xa858e48 pc=0x808350f
internal/poll.runtime_pollWait(0xee670be0, 0x72)
	/workdir/go/src/runtime/netpoll.go:335 +0x57 fp=0xa858e74 sp=0xa858e60 pc=0x80c1f27
internal/poll.(*pollDesc).wait(0xae25ed8, 0x72, 0x1)
	/workdir/go/src/internal/poll/fd_poll_runtime.go:84 +0x37 fp=0xa858e88 sp=0xa858e74 pc=0x80f2497
internal/poll.(*pollDesc).waitRead(...)
	/workdir/go/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xae25ec0, {0xa858f27, 0x1, 0x1})
	/workdir/go/src/internal/poll/fd_unix.go:167 +0x216 fp=0xa858ed8 sp=0xa858e88 pc=0x80f2fa6
os.(*File).read(...)
	/workdir/go/src/os/file_posix.go:31
os.(*File).Read(0xbe382f0, {0xa858f27, 0x1, 0x1})
	/workdir/go/src/os/file.go:118 +0x70 fp=0xa858efc sp=0xa858ed8 pc=0x80f76a0
runtime_test.TestCrashDumpsAllThreads(0xad304b0)
	/workdir/go/src/runtime/crash_unix_test.go:117 +0x609 fp=0xa858f98 sp=0xa858efc pc=0x832ad89
testing.tRunner(0xad304b0, 0x8484d98)

watchflakes

gopherbot avatar May 11 '23 17:05 gopherbot

Found new dashboard test flakes for:

    #!watchflakes
    post <- section == "maymorestack=mayMoreStackPreempt"
2023-05-23 17:30 linux-386-longtest go@f96e1933 runtime.TestCrashDumpsAllThreads (log)
panic: test timed out after 50m0s
running tests:
	TestCrashDumpsAllThreads (49m43s)

runtime.gopark(0x8496ac8, 0xee65ac48, 0x2, 0x1b, 0x5)
	/workdir/go/src/runtime/proc.go:398 +0x111 fp=0xad75e48 sp=0xad75e34 pc=0x808c671
runtime.netpollblock(0xee65ac38, 0x72, 0x0)
	/workdir/go/src/runtime/netpoll.go:564 +0xef fp=0xad75e60 sp=0xad75e48 pc=0x808384f
internal/poll.runtime_pollWait(0xee65ac38, 0x72)
	/workdir/go/src/runtime/netpoll.go:343 +0x57 fp=0xad75e74 sp=0xad75e60 pc=0x80c32e7
internal/poll.(*pollDesc).wait(0xa700298, 0x72, 0x1)
	/workdir/go/src/internal/poll/fd_poll_runtime.go:84 +0x37 fp=0xad75e88 sp=0xad75e74 pc=0x80f3e97
internal/poll.(*pollDesc).waitRead(...)
	/workdir/go/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xa700280, {0xad75f27, 0x1, 0x1})
	/workdir/go/src/internal/poll/fd_unix.go:167 +0x216 fp=0xad75ed8 sp=0xad75e88 pc=0x80f49a6
os.(*File).read(...)
	/workdir/go/src/os/file_posix.go:29
os.(*File).Read(0xa42a048, {0xad75f27, 0x1, 0x1})
	/workdir/go/src/os/file.go:118 +0x70 fp=0xad75efc sp=0xad75ed8 pc=0x80f90b0
runtime_test.TestCrashDumpsAllThreads(0xa93a000)
	/workdir/go/src/runtime/crash_unix_test.go:117 +0x609 fp=0xad75f98 sp=0xad75efc pc=0x8337169
testing.tRunner(0xa93a000, 0x849620c)

watchflakes

gopherbot avatar May 23 '23 18:05 gopherbot

The cause of this flake (using Austin's reproducer above) is maymorestack instrumentation of any one (or more) of

Thanks for running this. I think this rules out the "we forgot a nosplit somewhere" hypothesis. Unfortunately, it leaves us with "there's just a bug in the scheduler". But since it seems to only be affecting TestCrashDumpsAllThreads, which is itself testing a debug mode, I don't think this is high priority. I'll send a skip to quiet this down.

aclements avatar Jun 06 '23 17:06 aclements

Change https://go.dev/cl/501229 mentions this issue: runtime: skip TestCrashDumpsAllThreads with mayMoreStackPreempt

gopherbot avatar Jun 06 '23 17:06 gopherbot

I believe this is a bug in https://go.dev/cl/7314062. (Yes, a 10 year old bug).

The "delicate dance" when a spinning M transitions to non-spinning is supposed to check all sources of work, but it doesn't check the global run queue: https://cs.opensource.google/go/go/+/master:src/runtime/proc.go;l=3084;drc=5b6e6d2b3d4d6877c86471209e249f1b6e36d0ca

gosched / mayMoreStack put the preempted goroutine on the global run queue, increasing the chance of failure (in addition to the extra stress it puts on the scheduler in general). Still, I am surprised this bug could go unnoticed for so long.

With that issue fixed, I can no longer reproduce this hang.

@gopherbot Please open backports to 1.20 and 1.19. This is a hang in the scheduler with no work around.

prattmic avatar Jun 06 '23 22:06 prattmic

Backport issue(s) opened: #60643 (for 1.19), #60644 (for 1.20).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://go.dev/wiki/MinorReleases.

gopherbot avatar Jun 06 '23 22:06 gopherbot

There is actually a second, similar issue: gosched (runtime.Gosched as well as morestack and async preemption) don't call wakep. They are submitting runnable work to the scheduler, so they must follow the same protocols for synchronizing with spinning Ms as any other source of work.

prattmic avatar Jun 08 '23 19:06 prattmic

Change https://go.dev/cl/501976 mentions this issue: runtime: call wakep in gosched

gopherbot avatar Jun 08 '23 19:06 gopherbot