actionlint
panic: WaitGroup is reused before previous Wait has returned
We hit a panic in CI. It appears to be intermittent, since re-running the workflow did not reproduce the error:
# shellcheck disable=SC2016
actionlint \
-format '{{range $err := .}}::error file={{$err.Filepath}},line={{$err.Line}},col={{$err.Column}}::{{$err.Message}}%0A```%0A{{replace $err.Snippet "\\n" "%0A"}}%0A```\n{{end}}' \
.github/workflows/*
2022-07-27T07:32:11.2813048Z panic: sync: WaitGroup is reused before previous Wait has returned
2022-07-27T07:32:11.2814721Z
2022-07-27T07:32:11.2815422Z goroutine 1 [running]:
2022-07-27T07:32:11.2815893Z sync.(*WaitGroup).Wait(0xc00009cd80?)
2022-07-27T07:32:11.2816229Z /opt/hostedtoolcache/go/1.18.3/x64/src/sync/waitgroup.go:138 +0x85
2022-07-27T07:32:11.2816611Z github.com/rhysd/actionlint.(*concurrentProcess).wait(...)
2022-07-27T07:32:11.2817037Z /home/runner/work/actionlint/actionlint/process.go:79
2022-07-27T07:32:11.2820123Z github.com/rhysd/actionlint.(*Linter).LintFiles(0xc0000d4100, {0xc00009e030, 0xf, 0xc0000a8270?}, 0x0)
2022-07-27T07:32:11.2820576Z /home/runner/work/actionlint/actionlint/linter.go:329 +0x685
2022-07-27T07:32:11.2820989Z github.com/rhysd/actionlint.(*Command).runLinter(0xc000105c20, {0xc00009e030?, 0xf, 0xf}, 0xc000106e10, 0x0)
2022-07-27T07:32:11.2821381Z /home/runner/work/actionlint/actionlint/command.go:107 +0x17d
2022-07-27T07:32:11.2822032Z github.com/rhysd/actionlint.(*Command).Main(0xc000105c20, {0xc00009e000, 0x12, 0x12})
2022-07-27T07:32:11.2822411Z /home/runner/work/actionlint/actionlint/command.go:181 +0x6d7
2022-07-27T07:32:11.2822695Z main.main()
2022-07-27T07:32:11.2822990Z /home/runner/work/actionlint/actionlint/cmd/actionlint/main.go:15 +0xd6
If the CI is public, could you share the URL of the job run? I would also like to know how often this happens.
@rhysd I've only seen the failure once and it succeeded on retry (i.e. without making any changes to the branch) https://github.com/linkerd/linkerd2/runs/7535455694?check_suite_focus=true#step:4:23
Thanks. There seems to be a race somewhere. I'll take a look.
In our private CI, which has over 80 workflow files, this error causes about 3% to 5% of runs to fail. We run CI about a thousand times a week, and it fails with this error about forty or fifty times.
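For context on the panic message in the trace: it is raised by Go's `sync.WaitGroup` itself, whose documentation requires that `Add` calls made while the counter is zero happen before `Wait`, and that a reused WaitGroup only receives new `Add` calls after all previous `Wait` calls have returned. The root cause inside actionlint is not established in this thread, so the following is only a minimal sketch of a usage pattern that stays within that rule. The `runTasks` helper and its parameter are hypothetical and are not part of actionlint.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// runTasks is a hypothetical helper (not actionlint code). It starts n
// goroutines and returns how many of them completed.
func runTasks(n int) int64 {
	var wg sync.WaitGroup
	var done int64

	// Register all tasks up front, on the goroutine that will call Wait.
	// No Add call can then overlap with Wait, which is the condition the
	// sync package requires to avoid the "reused before previous Wait has
	// returned" panic.
	wg.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer wg.Done()
			atomic.AddInt64(&done, 1) // stand-in for real work
		}()
	}

	// Wait is called once, after every Add has already happened.
	wg.Wait()
	return atomic.LoadInt64(&done)
}

func main() {
	fmt.Println(runTasks(80))
}
```

If tasks need to schedule further tasks while a waiter may already be blocked, calling `Add` from inside those tasks is the kind of pattern that can trip this check intermittently. Running the test suite or binary with Go's `-race` flag may help narrow down where that happens.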