actions
actions copied to clipboard
setup-r / setup-pandoc stuck on macOS (hitting 6h timeout)
Describe the bug Over the last weeks, I have often seen setup-r (and setup-pandoc, just today) getting stuck and eventually hitting the default 6h timeout. This has happened on macOS-latest, in different repos and for various R versions. From the log it is not very clear where and why the execution got stuck (especially for setup-r), and the issue is transient in that a re-run usually goes through.
To Reproduce
Examples of failures, with relevant logs including timing for convenience and for the records:
- rmdgallery, setup-r, macOS-latest (devel): https://github.com/riccardoporreca/rmdgallery/runs/3981690523?check_suite_focus=true
Sat, 23 Oct 2021 00:51:51 GMT ==> Pouring qpdf--10.3.2.big_sur.bottle.tar.gz
Sat, 23 Oct 2021 00:51:57 GMT 🍺 /usr/local/Cellar/qpdf/10.3.2: 73 files, 6.3MB
Sat, 23 Oct 2021 00:51:57 GMT ==> Pouring checkbashisms--2.21.4.all.bottle.tar.gz
Sat, 23 Oct 2021 00:52:03 GMT 🍺 /usr/local/Cellar/checkbashisms/2.21.4: 6 files, 71.2KB
Sat, 23 Oct 2021 06:51:13 GMT Error: The operation was canceled.
- rmdgallery, setup-r, macOS-latest (oldrel-1): https://github.com/riccardoporreca/rmdgallery/runs/3997956955?check_suite_focus=true
Mon, 25 Oct 2021 14:07:22 GMT ==> Pouring qpdf--10.3.2.big_sur.bottle.tar.gz
Mon, 25 Oct 2021 14:07:27 GMT 🍺 /usr/local/Cellar/qpdf/10.3.2: 73 files, 6.3MB
Mon, 25 Oct 2021 14:07:27 GMT ==> Pouring checkbashisms--2.21.4.all.bottle.tar.gz
Mon, 25 Oct 2021 14:07:32 GMT 🍺 /usr/local/Cellar/checkbashisms/2.21.4: 6 files, 71.2KB
Mon, 25 Oct 2021 20:06:43 GMT Error: The operation was canceled.
- rTRNG, setup-r, macOS-latest (oldrel): https://github.com/miraisolutions/rTRNG/runs/4123058504?check_suite_focus=true
Sat, 06 Nov 2021 00:14:44 GMT ==> Pouring qpdf--10.3.2.big_sur.bottle.tar.gz
Sat, 06 Nov 2021 00:14:48 GMT 🍺 /usr/local/Cellar/qpdf/10.3.2: 73 files, 6.3MB
Sat, 06 Nov 2021 00:14:48 GMT ==> Pouring checkbashisms--2.21.4.all.bottle.tar.gz
Sat, 06 Nov 2021 00:14:51 GMT 🍺 /usr/local/Cellar/checkbashisms/2.21.4: 6 files, 71.2KB
Sat, 06 Nov 2021 06:14:12 GMT Error: The operation was canceled.
- rTRNG, setup-pandoc, macOS-latest (release): https://github.com/miraisolutions/rTRNG/runs/4195770215?check_suite_focus=true
Sat, 13 Nov 2021 00:16:48 GMT sudo installer -pkg /Users/runner/work/_temp/pandoc-2.7.3-macOS.pkg -target /
Sat, 13 Nov 2021 06:15:27 GMT Error: The operation was canceled.
Expected behavior It would be great if setup-r and setup-pandoc would capture explicitly the issue causing the action to get stuck, or perhaps just handle a meaningful timeout internally for individual operations, perhaps hinting at re-running the workflow in a timeout error message. Using the action in a workflow, one can of course set a timeout for the action step, see e.g. https://github.com/riccardoporreca/rmdgallery/commit/65b87fe13cbbb46f5226c427a811343a5430c4f3, but I believe for the users this should be best handled in the action itself.
Additional context Given how common the issue is (I see it every week in cron runs), it is not great to keep a runner busy for 6hours if we can prevent it.
This is likely a network failure that never times out, probably from homebrew, I don't think there is anything we can do, just cancel these jobs if you see one taking too long.
Thanks @jimhester, closing this.
I'm experiencing this quite often (on website daily builds).
Shame theres no way to tag something to retry if there is a timeout.
What about adding a manual timeout for that particular step in the GH action? https://docs.github.com/en/actions/learn-github-actions/workflow-syntax-for-github-actions#jobsjob_idstepstimeout-minutes
These do not look like network errors to me. They seem to hang after calling installer
, either for pandoc or gfortran. This is probably because there is a dialog window for the user to OK.
I added some debugging to the installer
calls, in the master
branch. Can you switch some of your projects to that? Hopefully we'll learn what is happening.
@gaborcsardi, I have triggered a number or builds using setup-r/-pandoc from master
. One on the builds hit a timeout (which I have set to 10 min), see https://github.com/riccardoporreca/rmdgallery/runs/4434994244?check_suite_focus=true#step:4:90
Mon, 06 Dec 2021 19:17:15 GMT ==> Pouring qpdf--10.4.0.big_sur.bottle.tar.gz
Mon, 06 Dec 2021 19:17:19 GMT 🍺 /usr/local/Cellar/qpdf/10.4.0: 73 files, 6.4MB
Mon, 06 Dec 2021 19:17:19 GMT ==> Pouring checkbashisms--2.21.5.all.bottle.tar.gz
Mon, 06 Dec 2021 19:17:21 GMT 🍺 /usr/local/Cellar/checkbashisms/2.21.5: 6 files, 71.8KB
Mon, 06 Dec 2021 19:26:56 GMT Error: The action has timed out.
Not sure how much more informative this is, but the installer extra logs might help. For reference, here is the log to an equivalent successful run: https://github.com/riccardoporreca/rmdgallery/runs/4434997287?check_suite_focus=true#step:4:89, I just noticed the failing job above does not log anything like "installer: Package name is R 4.2.0 for macOS". Hope this helps
I hit another failure in https://github.com/riccardoporreca/rmdgallery/runs/4434994121?check_suite_focus=true#step:4:89, which should be completely unrelated
installer: Package name is gfortran
installer: Installing at base path /
installer: The install was successful.
/usr/bin/sudo hdiutil detach /Volumes/gfortran-8.2-Mojave
hdiutil: couldn't unmount "disk2" - Resource busy
Error: Failed to get R 4.0.5: Failed to get R 4.0.5: Failed to umount /Volumes/gfortran-8.2-Mojave: Error: The process '/usr/bin/sudo' failed with exit code 16
Thanks, that is useful! It seems that the R installer can be stuck as well, so I should add the logging to that as well. My theory is that the issue is the concurrent installs interfering. (Pretty much a guess at this point.) More soon.
Btw. the disk eject failure could also be a concurrency issue, but in general we should ignore disk ejection failures, the attached disks do not bother anyone.
For the record, I added logging for the R installer as well in the master
branch, if you encounter timeouts, please link them to this issue. Thanks!
@gaborcsardi, I have done another battery of runs (same setup as above), and here the logs of three timeout failures
- https://github.com/riccardoporreca/rmdgallery/runs/4436541095?check_suite_focus=true#step:4:99
- https://github.com/riccardoporreca/rmdgallery/runs/4436600329?check_suite_focus=true#step:4:99
- https://github.com/riccardoporreca/rmdgallery/runs/4436612718?check_suite_focus=true#step:4:99
Hope having this few more examples, including the additional logging, help.
@riccardoporreca Thanks, very helpful indeed! One of the installer processes clearly hangs. I still cannot reproduce the this locally, but nevertheless I made the macOS installations sequential now, hopefully this will help.
My impression is that with v2
this is now better, but I still saw some hangs on gfortran:
https://github.com/r-lib/revdepcheck/runs/4477261536?check_suite_focus=true#step:4:45
I'll remove the detach from the gfortran volume, in case that causes it.
I reran it a bunch of times, and it seems to work better, so fingers crossed: https://github.com/r-lib/revdepcheck/runs/4483425337?check_suite_focus=true
I would encourage everyone to switch to @v2
. I still haven't seen any freezes with the new tag. We can keep this issue still open a bit longer. If there won't be any new issues in about a month, I'll close it.
Hi.
I got a timeout yesterday it seems with v2 https://github.com/rladies/awesome-rladies-blogs/runs/4540433972?check_suite_focus=true
First one I've seen with this version.
Can you try using v2
for everything? I am not sure what GHA does if you checkout both: https://github.com/rladies/awesome-rladies-blogs/runs/4540433972?check_suite_focus=true#step:1:32
setup-pandoc@v2
is pretty much unchanged from v1
I think.
It is a very long shot...
Sure, but does GHA really care if there is a mix of action versions as long as they dont collide? I reran the jobs asis and it worked fine this time. As it usually does on rerun.
For a second I was thinking if GHA checks them out into the same directory. But apparently not, because the setup-pandoc
action does not have debugging, so it is v1
, whereas setup-r
does, so it is v2
.
I would still use v2
, maybe the debugging helps. It seems to me that installer
returns the prompt while still running some setup processes in the background. This could be different if we specify -dumplog
.
Anyway, we can add timeouts and retries, if we'll still see errors with a fully v2
workflow. Which is likely.
setup-pandoc@v2
timed out for me yesterday, here are the relevant logs.
sudo installer -allowUntrusted -dumplog -pkg /Users/runner/work/_temp/pandoc-2.14.2-macOS.pkg -target /
Jan 3 22:09:45 installer[3368] <Debug>: Product archive /Users/runner/work/_temp/pandoc-2.14.2-macOS.pkg trustLevel=350
Jan 3 22:09:45 installer[3368] <Debug>: External component packages (1) trustLevel=350
Jan 3 22:09:45 installer[3368] <Debug>: -[IFDInstallController(Private) _buildInstallPlanReturningError:]: location = file://localhost
Jan 3 22:09:45 installer[3368] <Debug>: -[IFDInstallController(Private) _buildInstallPlanReturningError:]: file://localhost/Users/runner/work/_temp/pandoc-2.14.2-macOS.pkg#pandoc.pkg
Jan 3 22:09:45 installer[3368] <Debug>: Set authorization level to root for session
Jan 3 22:09:45 installer[3368] <Info>: Administrator authorization granted.
Error: The operation was canceled.
(I'm not sure if it's relevant, but I missed updating setup-r
in that workflow and it was at v1
for that run.)
I have just seen it twice with all actions on v2
: https://github.com/r-lib/actions/runs/4714679983?check_suite_focus=true
In both cases the gfortran installation froze:
2022-01-05T12:44:08.1412420Z [command]/usr/bin/sudo installer -allowUntrusted -dumplog -package /Volumes/gfortran-8.2-Mojave/gfortran-8.2-Mojave/gfortran.pkg -target /
2022-01-05T12:44:08.2767460Z Jan 5 12:44:08 installer[1197] <Debug>: fileURLForURL = x-disc://gfortran-8.2-Mojave/gfortran-8.2-Mojave/gfortran.pkg
2022-01-05T12:44:08.2871220Z Jan 5 12:44:08 installer[1197] <Debug>: fileURLForURL = file://localhost/Volumes/gfortran-8.2-Mojave/gfortran-8.2-Mojave/gfortran.pkg
2022-01-05T12:44:08.9498810Z Jan 5 12:44:08 installer[1197] <Debug>: Referenced component packages (1) trustLevel=350
2022-01-05T12:44:08.9602120Z Jan 5 12:44:08 installer[1197] <Debug>: -[IFPKGDerivedDocument sortedPackageLocations]: result = (
2022-01-05T12:44:08.9705050Z "file://localhost"
2022-01-05T12:44:08.9812120Z )
2022-01-05T12:44:08.9918330Z Jan 5 12:44:08 installer[1197] <Debug>: -[IFDInstallController(Private) _buildInstallPlanReturningError:]: location = file://localhost
2022-01-05T12:44:09.0025590Z Jan 5 12:44:08 installer[1197] <Debug>: -[IFDInstallController(Private) _buildInstallPlanReturningError:]: file://localhost/Volumes/gfortran-8.2-Mojave/gfortran-8.2-Mojave/gfortran.pkg
2022-01-05T12:44:09.0128490Z Jan 5 12:44:08 installer[1197] <Debug>: Set authorization level to root for session
2022-01-05T12:44:09.0129580Z Jan 5 12:44:08 installer[1197] <Info>: Administrator authorization granted.
2022-01-05T13:13:53.9292480Z ##[error]The operation was canceled.
Just a note that this is still an issue. Hopefully switching to rig will solve it.
I haven't seen this for a while, maybe because of the macos updates on GHA. Anyone else has seen this recently?
I am going to close this now. Hopefully it is not coming back.
This issue has been automatically locked. If you believe you have found a related problem, please file a new issue and include a link to this issue