actions icon indicating copy to clipboard operation
actions copied to clipboard

setup-r / setup-pandoc stuck on macOS (hitting 6h timeout)

Open riccardoporreca opened this issue 2 years ago • 21 comments

Describe the bug Over the last weeks, I have often seen setup-r (and setup-pandoc, just today) getting stuck and eventually hitting the default 6h timeout. This has happened on macOS-latest, in different repos and for various R versions. From the log it is not very clear where and why the execution got stuck (especially for setup-r), and the issue is transient in that a re-run usually goes through.

To Reproduce

Examples of failures, with relevant logs including timing for convenience and for the records:

  • rmdgallery, setup-r, macOS-latest (devel): https://github.com/riccardoporreca/rmdgallery/runs/3981690523?check_suite_focus=true
Sat, 23 Oct 2021 00:51:51 GMT ==> Pouring qpdf--10.3.2.big_sur.bottle.tar.gz
Sat, 23 Oct 2021 00:51:57 GMT 🍺  /usr/local/Cellar/qpdf/10.3.2: 73 files, 6.3MB
Sat, 23 Oct 2021 00:51:57 GMT ==> Pouring checkbashisms--2.21.4.all.bottle.tar.gz
Sat, 23 Oct 2021 00:52:03 GMT 🍺  /usr/local/Cellar/checkbashisms/2.21.4: 6 files, 71.2KB
Sat, 23 Oct 2021 06:51:13 GMT Error: The operation was canceled.
  • rmdgallery, setup-r, macOS-latest (oldrel-1): https://github.com/riccardoporreca/rmdgallery/runs/3997956955?check_suite_focus=true
Mon, 25 Oct 2021 14:07:22 GMT ==> Pouring qpdf--10.3.2.big_sur.bottle.tar.gz
Mon, 25 Oct 2021 14:07:27 GMT 🍺  /usr/local/Cellar/qpdf/10.3.2: 73 files, 6.3MB
Mon, 25 Oct 2021 14:07:27 GMT ==> Pouring checkbashisms--2.21.4.all.bottle.tar.gz
Mon, 25 Oct 2021 14:07:32 GMT 🍺  /usr/local/Cellar/checkbashisms/2.21.4: 6 files, 71.2KB
Mon, 25 Oct 2021 20:06:43 GMT Error: The operation was canceled.
  • rTRNG, setup-r, macOS-latest (oldrel): https://github.com/miraisolutions/rTRNG/runs/4123058504?check_suite_focus=true
Sat, 06 Nov 2021 00:14:44 GMT ==> Pouring qpdf--10.3.2.big_sur.bottle.tar.gz
Sat, 06 Nov 2021 00:14:48 GMT 🍺  /usr/local/Cellar/qpdf/10.3.2: 73 files, 6.3MB
Sat, 06 Nov 2021 00:14:48 GMT ==> Pouring checkbashisms--2.21.4.all.bottle.tar.gz
Sat, 06 Nov 2021 00:14:51 GMT 🍺  /usr/local/Cellar/checkbashisms/2.21.4: 6 files, 71.2KB
Sat, 06 Nov 2021 06:14:12 GMT Error: The operation was canceled.
  • rTRNG, setup-pandoc, macOS-latest (release): https://github.com/miraisolutions/rTRNG/runs/4195770215?check_suite_focus=true
Sat, 13 Nov 2021 00:16:48 GMT sudo installer -pkg /Users/runner/work/_temp/pandoc-2.7.3-macOS.pkg -target /
Sat, 13 Nov 2021 06:15:27 GMT Error: The operation was canceled.

Expected behavior It would be great if setup-r and setup-pandoc would capture explicitly the issue causing the action to get stuck, or perhaps just handle a meaningful timeout internally for individual operations, perhaps hinting at re-running the workflow in a timeout error message. Using the action in a workflow, one can of course set a timeout for the action step, see e.g. https://github.com/riccardoporreca/rmdgallery/commit/65b87fe13cbbb46f5226c427a811343a5430c4f3, but I believe for the users this should be best handled in the action itself.

Additional context Given how common the issue is (I see it every week in cron runs), it is not great to keep a runner busy for 6hours if we can prevent it.

riccardoporreca avatar Nov 13 '21 09:11 riccardoporreca

This is likely a network failure that never times out, probably from homebrew, I don't think there is anything we can do, just cancel these jobs if you see one taking too long.

jimhester avatar Nov 15 '21 20:11 jimhester

Thanks @jimhester, closing this.

riccardoporreca avatar Nov 16 '21 08:11 riccardoporreca

I'm experiencing this quite often (on website daily builds).

Shame theres no way to tag something to retry if there is a timeout.

drmowinckels avatar Dec 06 '21 13:12 drmowinckels

What about adding a manual timeout for that particular step in the GH action? https://docs.github.com/en/actions/learn-github-actions/workflow-syntax-for-github-actions#jobsjob_idstepstimeout-minutes

ha0ye avatar Dec 06 '21 14:12 ha0ye

These do not look like network errors to me. They seem to hang after calling installer, either for pandoc or gfortran. This is probably because there is a dialog window for the user to OK.

gaborcsardi avatar Dec 06 '21 14:12 gaborcsardi

I added some debugging to the installer calls, in the master branch. Can you switch some of your projects to that? Hopefully we'll learn what is happening.

gaborcsardi avatar Dec 06 '21 15:12 gaborcsardi

@gaborcsardi, I have triggered a number or builds using setup-r/-pandoc from master. One on the builds hit a timeout (which I have set to 10 min), see https://github.com/riccardoporreca/rmdgallery/runs/4434994244?check_suite_focus=true#step:4:90

Mon, 06 Dec 2021 19:17:15 GMT ==> Pouring qpdf--10.4.0.big_sur.bottle.tar.gz
Mon, 06 Dec 2021 19:17:19 GMT 🍺  /usr/local/Cellar/qpdf/10.4.0: 73 files, 6.4MB
Mon, 06 Dec 2021 19:17:19 GMT ==> Pouring checkbashisms--2.21.5.all.bottle.tar.gz
Mon, 06 Dec 2021 19:17:21 GMT 🍺  /usr/local/Cellar/checkbashisms/2.21.5: 6 files, 71.8KB
Mon, 06 Dec 2021 19:26:56 GMT Error: The action has timed out.

Not sure how much more informative this is, but the installer extra logs might help. For reference, here is the log to an equivalent successful run: https://github.com/riccardoporreca/rmdgallery/runs/4434997287?check_suite_focus=true#step:4:89, I just noticed the failing job above does not log anything like "installer: Package name is R 4.2.0 for macOS". Hope this helps

I hit another failure in https://github.com/riccardoporreca/rmdgallery/runs/4434994121?check_suite_focus=true#step:4:89, which should be completely unrelated

installer: Package name is gfortran
installer: Installing at base path /
installer: The install was successful.
/usr/bin/sudo hdiutil detach /Volumes/gfortran-8.2-Mojave
hdiutil: couldn't unmount "disk2" - Resource busy
Error: Failed to get R 4.0.5: Failed to get R 4.0.5: Failed to umount /Volumes/gfortran-8.2-Mojave: Error: The process '/usr/bin/sudo' failed with exit code 16

riccardoporreca avatar Dec 06 '21 19:12 riccardoporreca

Thanks, that is useful! It seems that the R installer can be stuck as well, so I should add the logging to that as well. My theory is that the issue is the concurrent installs interfering. (Pretty much a guess at this point.) More soon.

Btw. the disk eject failure could also be a concurrency issue, but in general we should ignore disk ejection failures, the attached disks do not bother anyone.

gaborcsardi avatar Dec 06 '21 20:12 gaborcsardi

For the record, I added logging for the R installer as well in the master branch, if you encounter timeouts, please link them to this issue. Thanks!

gaborcsardi avatar Dec 06 '21 20:12 gaborcsardi

@gaborcsardi, I have done another battery of runs (same setup as above), and here the logs of three timeout failures

  • https://github.com/riccardoporreca/rmdgallery/runs/4436541095?check_suite_focus=true#step:4:99
  • https://github.com/riccardoporreca/rmdgallery/runs/4436600329?check_suite_focus=true#step:4:99
  • https://github.com/riccardoporreca/rmdgallery/runs/4436612718?check_suite_focus=true#step:4:99

Hope having this few more examples, including the additional logging, help.

riccardoporreca avatar Dec 06 '21 22:12 riccardoporreca

@riccardoporreca Thanks, very helpful indeed! One of the installer processes clearly hangs. I still cannot reproduce the this locally, but nevertheless I made the macOS installations sequential now, hopefully this will help.

gaborcsardi avatar Dec 06 '21 22:12 gaborcsardi

My impression is that with v2 this is now better, but I still saw some hangs on gfortran: https://github.com/r-lib/revdepcheck/runs/4477261536?check_suite_focus=true#step:4:45

I'll remove the detach from the gfortran volume, in case that causes it.

gaborcsardi avatar Dec 10 '21 12:12 gaborcsardi

I reran it a bunch of times, and it seems to work better, so fingers crossed: https://github.com/r-lib/revdepcheck/runs/4483425337?check_suite_focus=true

gaborcsardi avatar Dec 10 '21 12:12 gaborcsardi

I would encourage everyone to switch to @v2. I still haven't seen any freezes with the new tag. We can keep this issue still open a bit longer. If there won't be any new issues in about a month, I'll close it.

gaborcsardi avatar Dec 14 '21 10:12 gaborcsardi

Hi.

I got a timeout yesterday it seems with v2 https://github.com/rladies/awesome-rladies-blogs/runs/4540433972?check_suite_focus=true

First one I've seen with this version.

drmowinckels avatar Dec 16 '21 07:12 drmowinckels

Can you try using v2 for everything? I am not sure what GHA does if you checkout both: https://github.com/rladies/awesome-rladies-blogs/runs/4540433972?check_suite_focus=true#step:1:32

setup-pandoc@v2 is pretty much unchanged from v1 I think.

It is a very long shot...

gaborcsardi avatar Dec 16 '21 07:12 gaborcsardi

Sure, but does GHA really care if there is a mix of action versions as long as they dont collide? I reran the jobs asis and it worked fine this time. As it usually does on rerun.

drmowinckels avatar Dec 16 '21 07:12 drmowinckels

For a second I was thinking if GHA checks them out into the same directory. But apparently not, because the setup-pandoc action does not have debugging, so it is v1, whereas setup-r does, so it is v2.

I would still use v2, maybe the debugging helps. It seems to me that installer returns the prompt while still running some setup processes in the background. This could be different if we specify -dumplog.

gaborcsardi avatar Dec 16 '21 08:12 gaborcsardi

Anyway, we can add timeouts and retries, if we'll still see errors with a fully v2 workflow. Which is likely.

gaborcsardi avatar Dec 16 '21 08:12 gaborcsardi

setup-pandoc@v2 timed out for me yesterday, here are the relevant logs.

sudo installer -allowUntrusted -dumplog -pkg /Users/runner/work/_temp/pandoc-2.14.2-macOS.pkg -target /
Jan  3 22:09:45  installer[3368] <Debug>: Product archive /Users/runner/work/_temp/pandoc-2.14.2-macOS.pkg trustLevel=350
Jan  3 22:09:45  installer[3368] <Debug>: External component packages (1) trustLevel=350
Jan  3 22:09:45  installer[3368] <Debug>: -[IFDInstallController(Private) _buildInstallPlanReturningError:]: location = file://localhost
Jan  3 22:09:45  installer[3368] <Debug>: -[IFDInstallController(Private) _buildInstallPlanReturningError:]: file://localhost/Users/runner/work/_temp/pandoc-2.14.2-macOS.pkg#pandoc.pkg
Jan  3 22:09:45  installer[3368] <Debug>: Set authorization level to root for session
Jan  3 22:09:45  installer[3368] <Info>: Administrator authorization granted.
Error: The operation was canceled.

(I'm not sure if it's relevant, but I missed updating setup-r in that workflow and it was at v1 for that run.)

gadenbuie avatar Jan 04 '22 15:01 gadenbuie

I have just seen it twice with all actions on v2: https://github.com/r-lib/actions/runs/4714679983?check_suite_focus=true

In both cases the gfortran installation froze:

2022-01-05T12:44:08.1412420Z [command]/usr/bin/sudo installer -allowUntrusted -dumplog -package /Volumes/gfortran-8.2-Mojave/gfortran-8.2-Mojave/gfortran.pkg -target /
2022-01-05T12:44:08.2767460Z Jan  5 12:44:08  installer[1197] <Debug>: fileURLForURL = x-disc://gfortran-8.2-Mojave/gfortran-8.2-Mojave/gfortran.pkg
2022-01-05T12:44:08.2871220Z Jan  5 12:44:08  installer[1197] <Debug>: fileURLForURL = file://localhost/Volumes/gfortran-8.2-Mojave/gfortran-8.2-Mojave/gfortran.pkg
2022-01-05T12:44:08.9498810Z Jan  5 12:44:08  installer[1197] <Debug>: Referenced component packages (1) trustLevel=350
2022-01-05T12:44:08.9602120Z Jan  5 12:44:08  installer[1197] <Debug>: -[IFPKGDerivedDocument sortedPackageLocations]: result = (
2022-01-05T12:44:08.9705050Z 	    "file://localhost"
2022-01-05T12:44:08.9812120Z 	)
2022-01-05T12:44:08.9918330Z Jan  5 12:44:08  installer[1197] <Debug>: -[IFDInstallController(Private) _buildInstallPlanReturningError:]: location = file://localhost
2022-01-05T12:44:09.0025590Z Jan  5 12:44:08  installer[1197] <Debug>: -[IFDInstallController(Private) _buildInstallPlanReturningError:]: file://localhost/Volumes/gfortran-8.2-Mojave/gfortran-8.2-Mojave/gfortran.pkg
2022-01-05T12:44:09.0128490Z Jan  5 12:44:08  installer[1197] <Debug>: Set authorization level to root for session
2022-01-05T12:44:09.0129580Z Jan  5 12:44:08  installer[1197] <Info>: Administrator authorization granted.
2022-01-05T13:13:53.9292480Z ##[error]The operation was canceled.

gaborcsardi avatar Jan 05 '22 13:01 gaborcsardi

Just a note that this is still an issue. Hopefully switching to rig will solve it.

gaborcsardi avatar Oct 19 '22 12:10 gaborcsardi

I haven't seen this for a while, maybe because of the macos updates on GHA. Anyone else has seen this recently?

gaborcsardi avatar Jan 27 '23 14:01 gaborcsardi

I am going to close this now. Hopefully it is not coming back.

gaborcsardi avatar May 02 '23 21:05 gaborcsardi

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue and include a link to this issue

github-actions[bot] avatar May 17 '23 01:05 github-actions[bot]