Workflow run is stuck for over 2 hours on GitHub-hosted macOS runners (macOS 10.15, 11 and 12)
Describe the bug I have a workflow that gets stuck on a regular basis waiting for macOS VMs. See for instance https://github.com/pombredanne/scancode-toolkit/actions/runs/2877397119 where the workflow has been stuck for over 2 hours waiting for macOS runners. This has been happening to me a couple times over the last week.
To Reproduce See https://github.com/pombredanne/scancode-toolkit/actions/runs/2877397119 I cannot find any pattern
Expected behavior The macOS VMs should become available without waiting hours.
Runner Version and Platform
That's the MSFT/GitHub hosted runner
OS of the machine running the runner? OSX/Windows/Linux/... macOS
What's not working?
The log of each stuck job has something like this:
Test POSIX PyPI wheels (macos-12, 3.9)
Started 2h 22m 15s ago
The agent pool assigned to this job has hit their MacOs concurrency limits
Requested labels: macos-12
Job defined at: pombredanne/scancode-toolkit/.github/workflows/scancode-release.yml@refs/tags/v31.0.0rc51
Waiting for a runner to pick up this job...

Job Log Output
There is no log yet.
Some recent jobs were subject to the same issue:
- https://github.com/pombredanne/scancode-toolkit/actions/runs/2853533833 : 5 hours!
With these two I lost patience and eventually killed some of them:
- https://github.com/pombredanne/scancode-toolkit/actions/runs/2863073652 : 2+ hours
- https://github.com/pombredanne/scancode-toolkit/actions/runs/2862796857 : ~3 hours
I kicked off another job for kicks and the Mac runners are stuck there too:
Test POSIX PyPI wheels (macos-12, 3.9)
Started 1m 17s ago
The agent pool assigned to this job has hit their MacOs concurrency limits
Requested labels: macos-12
Job defined at: pombredanne/scancode-toolkit/.github/workflows/scancode-release.yml@refs/tags/v31.0.0rc51
Waiting for a runner to pick up this job...
This is not something that would be under my control AFAIK.

Your waiting jobs are waiting for macOS concurrency on your account, and all 5 of your free macOS hosted concurrency slots are currently in use by https://github.com/pombredanne/PyOxidizer/actions/runs/2876456583.
Thanks! Good catch... but duh... that's a fork! I never asked for nor wanted any workflow to run on these forks. I wonder if I can disable workflows globally unless I choose to run some. Otherwise this wastes tons of resources.
There seems to be no way to disable Actions globally and then enable them selectively on certain repos only... And since I have a certain, not too small number of forks, I cannot humanly control what is happening. I cannot even know which forked repos are running workflows behind my back. Or could I?
@TingluoHuang How did you figure out there was some workflow stealing my quotas somewhere? How can I find that out?
@pombredanne I am able to check it via internal telemetry. It's bad that you can't self-serve this kind of problem. 🙇
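For readers without access to that telemetry, a rough approximation may be possible with the public REST API: the "list workflow runs for a repository" endpoint accepts a `status` filter, so one can poll each owned repository for `in_progress` and `queued` runs. The sketch below is only an illustration under assumptions: the helper names (`runs_url`, `run_urls`, `find_active_runs`) are hypothetical, the token is assumed to be a personal access token with read access to the repositories, and the result shows active runs, not the account's actual concurrency accounting.

```python
import json
import urllib.parse
import urllib.request

API = "https://api.github.com"


def runs_url(full_name, status):
    """URL of GET /repos/{owner}/{repo}/actions/runs filtered by run status."""
    query = urllib.parse.urlencode({"status": status, "per_page": 100})
    return f"{API}/repos/{full_name}/actions/runs?{query}"


def run_urls(payload):
    """Extract the web URLs of the runs from one decoded API response."""
    return [run["html_url"] for run in payload.get("workflow_runs", [])]


def fetch_json(url, token):
    """Perform an authenticated GET and decode the JSON body."""
    request = urllib.request.Request(
        url,
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def find_active_runs(token, repo_names, statuses=("in_progress", "queued")):
    """Map each repo (e.g. "owner/name") to the URLs of its active runs.

    Only the first page (up to 100 runs) per status is inspected, which is
    an assumption that should hold for a personal account.
    """
    active = {}
    for name in repo_names:
        urls = []
        for status in statuses:
            urls.extend(run_urls(fetch_json(runs_url(name, status), token)))
        if urls:
            active[name] = urls
    return active
```

Calling `find_active_runs(token, ["pombredanne/PyOxidizer", "pombredanne/scancode-toolkit"])` would return only the repositories that currently have runs in progress or queued, which is enough to spot a fork that is consuming the macOS slots.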
@TingluoHuang re:
It's bad that you can't self-serve this kind of problem.
Yep, this would be awesome if I could... Alternatively or in addition, disabling Actions on forks and making them opt-in would make the issue much simpler.
Or ... just give me access to your telemetry :smiling_imp:
Maybe there is something I can hack with some API calls at least, so I can disable Actions on all my repos except for a few where I want them to run?
Side note: I surmise that running jobs randomly on all the forks must waste quite a bit of CPU and other resources globally, because it eventually saturates capacity with jobs that users never requested. It's probably millions' worth of resources that are wasted.
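A minimal sketch of such an API hack, assuming a personal access token with admin rights on the repositories: the REST API has a "set GitHub Actions permissions for a repository" endpoint (`PUT /repos/{owner}/{repo}/actions/permissions`) that takes an `enabled` flag, and `GET /user/repos` can list the repositories owned by the token's user. The function names and the `keep` allow-list are hypothetical, and the script defaults to a dry run so nothing changes until `dry_run=False` is passed.

```python
import json
import urllib.request

API = "https://api.github.com"


def auth_headers(token):
    """Headers for authenticated JSON requests to the REST API."""
    return {
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
        "Content-Type": "application/json",
    }


def repos_to_disable(repos, keep):
    """Full names of all listed repos that are not in the allow-list."""
    return [r["full_name"] for r in repos if r["full_name"] not in keep]


def build_disable_request(full_name, token):
    """PUT /repos/{owner}/{repo}/actions/permissions with enabled=false."""
    return urllib.request.Request(
        f"{API}/repos/{full_name}/actions/permissions",
        data=json.dumps({"enabled": False}).encode(),
        method="PUT",
        headers=auth_headers(token),
    )


def list_owned_repos(token):
    """Page through GET /user/repos for repos owned by the token's user."""
    repos, page = [], 1
    while True:
        url = f"{API}/user/repos?affiliation=owner&per_page=100&page={page}"
        request = urllib.request.Request(url, headers=auth_headers(token))
        with urllib.request.urlopen(request) as response:
            batch = json.load(response)
        if not batch:
            return repos
        repos.extend(batch)
        page += 1


def disable_actions_except(token, keep, dry_run=True):
    """Disable Actions on every owned repo outside `keep`; return their names."""
    targets = repos_to_disable(list_owned_repos(token), keep)
    for name in targets:
        if not dry_run:
            urllib.request.urlopen(build_disable_request(name, token)).close()
    return targets
```

For example, `disable_actions_except(token, keep={"pombredanne/scancode-toolkit"})` would first report which repositories would be affected. To restrict the change to forks only, the list could be filtered on each repository's `fork` field before calling `repos_to_disable`.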
You might want to report that feedback about fork repos to https://github.com/community/community/discussions/categories/actions-and-packages The runner itself has no control over those. 😢
I guess this problem is more real than ever, and we even get zero output on the console, so we have no clue what is really happening there. https://github.com/ansible/vscode-ansible/actions/runs/3731365458/jobs/6329509879
This issue is stale because it has been open 365 days with no activity. Remove stale label or comment or this will be closed in 15 days.
This issue was closed because it has been stalled for 15 days with no activity.