
Workflow run is stuck for over 2 hours on GitHub-hosted macOS runners (macOS 10.15, 11 and 12)

Open pombredanne opened this issue 3 years ago • 12 comments

Describe the bug

I have a workflow that regularly gets stuck waiting for macOS VMs. See for instance https://github.com/pombredanne/scancode-toolkit/actions/runs/2877397119, where the workflow has been stuck for over 2 hours waiting for macOS runners. This has happened to me a couple of times over the last week.

To Reproduce

I cannot find any pattern. See https://github.com/pombredanne/scancode-toolkit/actions/runs/2877397119

Expected behavior

The macOS VMs should become available without waiting for hours.

Runner Version and Platform

That's the MSFT/GitHub-hosted runner.

OS of the machine running the runner? macOS

What's not working?

The log of each stuck job has something like this:

Test POSIX PyPI wheels (macos-12, 3.9)
Started 2h 22m 15s ago
The agent pool assigned to this job has hit their MacOs concurrency limits
Requested labels: macos-12
Job defined at: pombredanne/scancode-toolkit/.github/workflows/scancode-release.yml@refs/tags/v31.0.0rc51
Waiting for a runner to pick up this job...

[Screenshot from 2022-08-17 22-38-13]

Job Log Output

There is no log yet.

pombredanne, Aug 17 '22 20:08

Some recent jobs were subject to the same issue:

  • https://github.com/pombredanne/scancode-toolkit/actions/runs/2853533833 : 5 hours!

With these two, I lost patience and eventually killed some of their jobs:

  • https://github.com/pombredanne/scancode-toolkit/actions/runs/2863073652 : 2+ hours
  • https://github.com/pombredanne/scancode-toolkit/actions/runs/2862796857 : ~3 hours

pombredanne, Aug 17 '22 20:08

I kicked off another job just for kicks, and the macOS runners are stuck too:

Test POSIX PyPI wheels (macos-12, 3.9)
Started 1m 17s ago
The agent pool assigned to this job has hit their MacOs concurrency limits
Requested labels: macos-12
Job defined at: pombredanne/scancode-toolkit/.github/workflows/scancode-release.yml@refs/tags/v31.0.0rc51
Waiting for a runner to pick up this job...

This is not something that would be under my control AFAIK.

pombredanne, Aug 17 '22 20:08


Your waiting jobs are waiting for macOS concurrency on your account, and all 5 of your free macOS hosted concurrency slots are currently used by https://github.com/pombredanne/PyOxidizer/actions/runs/2876456583.

TingluoHuang, Aug 17 '22 20:08

"Your waiting jobs are waiting for macOS concurrency on your account, and all 5 of your free macOS hosted concurrency slots are currently used by https://github.com/pombredanne/PyOxidizer/actions/runs/2876456583."

Thanks! Good catch... but duh... that's a fork! I never asked for nor wanted any workflow to run on these forks. I wonder if I can disable workflows globally and only enable the ones I choose. Otherwise, this wastes a ton of resources.

pombredanne, Aug 17 '22 21:08

There seems to be no way to disable Actions globally and then enable them selectively on certain repos... And since I have a fair number of forks, I cannot humanly keep track of what is happening. I cannot even tell which forked repos are running workflows behind my back. Or can I?
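A rough way to check which forks are running workflows is to list your repositories through the GitHub REST API (`GET /user/repos`), keep those whose `fork` field is true, and look at each fork's recent runs (`GET /repos/{owner}/{repo}/actions/runs`) for a `queued` or `in_progress` status. This is a minimal sketch, not an official tool: it assumes a personal access token in a `GITHUB_TOKEN` environment variable, reads only the first page (100 entries) of each listing, and the helper names are illustrative.

```python
import json
import os
import urllib.request

API = "https://api.github.com"


def github_get(path: str, token: str):
    """GET a GitHub REST API path and decode the JSON body."""
    req = urllib.request.Request(
        f"{API}{path}",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def fork_names(repos: list[dict]) -> list[str]:
    """Keep only the full names of repositories flagged as forks."""
    return [r["full_name"] for r in repos if r.get("fork")]


def active_runs(runs_payload: dict) -> list[dict]:
    """Keep workflow runs that are still queued or in progress."""
    return [
        run
        for run in runs_payload.get("workflow_runs", [])
        if run.get("status") in ("queued", "in_progress")
    ]


def main() -> None:
    # Assumption: a personal access token is exported as GITHUB_TOKEN.
    token = os.environ["GITHUB_TOKEN"]
    # First page only (up to 100 repos); pagination is left out of this sketch.
    repos = github_get("/user/repos?type=owner&per_page=100", token)
    for name in fork_names(repos):
        runs = github_get(f"/repos/{name}/actions/runs?per_page=100", token)
        for run in active_runs(runs):
            print(name, run["status"], run["html_url"])
```

Calling `main()` prints one line per active run on a fork, with its status and URL. The filtering is kept in plain functions so it can be checked without network access.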

pombredanne, Aug 17 '22 21:08

@TingluoHuang How did you figure out there was some workflow stealing my quotas somewhere? How can I find that out?

pombredanne, Aug 17 '22 21:08

@pombredanne I am able to check it via internal telemetry. It's bad that you can't self-serve this kind of problem. 🙇
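Without that telemetry, one approximate self-serve check is to ask the REST API which of your repositories currently have in-progress jobs on macOS-labelled runners, using the runs endpoint (`GET /repos/{owner}/{repo}/actions/runs?status=in_progress`) and the jobs endpoint (`GET /repos/{owner}/{repo}/actions/runs/{run_id}/jobs`), whose job objects carry `status` and `labels`. This is a sketch under several assumptions: a token in `GITHUB_TOKEN`, any label starting with `macos` treated as a GitHub-hosted macOS runner, the limit of 5 taken from the figure quoted in this thread, and only the first 100 repositories you own examined. The function names are illustrative.

```python
import json
import os
import urllib.request
from collections import Counter

API = "https://api.github.com"
FREE_MACOS_CONCURRENCY = 5  # figure quoted in this thread, not looked up


def github_get(path: str, token: str):
    """GET a GitHub REST API path and decode the JSON body."""
    req = urllib.request.Request(
        f"{API}{path}",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def is_macos_job(job: dict) -> bool:
    """Assumption: hosted macOS jobs request a label such as macos-12."""
    return any(label.lower().startswith("macos") for label in job.get("labels", []))


def macos_slots_in_use(jobs_by_repo: dict[str, list[dict]]) -> dict[str, int]:
    """Count in-progress macOS jobs per repository; queued jobs hold no slot."""
    counts = Counter()
    for repo, jobs in jobs_by_repo.items():
        for job in jobs:
            if job.get("status") == "in_progress" and is_macos_job(job):
                counts[repo] += 1
    return dict(counts)


def collect_jobs(token: str) -> dict[str, list[dict]]:
    """Fetch the jobs of in-progress runs for the first 100 repos you own."""
    jobs_by_repo: dict[str, list[dict]] = {}
    for repo in github_get("/user/repos?type=owner&per_page=100", token):
        name = repo["full_name"]
        runs = github_get(f"/repos/{name}/actions/runs?status=in_progress", token)
        for run in runs.get("workflow_runs", []):
            jobs = github_get(f"/repos/{name}/actions/runs/{run['id']}/jobs", token)
            jobs_by_repo.setdefault(name, []).extend(jobs.get("jobs", []))
    return jobs_by_repo


def report() -> None:
    counts = macos_slots_in_use(collect_jobs(os.environ["GITHUB_TOKEN"]))
    for repo, n in sorted(counts.items(), key=lambda kv: -kv[1]):
        print(f"{repo}: {n} macOS job(s) running")
    print(f"total: {sum(counts.values())} of {FREE_MACOS_CONCURRENCY} free macOS slots")
```

In a situation like the one above, `report()` would be expected to list the fork holding the macOS slots. It cannot see jobs in repositories you do not own.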

TingluoHuang, Aug 17 '22 21:08

@TingluoHuang re:

"It's bad that you can't self-serve this kind of problem. 🙇"

Yep, it would be awesome if I could... Alternatively, or in addition, disabling Actions on forks and making them opt-in would make the issue much simpler.

Or... just give me access to your telemetry 😈

pombredanne, Aug 17 '22 21:08

Maybe there is something I can hack together with some API calls, so I can at least disable Actions on all my repos except the few where I want them to run?
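The REST API does have a per-repository switch for this: `PUT /repos/{owner}/{repo}/actions/permissions` takes a JSON body whose `enabled` field turns Actions on or off for that repository. A minimal sketch, assuming a token with admin rights on the repositories in `GITHUB_TOKEN` and a hypothetical `KEEP_ENABLED` allowlist, could loop over your repositories and disable Actions everywhere else. It defaults to a dry run and only reads the first 100 repositories; the helper names are illustrative.

```python
import json
import os
import urllib.request
from typing import Optional

API = "https://api.github.com"

# Hypothetical allowlist: repositories where Actions should stay enabled.
KEEP_ENABLED = {"pombredanne/scancode-toolkit"}


def github_request(method: str, path: str, token: str, body: Optional[dict] = None):
    """Send a GitHub REST API request; return decoded JSON, or None if empty."""
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(
        f"{API}{path}",
        data=data,
        method=method,
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        raw = resp.read()  # the permissions PUT answers 204 with no body
    return json.loads(raw) if raw else None


def repos_to_disable(repo_names: list[str], keep: set[str]) -> list[str]:
    """Every repository not on the allowlist, in a stable order."""
    return sorted(name for name in repo_names if name not in keep)


def disable_actions(dry_run: bool = True) -> list[str]:
    token = os.environ["GITHUB_TOKEN"]
    # First page only (up to 100 repos); add pagination for larger accounts.
    repos = github_request("GET", "/user/repos?type=owner&per_page=100", token)
    targets = repos_to_disable([r["full_name"] for r in repos], KEEP_ENABLED)
    for name in targets:
        if dry_run:
            print("would disable Actions on", name)
        else:
            github_request(
                "PUT", f"/repos/{name}/actions/permissions", token, {"enabled": False}
            )
    return targets
```

`disable_actions()` only prints what it would change; `disable_actions(dry_run=False)` sends the `PUT` requests. Re-enabling a repository later is the same call with `{"enabled": True}`.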

pombredanne, Aug 17 '22 21:08

Side note: I surmise that running jobs indiscriminately on all forks must waste quite a bit of CPU and other resources globally, since it eventually saturates capacity with jobs that users never requested. That is probably worth millions in waste.

pombredanne, Aug 17 '22 21:08

You might want to report this feedback about fork repos to https://github.com/community/community/discussions/categories/actions-and-packages. The runner itself has no control over that. 😢

TingluoHuang, Aug 17 '22 22:08

I guess this problem is more real than ever, and we get zero output on the console, so we have no clue what is really happening there. https://github.com/ansible/vscode-ansible/actions/runs/3731365458/jobs/6329509879

ssbarnea, Dec 19 '22 13:12

This issue is stale because it has been open 365 days with no activity. Remove stale label or comment or this will be closed in 15 days.

github-actions[bot], Dec 25 '23 00:12

This issue was closed because it has been stalled for 15 days with no activity.

github-actions[bot], Jan 15 '24 00:01