
Sporadic delays in GitHub runners starting workflow executions

trask opened this issue 7 months ago • 3 comments

Reports made in the #otel-maintainers Slack channel:

  • Fri, Mar 21, 9:37 a.m. Pacific Time: https://cloud-native.slack.com/archives/C01NJ7V1KRC/p1742575048528319
  • Tue, Apr 8, 9:13 a.m. Pacific Time: https://cloud-native.slack.com/archives/C01NJ7V1KRC/p1744128838919859
  • Wed, Apr 9, 7:19 a.m. Pacific Time: https://cloud-native.slack.com/archives/C01NJ7V1KRC/p1744208353371499?thread_ts=1744128838.919859&cid=C01NJ7V1KRC

Using the data now available from #2606 (thanks @adrielp!), we can get a proxy for these delays by measuring one of the collector-contrib workflows that runs often and normally finishes very quickly.

This chart shows the number of executions of the "Add code owners to a PR" job in the collector-contrib repo that took longer than 2 minutes:

[Chart: "Add code owners to a PR" job executions longer than 2 minutes]
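For anyone who wants to reproduce this kind of proxy metric from raw job records, here is a minimal sketch. It assumes each job is a dict with `name`, `created_at` and `completed_at` fields (names modeled on GitHub REST API job objects), and it measures from job creation to completion so that time spent waiting for a runner is included. The issue does not say exactly how #2606 computes durations, so treat the field choice and the helper names (`slow_runs_per_day`, `parse_ts`) as hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical cut-off matching the "> 2 minutes" threshold used in the chart.
SLOW_THRESHOLD = timedelta(minutes=2)


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as '2025-04-08T16:13:58Z'."""
    # datetime.fromisoformat only accepts a trailing 'Z' from Python 3.11 on,
    # so normalize it to an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def slow_runs_per_day(jobs, job_name="Add code owners to a PR",
                      threshold=SLOW_THRESHOLD):
    """Count, per UTC day, runs of `job_name` whose time from creation to
    completion (queue time plus run time) exceeded `threshold`.

    Each job is assumed to be a dict with 'name', 'created_at' and
    'completed_at' keys holding ISO 8601 strings (or None if unfinished).
    """
    counts = Counter()
    for job in jobs:
        if job.get("name") != job_name or not job.get("completed_at"):
            continue  # skip other jobs and runs that have not finished yet
        created = parse_ts(job["created_at"])
        elapsed = parse_ts(job["completed_at"]) - created
        if elapsed > threshold:
            counts[created.date().isoformat()] += 1
    return dict(sorted(counts.items()))
```

Because a job like this one normally finishes in seconds, a spike in runs over the threshold is more likely to reflect runner start-up delay than slow job steps, which is what makes it a reasonable proxy.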

I've opened a CNCF Service Desk ticket asking them to look into this, since they own the GitHub runner limits. This issue is just for tracking.

trask, Apr 11 '25

I added the project-infra label to this, even though the label description says it is for 'non-Github' issues only. @trask, do you think there is a better area label for this?

mx-psi, Apr 14 '25

@trask, I've picked up your Service Desk request and am looking at this now.

RobertKielty, Apr 14 '25

I can carry out some basic liveness checks in real time at the enterprise settings level while you are experiencing delays in getting jobs serviced.

I am also happy to escalate delays to the team at GitHub in real time, to see whether they can give us more detailed information on why jobs are sitting in pending queues for so long.

Reach out to me the next time this happens, listing the jobs and runners that are pending, and I will see what I can do.
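To make such a report quick to put together, here is a rough sketch that turns a list of job records into one line per long-queued job. It assumes each job is a dict with `name`, `status`, `created_at` (a timezone-aware `datetime`), `labels` (the runner labels the job requested) and `html_url`; these keys are modeled on GitHub's job objects, where `"queued"` is one of the status values, but the helper `pending_job_report`, the 5-minute default and the example URLs are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def pending_job_report(jobs, now, min_wait=timedelta(minutes=5)):
    """Return one line per job that has been queued longer than min_wait,
    oldest first, listing the job, requested runner labels and wait time."""
    waiting = [
        job for job in jobs
        if job["status"] == "queued" and now - job["created_at"] > min_wait
    ]
    lines = []
    for job in sorted(waiting, key=lambda j: j["created_at"]):
        wait_min = int((now - job["created_at"]).total_seconds() // 60)
        labels = ", ".join(job["labels"]) or "(no labels)"
        lines.append(
            f"{job['name']} | runner labels: {labels} | "
            f"queued {wait_min} min | {job['html_url']}"
        )
    return lines
```

Listing the requested runner labels alongside each job should help whoever escalates tell whether the delay is specific to one runner type.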

Edit: A reminder that, when delays occur, status.github.com should be checked in real time to see whether there is a known operational issue.
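If you want to script that check, here is a small sketch. It assumes the status page (status.github.com currently points to githubstatus.com) is an Atlassian Statuspage site exposing a JSON summary at `/api/v2/status.json` with a `status` object holding `indicator` and `description` fields; the endpoint and field names are assumptions to verify, and `describe_status` and `fetch_status` are hypothetical helpers.

```python
import json
from urllib.request import urlopen

# Assumed Statuspage summary endpoint for GitHub's status page.
STATUS_URL = "https://www.githubstatus.com/api/v2/status.json"


def describe_status(payload: str) -> str:
    """Summarize a Statuspage-style status.json payload in one line."""
    status = json.loads(payload)["status"]
    indicator = status["indicator"]  # e.g. "none", "minor", "major", "critical"
    if indicator == "none":
        return f"No known incident: {status['description']}"
    return f"Possible known issue ({indicator}): {status['description']}"


def fetch_status(url: str = STATUS_URL, timeout: float = 10.0) -> str:
    """Fetch the live status page summary (requires network access)."""
    with urlopen(url, timeout=timeout) as resp:
        return describe_status(resp.read().decode("utf-8"))
```

Parsing is kept separate from fetching so the summary logic can be checked against a saved payload without network access.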

RobertKielty, Apr 14 '25

We haven't seen this lately, so I'm closing this issue, but definitely let us know if anyone experiences it going forward.

trask, Jul 31 '25