
An error occurred while provisioning resources (Error Type: Disconnect).

Open alexlamsl opened this issue 3 years ago • 19 comments

Description
Jobs on macOS intermittently fail without logs, and the error message in the title appears only some of the time.

Here is a list of failed jobs over the past three days:

  • https://github.com/mishoo/UglifyJS/runs/2730669944?check_suite_focus=true
  • https://github.com/mishoo/UglifyJS/runs/2719886305?check_suite_focus=true
  • https://github.com/mishoo/UglifyJS/runs/2718446501?check_suite_focus=true
  • https://github.com/mishoo/UglifyJS/runs/2716605609?check_suite_focus=true
  • https://github.com/mishoo/UglifyJS/runs/2712627146?check_suite_focus=true
  • https://github.com/mishoo/UglifyJS/runs/2711652206?check_suite_focus=true

Not sure if related, but at a lower frequency I have also encountered jobs being reported as cancelled: https://github.com/mishoo/UglifyJS/runs/2706621544?check_suite_focus=true

Area for Triage: Deployment/Release

Question, Bug, or Feature?: Bug

Virtual environments affected

  • [ ] Ubuntu 16.04
  • [ ] Ubuntu 18.04
  • [ ] Ubuntu 20.04
  • [x] macOS 10.15
  • [ ] macOS 11
  • [ ] Windows Server 2016 R2
  • [ ] Windows Server 2019

Image version Current runner version: '2.278.0'

Expected behavior
Jobs complete with viewable logs.

Actual behavior
Missing logs − even with View raw logs:

2021-05-31T14:43:02.9964115Z ##[section]Starting: Request a runner to run this job
2021-05-31T14:43:03.4321453Z Can't find any online and idle self-hosted runner in current repository that matches the required labels: 'macos-latest'
2021-05-31T14:43:03.4321551Z Can't find any online and idle self-hosted runner in current repository's account/organization that matches the required labels: 'macos-latest'
2021-05-31T14:43:03.4321605Z Can't find any online and idle hosted runner in current repository's account/organization that matches the required labels: 'macos-latest'
2021-05-31T14:43:03.4321706Z Found online and busy hosted runners in current repository's account/organization that matches the required labels: 'macos-latest'. Waiting for one of them to get assigned for this job.
2021-05-31T14:43:03.6599602Z ##[section]Finishing: Request a runner to run this job

Repro steps
A description with steps to reproduce the issue. If you have a public example or repo to share, please provide the link.

  1. watch scheduled workflow spawn
  2. occasionally macOS job would fail with missing logs

alexlamsl avatar Jun 02 '21 16:06 alexlamsl

thanks @alexlamsl for creating a separate issue. Am I right that it is enough to simply fork https://github.com/mishoo/UglifyJS/ and run the following workflow to reproduce the problem? https://github.com/mishoo/UglifyJS/blob/master/.github/workflows/ufuzz.yml

miketimofeev avatar Jun 02 '21 16:06 miketimofeev

Yes, forking the repository and letting the aforementioned workflow run should reproduce the (intermittent) issue.

You may want to edit out the Linux & Windows jobs to lighten the load since they don't exhibit the same issues.

Please be advised that the job may sometimes fail because the fuzzer hits a false positive, but those failures are easy to tell apart from this issue because they do produce logs.

alexlamsl avatar Jun 02 '21 16:06 alexlamsl

Another bunch of recent samples: https://github.com/mishoo/UglifyJS/runs/2763713274?check_suite_focus=true https://github.com/mishoo/UglifyJS/runs/2760713866?check_suite_focus=true https://github.com/mishoo/UglifyJS/runs/2760250084?check_suite_focus=true https://github.com/mishoo/UglifyJS/runs/2757930927?check_suite_focus=true https://github.com/mishoo/UglifyJS/runs/2757074905?check_suite_focus=true

alexlamsl avatar Jun 07 '21 19:06 alexlamsl

@alexlamsl thank you for the examples. We are investigating the issue on our side to determine the exact reason for this behavior. So far I have found only one thing: the tests consume a lot of CPU resources on macOS machines, which possibly causes the pipeline to fail. We need more time to find the root cause for this particular situation. I'll keep you informed.

Darleev avatar Jun 08 '21 17:06 Darleev

Not sure if related, but just now I've hit an instance of this but on Windows: https://github.com/mishoo/UglifyJS/runs/2821517386?check_suite_focus=true

alexlamsl avatar Jun 14 '21 17:06 alexlamsl

@alexlamsl thanks for the update! Windows is a pretty different story, so it's not related. As for macOS, we've narrowed down the list of environments with issues, but unfortunately we are still searching for the root cause.

miketimofeev avatar Jun 15 '21 09:06 miketimofeev

Not sure if this helps, but this failed job contains some information under View raw logs: https://github.com/mishoo/UglifyJS/runs/3003410939?check_suite_focus=true

And at a glance it seems to have been "cancelled".

alexlamsl avatar Jul 07 '21 01:07 alexlamsl

Hi,

I have the same issue with macOS 11.

code4break avatar Jul 12 '21 12:07 code4break

@FireFighter80 do you have access to the macOS-11 pipeline? Just to make sure it's not the access issue

miketimofeev avatar Jul 12 '21 12:07 miketimofeev

@miketimofeev Thx. You're right. That was the issue.

code4break avatar Jul 12 '21 12:07 code4break

@miketimofeev has this issue been resolved?

I am still getting a steady stream of these job failures, and over the past week they have occurred daily.

alexlamsl avatar May 13 '22 14:05 alexlamsl

@alexlamsl we hadn't heard of any new cases, which is why we decided to close the issue. Could you provide some new examples of such builds so I can escalate it?

miketimofeev avatar May 16 '22 11:05 miketimofeev

Ones that are immediately relevant: https://github.com/mishoo/UglifyJS/actions/runs/2306997233 https://github.com/mishoo/UglifyJS/actions/runs/2296601772 https://github.com/mishoo/UglifyJS/actions/runs/2284012926

Others that fail unexpectedly, not sure if related: https://github.com/mishoo/UglifyJS/actions/runs/2281443778 https://github.com/mishoo/UglifyJS/actions/runs/2327083565 https://github.com/mishoo/UglifyJS/actions/runs/2300065092

P.S. for the past few days I would encounter Angry Unicorn ~5% of the time when loading any Actions-related URLs

alexlamsl avatar May 16 '22 13:05 alexlamsl

I can replicate this on my fork as well: https://github.com/alexlamsl/UglifyJS/actions/runs/2310731271 https://github.com/alexlamsl/UglifyJS/actions/runs/2310324508 https://github.com/alexlamsl/UglifyJS/actions/runs/2309311098 https://github.com/alexlamsl/UglifyJS/actions/runs/2307865027 https://github.com/alexlamsl/UglifyJS/actions/runs/2264485393

Others: https://github.com/alexlamsl/UglifyJS/actions/runs/2277862274 https://github.com/alexlamsl/UglifyJS/actions/runs/2263136893

(ran into 🦄🦄🦄 while looking for these)

alexlamsl avatar May 16 '22 14:05 alexlamsl

@alexlamsl thanks, will engage the engineering team

miketimofeev avatar May 16 '22 15:05 miketimofeev

We're seeing the same issue very frequently with our Windows builds. There are no detailed errors to indicate what went wrong: https://github.com/Azure/bicep/actions/runs/2633252423

Summary view:

image

Trying to see logs for an individual job:

image

@miketimofeev - any update on this?

anthony-c-martin avatar Jul 08 '22 13:07 anthony-c-martin

Just to chime in here, this is plaguing one of my repos as well, but on ubuntu-latest image. Sadly it's not public, so I can't share any links or anything, but the affected workflow always fails around the 30min mark with either the error:

An error occurred while provisioning resources (Error Type: Disconnect).
Received request to deprovision: The request was cancelled by the remote provider.

or

The runner has received a shutdown signal. This can happen when the runner service is stopped, or a manually started runner is canceled.
Process completed with exit code 143.

The workflow is just a simple action running npm quicktype. Occasionally, I will get some logs (as opposed to the typical no logs on the failing step that ran for 30min), but they only ever contain Killed\n. This has been happening for the past 4 months

niehusstaab avatar Aug 31 '22 16:08 niehusstaab
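The exit code 143 quoted above follows the usual shell convention of 128 plus a signal number, so it corresponds to signal 15 (SIGTERM), which is consistent with the runner being told to shut down. A bare `Killed` line is what shells typically print when a process receives SIGKILL (exit code 137), which on Linux is often, though not always, the out-of-memory killer. A minimal sketch for decoding such codes follows; `describe_exit_code` is a hypothetical helper name and is not part of any runner tooling.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Explain a shell-style exit code.

    By shell convention, a value above 128 means the process was
    terminated by signal number (code - 128).
    """
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not defined on this platform.
            return f"terminated by unknown signal {signum}"
        return f"terminated by {name} (signal {signum})"
    return f"exited normally with status {code}"


print(describe_exit_code(143))  # the code reported in the comment above
print(describe_exit_code(137))  # the code usually paired with "Killed"
```

This only interprets the number; it cannot tell who sent the signal or why.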

@niehusstaab even links to private repos will help, as we don't need access to your repo to get the telemetry for the run and see if there was high CPU usage or something like that. High resource usage is the most likely root cause.

miketimofeev avatar Aug 31 '22 17:08 miketimofeev

We're seeing the same issue very frequently with our Windows builds. There's no detailed errors to indicate what went wrong: https://github.com/Azure/bicep/actions/runs/2633252423 ... @miketimofeev - any update on this?

To circle back - by chance I discovered that one of our tests was eating up a LOT of system memory, and this issue stopped occurring once I fixed it. It would be super helpful if this information could have been communicated somehow through the workflow logs, and would have saved a lot of time spent debugging.

anthony-c-martin avatar Sep 22 '22 13:09 anthony-c-martin
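Since the workflow logs do not surface memory pressure, one way to look for a memory-hungry step is to wrap the suspect command and print its peak memory use when it finishes. The sketch below is only an illustration under stated assumptions: it uses Python's standard `resource` module, which exists on Linux and macOS but not on Windows, `run_and_report` is a hypothetical helper name, and the report is printed only if the wrapper itself survives, so it is most useful on a run that completes or on a machine with more memory.

```python
import resource
import subprocess
import sys
from typing import List, Tuple


def run_and_report(cmd: List[str]) -> Tuple[int, int]:
    """Run cmd, then print and return (exit code, peak RSS of child processes).

    ru_maxrss is reported in KiB on Linux and in bytes on macOS.
    """
    proc = subprocess.run(cmd)
    # RUSAGE_CHILDREN covers child processes that have been waited for.
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    unit = "bytes" if sys.platform == "darwin" else "KiB"
    print(f"exit code {proc.returncode}, peak child RSS {peak} {unit}", flush=True)
    return proc.returncode, peak


# Example: a child process that allocates roughly 50 MiB.
run_and_report([sys.executable, "-c", "data = bytearray(50 * 1024 * 1024)"])
```

In a workflow, the test command would take the place of the example child process; comparing the reported peak against the runner's documented memory gives a rough idea of how close a step is to the limit.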

If links to repos are still useful, here's a public action that failed with this specific error: https://github.com/SuffolkLITLab/ALActions/actions/runs/3913807100, running on ubuntu-latest.

It's a really lightweight action that only runs ~20 lines of Beautiful Soup Python code on a small webpage, and normally finishes in < 30 seconds, so I'm fairly confident that it isn't eating up memory or using a lot of CPU. The latest failing job took 26 minutes, but there aren't any logs to show what it was doing in that time.

BryceStevenWilley avatar Jan 13 '23 18:01 BryceStevenWilley

what is the status in here? we have encountered similar problems, for more information, see #7004

enjoy-binbin avatar Feb 01 '23 08:02 enjoy-binbin

+1 (ubuntu) One thing that would likely be useful would be to have a way to retrieve missing logs (see @anthony-c-martin comment above).

prein avatar Feb 28 '23 09:02 prein

Because virtually all of the cases mentioned here are related to resource consumption beyond what the runners can provide, I have to close this issue.

About logs: at the moment it is not possible to publish provisioner logs, for a number of reasons including security.

For the curious and new arrivals: I recommend paying attention to the discussion with a lot of information from users who encounter the same problem for various reasons: https://github.com/actions/runner-images/discussions/7188.

erik-bershel avatar Apr 06 '23 08:04 erik-bershel