wrapper-validation-action

Action fails regularly due to ETIMEDOUT and ECONNRESET

Open ZacSweers opened this issue 3 years ago • 30 comments

Example runs:

https://github.com/square/anvil/pull/266/checks?check_run_id=2589215352

https://github.com/square/anvil/pull/266/checks?check_run_id=2589215611

I've seen this flaky behavior fairly often in the past few weeks. I'm not sure what else is going on, so I'm filing this as an FYI.

ZacSweers avatar May 15 '21 06:05 ZacSweers

This may be the same as #33

Hopefully, #39 having been merged will resolve this. @eskatos can you perform a release to see if that helps resolve this issue for our users?

JLLeitschuh avatar May 18 '21 15:05 JLLeitschuh

Is there anything else needed for a release that I could help with? This makes most of our workflows unusable.

ZacSweers avatar May 25 '21 03:05 ZacSweers

@ZacSweers I believe that you can try out this action from a commit hash. You may want to give that a shot as a stopgap?

JLLeitschuh avatar May 25 '21 14:05 JLLeitschuh

Using ef08c6885017f258a11d59e0da103ed39424aa6b appears to resolve things for us. I'd recommend a new 1.x release tag to de-flake things for folks; we were definitely considering dropping this otherwise, and I'm not sure how willing folks are to point at a direct SHA.

ZacSweers avatar May 27 '21 18:05 ZacSweers

Should be published now as v1

JLLeitschuh avatar May 28 '21 14:05 JLLeitschuh

Thanks!

ZacSweers avatar May 28 '21 14:05 ZacSweers

We're still seeing this unfortunately, albeit less often, and now it just shows up as this:

Run gradle/wrapper-validation-action@v1
Error: read ECONNRESET

ZacSweers avatar Jun 24 '21 05:06 ZacSweers

Here's an example run https://github.com/ZacSweers/MoshiX/pull/128/checks?check_run_id=2921425588

ZacSweers avatar Jun 26 '21 15:06 ZacSweers

This happens pretty consistently across the projects I work on. Unfortunately, I think we're going to have to remove this action as a result, since it's a reliability issue.

ZacSweers avatar Jul 16 '21 02:07 ZacSweers

Unfortunately, we don't have enough information at this time to understand what's causing this issue.

Are you using self-hosted runners, or runners hosted by GH?

JLLeitschuh avatar Aug 30 '21 14:08 JLLeitschuh

I see this often on GH hosted runners, often in the square/anvil repo

ZacSweers avatar Aug 30 '21 15:08 ZacSweers

@eskatos is there any way to add additional log output on failure so that we can work on understanding the root cause?

JLLeitschuh avatar Aug 30 '21 17:08 JLLeitschuh

This is the error that I'm getting: Error: connect ETIMEDOUT 104.18.164.99:443

nkvaratskhelia avatar Sep 21 '21 11:09 nkvaratskhelia

GitHub-hosted runners here. When the action fails with this error, it fails across all active runs around the same time. About 30 minutes ago, 3 runs failed simultaneously. I retried each about 20 minutes ago and they all passed.

jameswald avatar Sep 21 '21 13:09 jameswald

GitHub-hosted runners here. When the action fails with this error, it fails across all active runs around the same time. About 30 minutes ago, 3 runs failed simultaneously. I retried each about 20 minutes ago and they all passed.

That seems like something that absolutely indicates a Cloudflare issue.

JLLeitschuh avatar Sep 22 '21 14:09 JLLeitschuh

Okay, all this finally sent me down the right path; I think I may have finally figured out what's going on here. It looks like our Cloudflare WAF is being triggered randomly every once in a while, and when it is, it causes a bunch of users' connections to fail. I need to talk to @eskatos about how we want to mitigate this issue. Thanks, everyone, for helping us figure out what was going wrong here.

(Screenshot of Cloudflare WAF events, 2021-09-22)

JLLeitschuh avatar Sep 22 '21 14:09 JLLeitschuh

The fix has been implemented.

Please let us know if any of you continue to experience these problems. I hope this will fix the issue, but we have some additional things we can fiddle with if this continues to be a problem.


FOR INTERNAL TRACKING (not public): https://github.com/gradle/gradle-private/issues/3435

JLLeitschuh avatar Sep 23 '21 17:09 JLLeitschuh

Facing a similar issue. A two-line change to a class causes failures with these actions in the following runs:

  1. https://github.com/AY2122S1-CS2103-T14-2/tp/actions/runs/1386259288/attempts/2 Here, it shows ETIMEDOUT
  2. https://github.com/AY2122S1-CS2103-T14-2/tp/actions/runs/1386259288/attempts/1 Here, it shows Client network socket disconnected before secure TLS connection was established

jivesh avatar Oct 26 '21 16:10 jivesh

Seeing the same issue here.

https://github.com/MinimallyCorrect/Mixin/runs/4041503110?check_suite_focus=true

Can the team publish a single file with all the hashes, instead of having the action fetch hundreds of files, one per hash? These failures are only going to become more frequent, since the number of requests needed goes up with every release.

https://github.com/gradle/wrapper-validation-action/blob/84d7e182ae7c7a37f200c184f64038fb0e62dd7d/src/checksums.ts#L28

LunNova avatar Oct 29 '21 00:10 LunNova

It's not possible for us to know what version you have locally, so we have to fetch all of them.

I'll take a look at our Cloudflare logs and see if this is being caused by our infrastructure/firewall. Thanks for the ping 🙂

JLLeitschuh avatar Oct 29 '21 12:10 JLLeitschuh
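To make the request pattern concrete, here is a simplified TypeScript sketch of this approach. It is not the action's actual implementation; it assumes Node 18+ for the global fetch, and the fetchAllWrapperChecksums/validateWrapper names are illustrative.

import * as crypto from 'crypto'
import * as fs from 'fs'

interface GradleVersion {
  version: string
  wrapperChecksumUrl?: string
}

// Download the checksum file published for every known Gradle version.
// This is the "hundreds of small requests" cost discussed above.
async function fetchAllWrapperChecksums(): Promise<Set<string>> {
  const versions: GradleVersion[] = await (
    await fetch('https://services.gradle.org/versions/all')
  ).json()
  const urls = versions
    .map(v => v.wrapperChecksumUrl)
    .filter((u): u is string => Boolean(u))
  const checksums = await Promise.all(
    urls.map(async url => (await (await fetch(url)).text()).trim())
  )
  return new Set(checksums)
}

// Compare the SHA-256 of a local gradle-wrapper.jar against all known checksums.
async function validateWrapper(jarPath: string): Promise<boolean> {
  const known = await fetchAllWrapperChecksums()
  const localSha256 = crypto
    .createHash('sha256')
    .update(fs.readFileSync(jarPath))
    .digest('hex')
  return known.has(localSha256)
}

Because the local wrapper version is unknown, every published checksum has to be fetched before the comparison can happen, which multiplies the chances of hitting a transient ETIMEDOUT or ECONNRESET on any given run.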

Also ran into this right now (and yesterday), re-triggered the job, then it worked:

Run gradle/[email protected]
  with:
    min-wrapper-count: 1
    allow-snapshots: false
Error: Client network socket disconnected before secure TLS connection was established

GitHub hosted action... Let me know if I can provide any more data that helps with this!

codecholeric avatar Oct 29 '21 16:10 codecholeric

@JLLeitschuh I was thinking of adding the checksum inline to https://services.gradle.org/versions/all:

{
  "version" : "7.3-20211027231204+0000",
  "buildTime" : "20211027231204+0000",
  "current" : false,
  "snapshot" : true,
  "nightly" : false,
  "releaseNightly" : true,
  "activeRc" : false,
  "rcFor" : "",
  "milestoneFor" : "",
  "broken" : false,
  "downloadUrl" : "https://services.gradle.org/distributions-snapshots/gradle-7.3-20211027231204+0000-bin.zip",
  "checksumUrl" : "https://services.gradle.org/distributions-snapshots/gradle-7.3-20211027231204+0000-bin.zip.sha256",
  "wrapperChecksumUrl" : "https://services.gradle.org/distributions-snapshots/gradle-7.3-20211027231204+0000-wrapper.jar.sha256",
  "wrapperChecksum": "33ad4583fd7ee156f533778736fa1b4940bd83b433934d1cc4e9f608e99a6a89"
  // (The checksum would actually be shorter than the URL for where to go fetch it. ;))
},

Since the only field that gets used at the moment is the wrapper checksum, it might even be worth making a more specialized endpoint which is just a list of all wrapper checksums.

I have no idea where the code that generates/serves these is.

LunNova avatar Oct 29 '21 16:10 LunNova
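For comparison, if the checksum were published inline as proposed above, the whole lookup could collapse into a single request. The sketch below is hypothetical: the wrapperChecksum field does not exist in the current versions/all response, and the same Node 18+ fetch assumption applies.

// Hypothetical: consumes a wrapperChecksum field that versions/all does not
// currently provide; shown only to illustrate the proposal above.
async function fetchInlineWrapperChecksums(): Promise<Set<string>> {
  const versions: Array<{ wrapperChecksum?: string }> = await (
    await fetch('https://services.gradle.org/versions/all')
  ).json()
  return new Set(
    versions
      .map(v => v.wrapperChecksum)
      .filter((c): c is string => Boolean(c))
  )
}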

I've been running into this issue occasionally ever since I integrated this action, but today it's been happening like 60% of the time on macOS on CI (CI also runs on Windows and Linux, but both seem fine).

I recently upgraded to Gradle 7, in case that's relevant.

gnarea avatar Nov 05 '21 11:11 gnarea

So, I've checked, and it's not our WAF causing these issues. I'm not certain what else could be causing them.

JLLeitschuh avatar Nov 05 '21 17:11 JLLeitschuh

Running into the same issue today. Any updates on this?

nhouser9 avatar Nov 14 '21 20:11 nhouser9

Also seeing this issue a few times every day; retrying tends to work straight away.

2021-11-24T11:22:49,356896107+00:00

https://github.com/vector-im/element-android/actions/workflows/gradle-wrapper-validation.yml?query=is%3Afailure

ouchadam avatar Nov 24 '21 11:11 ouchadam

Seeing this a lot on Paparazzi builds, mainly with Windows workers. Example run: https://github.com/cashapp/paparazzi/runs/4316309670?check_suite_focus=true

jrodbx avatar Nov 24 '21 20:11 jrodbx

Is there any further update on this? It keeps failing sporadically on both windows-2022 and ubuntu-20.04 action runs.

The-Code-Monkey avatar Dec 04 '21 14:12 The-Code-Monkey

I also keep getting CI failures due to this issue: Error: connect ETIMEDOUT 104.18.165.99:443. This might be silly, but since relaunching usually fixes it, I wonder whether simply allowing three retries or so could help. Connection timeouts are bound to happen occasionally; unless the destination is truly unreachable, it may make sense not to fail immediately.

DanySK avatar Dec 04 '21 22:12 DanySK

We do have retry logic enabled. https://github.com/gradle/wrapper-validation-action/blob/84d7e182ae7c7a37f200c184f64038fb0e62dd7d/src/checksums.ts#L6

That being said, I have no evidence that it's actually working. A PR from the community to improve debug logging would be openly welcomed, especially if it were implemented so that the additional logging is only printed when the build is going to fail anyway; I'd prefer not to make the action chattier than it needs to be when it's going to pass. I think the biggest problem we currently have is a severe lack of visibility, which makes it really difficult to figure out a root cause for these issues.

JLLeitschuh avatar Dec 08 '21 18:12 JLLeitschuh
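As an illustration of the kind of change being discussed, and not the action's existing code, a retry helper could back off between attempts and only emit diagnostic detail once every attempt has failed, keeping successful runs quiet. The fetchWithRetries name, the attempt count, and the Node 18+ fetch are assumptions.

// Sketch: retry with exponential backoff, logging details only on final failure.
async function fetchWithRetries(url: string, attempts = 3): Promise<string> {
  const errors: unknown[] = []
  for (let i = 0; i < attempts; i++) {
    try {
      const response = await fetch(url)
      if (!response.ok) throw new Error(`HTTP ${response.status} for ${url}`)
      return await response.text()
    } catch (err) {
      errors.push(err)
      if (i < attempts - 1) {
        // Back off 1s, 2s, 4s, ... before the next attempt.
        await new Promise(resolve => setTimeout(resolve, 1000 * 2 ** i))
      }
    }
  }
  // The step is failing anyway, so now is the time to be verbose.
  console.error(`All ${attempts} attempts to fetch ${url} failed:`, errors)
  throw errors[errors.length - 1]
}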