wrapper-validation-action
Action fails regularly due to ETIMEDOUT and ECONNRESET
Example runs:
https://github.com/square/anvil/pull/266/checks?check_run_id=2589215352
https://github.com/square/anvil/pull/266/checks?check_run_id=2589215611
I've seen this flaky behavior happen fairly often in the past few weeks. I'm not sure what else might be going on, so I'm filing this as an FYI.
This may be the same as #33
Hopefully, #39 having been merged will resolve this. @eskatos can you perform a release to see if that helps resolve this issue for our users?
Is there anything else needed for a release that I could help with? This makes most of our workflows unusable.
@ZacSweers I believe that you can try out this action from a commit hash. You may want to give that a shot as a stopgap?
Using ef08c6885017f258a11d59e0da103ed39424aa6b appears to resolve things for us. I'd recommend a new 1.x release tag to de-flake things for folks; we were definitely considering dropping this action otherwise, and I'm not sure how willing people are to point at a direct SHA.
Should be published now as v1
Thanks!
We're still seeing this, unfortunately, albeit less often and now just as this:
Run gradle/wrapper-validation-action@v1
Error: read ECONNRESET
Here's an example run https://github.com/ZacSweers/MoshiX/pull/128/checks?check_run_id=2921425588
This happens pretty consistently across the projects I work on. Unfortunately, I think we're going to have to remove this action as a result, since it's a reliability issue.
Unfortunately, we don't have enough information at this time to understand what's causing this issue.
Are you using self-hosted runners, or runners hosted by GH?
I see this often on GH-hosted runners, usually in the square/anvil repo.
@eskatos is there any way to add additional log output on failure so that we can work on understanding the root cause?
This is the error that I'm getting:
Error: connect ETIMEDOUT 104.18.164.99:443
Github hosted actions here. When the action fails with this error it fails across all active runs around the same time. About 30 minutes ago 3 runs failed simultaneously. Retried each about 20 minutes ago and they all passed.
That seems like something that absolutely indicates a Cloudflare issue.
Okay, all this finally sent me down the right path; I think I may have finally figured out what's going on here. It looks like our Cloudflare WAF is being triggered randomly every once in a while, and when it is, it causes a bunch of users' connections to fail. I need to talk to @eskatos about how we want to mitigate this issue. Thanks, everyone, for helping us figure out what was going wrong here.

The fix has been implemented.
Please let us know if any of you continue to experience these problems. I hope this will fix the issue, but we have some additional things we can fiddle with if this continues to be a problem.
FOR INTERNAL TRACKING (not public): https://github.com/gradle/gradle-private/issues/3435
Facing a similar issue. A two-line change to a class causes failures with these actions in the following runs:
- https://github.com/AY2122S1-CS2103-T14-2/tp/actions/runs/1386259288/attempts/2 Here, it shows ETIMEDOUT
- https://github.com/AY2122S1-CS2103-T14-2/tp/actions/runs/1386259288/attempts/1 Here, it shows:
Client network socket disconnected before secure TLS connection was established
Seeing the same issue here.
https://github.com/MinimallyCorrect/Mixin/runs/4041503110?check_suite_focus=true
Can the team publish a single file with all the hashes, instead of having the action fetch hundreds of files with one hash each? These failures are only going to increase in frequency as the number of requests needed goes up with every release.
https://github.com/gradle/wrapper-validation-action/blob/84d7e182ae7c7a37f200c184f64038fb0e62dd7d/src/checksums.ts#L28
It's not possible for us to know what version you have locally, so we have to fetch all of them.
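To illustrate the fan-out, here is a rough sketch of what that fetch-everything approach amounts to (an illustration only, not the actual checksums.ts code; it assumes Node 18+ so that the global fetch API is available):

```typescript
// Illustrative sketch only (not the action's actual checksums.ts):
// how a "fetch every wrapper checksum" approach looks.

interface GradleVersionInfo {
  version: string
  wrapperChecksumUrl?: string
}

async function fetchAllWrapperChecksums(): Promise<Set<string>> {
  // One request to list every known Gradle version...
  const response = await fetch('https://services.gradle.org/versions/all')
  const versions: GradleVersionInfo[] = await response.json()

  // ...then one request per version for its wrapper checksum file.
  // This fan-out is what multiplies the chance of hitting ETIMEDOUT or
  // ECONNRESET, and it grows with every Gradle release.
  const checksums = await Promise.all(
    versions
      .filter(v => v.wrapperChecksumUrl)
      .map(async v => (await fetch(v.wrapperChecksumUrl!)).text())
  )
  return new Set(checksums.map(c => c.trim()))
}
```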
I'll take a look at our Cloudflare logs and see if this is being caused by our infrastructure/firewall. Thanks for the ping 🙂
Also ran into this right now (and yesterday), re-triggered the job, then it worked:
Run gradle/wrapper-validation-action@v1
with:
min-wrapper-count: 1
allow-snapshots: false
Error: Client network socket disconnected before secure TLS connection was established
GitHub hosted action... Let me know if I can provide any more data that helps with this!
@JLLeitschuh I was thinking of adding the checksum inline to https://services.gradle.org/versions/all:
{
  "version" : "7.3-20211027231204+0000",
  "buildTime" : "20211027231204+0000",
  "current" : false,
  "snapshot" : true,
  "nightly" : false,
  "releaseNightly" : true,
  "activeRc" : false,
  "rcFor" : "",
  "milestoneFor" : "",
  "broken" : false,
  "downloadUrl" : "https://services.gradle.org/distributions-snapshots/gradle-7.3-20211027231204+0000-bin.zip",
  "checksumUrl" : "https://services.gradle.org/distributions-snapshots/gradle-7.3-20211027231204+0000-bin.zip.sha256",
  "wrapperChecksumUrl" : "https://services.gradle.org/distributions-snapshots/gradle-7.3-20211027231204+0000-wrapper.jar.sha256",
  "wrapperChecksum": "33ad4583fd7ee156f533778736fa1b4940bd83b433934d1cc4e9f608e99a6a89"
  // (The checksum would actually be shorter than the URL for where to go fetch it. ;))
},
Since the only field that gets used at the moment is the wrapper checksum, it might even be worth making a more specialized endpoint which is just a list of all wrapper checksums.
I have no idea where the code that generates/serves these is.
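If such a field existed, the action side could shrink to a single request, along the lines of this hypothetical sketch (the wrapperChecksum field is the proposal above; the endpoint does not serve it today, and Node 18+ global fetch is assumed):

```typescript
// Hypothetical sketch: if /versions/all carried an inline "wrapperChecksum"
// field as proposed above (it does not today), building the checksum set
// would need only a single request.

interface GradleVersionWithChecksum {
  version: string
  wrapperChecksum?: string // proposed field, not currently served
}

async function fetchInlineWrapperChecksums(): Promise<Set<string>> {
  const response = await fetch('https://services.gradle.org/versions/all')
  const versions: GradleVersionWithChecksum[] = await response.json()
  return new Set(
    versions
      .filter(v => v.wrapperChecksum)
      .map(v => v.wrapperChecksum!.trim())
  )
}
```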
I've been running into this issue occasionally ever since I integrated this action, but today it's been happening like 60% of the time on macOS on CI (CI also runs on Windows and Linux, but both seem fine).
I recently upgraded to Gradle 7, in case that's relevant.
So, I've checked, and it's not our WAF causing these issues. I'm not certain what else would be causing them.
Running into the same issue today. Any updates on this?
also seeing this issue a few times every day, retrying tends to work straight away
https://github.com/vector-im/element-android/actions/workflows/gradle-wrapper-validation.yml?query=is%3Afailure
Seeing this a lot on Paparazzi builds, mainly with Windows workers. Example run: https://github.com/cashapp/paparazzi/runs/4316309670?check_suite_focus=true
Is there any further update on this? It keeps failing sporadically on both windows-2022 and ubuntu-20.04 action runs.
I also keep getting CI failures due to this issue: Error: connect ETIMEDOUT 104.18.165.99:443
This might be silly, but since relaunching usually fixes it, I wonder whether just allowing for three retries or so could be helpful. Connection timeouts can always happen; unless the destination is truly unreachable, it may make sense not to fail immediately.
We do have retry logic enabled. https://github.com/gradle/wrapper-validation-action/blob/84d7e182ae7c7a37f200c184f64038fb0e62dd7d/src/checksums.ts#L6
That being said, I have no evidence that it's actually working. A PR from the community to improve debug logging would be welcomed openly, especially if it were implemented such that the additional logging is only printed when the build is going to fail anyway. I'd prefer not to make the action more chatty than it needs to be when it's not going to fail. I think the biggest problem we currently have is a severe lack of visibility, which makes it really difficult to figure out a root cause for these issues.
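As a starting point for such a PR, a deferred-logging retry helper might look roughly like this (a sketch only, not the action's existing retry logic; the function name and retry count are illustrative, and Node 18+ global fetch is assumed):

```typescript
// Sketch only: retry each download a few times and remember the errors,
// but only surface them if every attempt fails, so successful builds
// stay quiet.
async function fetchWithRetry(url: string, attempts = 3): Promise<string> {
  const failures: string[] = []
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      const response = await fetch(url)
      if (!response.ok) {
        throw new Error(`HTTP ${response.status}`)
      }
      return await response.text()
    } catch (error) {
      // Collect the error for later instead of logging immediately.
      failures.push(`attempt ${attempt}: ${String(error)}`)
    }
  }
  throw new Error(
    `All ${attempts} attempts to fetch ${url} failed:\n${failures.join('\n')}`
  )
}
```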