Retry Build and Functional Test Steps when timeout occurs

Open jamesfredley opened this issue 6 months ago • 4 comments

Attempts to address: Connect to repository.apache.org:443 [repository.apache.org/65.109.119.155] failed: Connect timed out

max_attempts: 3
retry_wait_seconds: 600

nick-fields/retry is already used in publish
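For reference, a minimal sketch of what such a step might look like with the values above. The step name, the `command`, and the `timeout_minutes` value are assumptions for illustration and are not taken from the grails-core workflows; the action requires either `timeout_minutes` or `timeout_seconds` to be set.

```yaml
# Sketch only: step name, command, and timeout are hypothetical.
- name: Build (with retry)
  uses: nick-fields/retry@v3
  with:
    timeout_minutes: 60       # assumed value; timeout_minutes or timeout_seconds is required
    max_attempts: 3           # proposed above
    retry_wait_seconds: 600   # proposed above: wait 10 minutes between attempts
    command: ./gradlew build  # hypothetical command
```

With no `retry_on` set, the default is `any`, so every failed attempt is retried after the wait.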

jamesfredley avatar Jun 18 '25 01:06 jamesfredley

So, if I understand this correctly, this will retry the step on any failure, not just timeouts? For example, if there is a genuine test failure, it will still wait 10 minutes and then retry?

matrei avatar Jun 18 '25 11:06 matrei

It's my understanding that RAO will be back to normal within the week (https://the-asf.slack.com/archives/CBX4TSBQ8/p1749838588937849). These issues are temporary, but painful (https://the-asf.slack.com/archives/CBX4TSBQ8/p1750077904096649). I think we should hold off on this and continue to retry manually. Auto-retrying will likely lead to more issues.

jdaugherty avatar Jun 18 '25 11:06 jdaugherty

nick-fields/retry does support retry_on, although I'm not confident these failures are reported to it in a way that would match.

retry_on: Optional. Event to retry on. Currently supports [any (default), timeout, error].
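A hedged sketch of restricting retries with `retry_on`. As far as the action's documentation indicates, `timeout` refers to the step exceeding its own `timeout_minutes` or `timeout_seconds` limit, so a Gradle run that exits non-zero after a "Connect timed out" error would most likely be treated as `error` and would not be retried under this setting. The step name, `command`, and `timeout_minutes` value are hypothetical.

```yaml
# Sketch only: step name, command, and timeout are hypothetical.
- name: Functional tests (retry only when the step itself times out)
  uses: nick-fields/retry@v3
  with:
    timeout_minutes: 60       # assumed value; exceeding it counts as a "timeout" event
    max_attempts: 3
    retry_wait_seconds: 600
    retry_on: timeout         # genuine test failures (non-zero exit) are not retried
    command: ./gradlew check  # hypothetical command
```

This avoids the 10 minute wait on genuine test failures, but it only helps if the repository connection problem shows up as a hung step rather than a failed one.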

Every CI run failing is very painful at present.

jamesfredley avatar Jun 18 '25 13:06 jamesfredley

Interestingly, on https://github.com/apache/grails-core/actions/runs/15731765021/job/44334333078?pr=14821 there were 0 timeouts on the first attempt, which is the first time I have seen that in a while.

jamesfredley avatar Jun 18 '25 13:06 jamesfredley