[Failing Test]: PostRelease Nightly Snapshot is perma-red due to "connect timed out"
What happened?
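It looks like JAR resolution against the Maven SNAPSHOT repo is failing: in the failed run the maven-metadata.xml download times out, so Maven requests the unresolved -SNAPSHOT.jar name instead of the timestamped JAR.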
Successful run: https://github.com/apache/beam/actions/runs/11307102323/job/31448448902
[INFO] Archetype repository not defined. Using the one from [org.apache.beam:beam-sdks-java-maven-archetypes-examples:2.24.0] found in catalog remote
[INFO] Downloading from test.release: https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-sdks-java-maven-archetypes-examples/2.61.0-SNAPSHOT/maven-metadata.xml
[INFO] Downloaded from test.release: https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-sdks-java-maven-archetypes-examples/2.61.0-SNAPSHOT/maven-metadata.xml (1.7 kB at 3.3 kB/s)
[INFO] Downloading from test.release: https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-sdks-java-maven-archetypes-examples/2.61.0-SNAPSHOT/beam-sdks-java-maven-archetypes-examples-2.61.0-20241012.122627-12.jar
[INFO] Downloaded from test.release: https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-sdks-java-maven-archetypes-examples/2.61.0-SNAPSHOT/beam-sdks-java-maven-archetypes-examples-2.61.0-20241012.122627-12.jar (340 kB at 435 kB/s)
Failed run: https://github.com/apache/beam/actions/runs/11369820828/job/31628153369
[INFO] Archetype repository not defined. Using the one from [org.apache.beam:beam-sdks-java-maven-archetypes-examples:2.24.0] found in catalog remote
[INFO] Downloading from test.release: https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-sdks-java-maven-archetypes-examples/2.61.0-SNAPSHOT/maven-metadata.xml
Warning: Could not transfer metadata org.apache.beam:beam-sdks-java-maven-archetypes-examples:2.61.0-SNAPSHOT/maven-metadata.xml from/to test.release (https://repository.apache.org/content/repositories/snapshots): Connect to repository.apache.org:443 [repository.apache.org/65.109.119.155] failed: connect timed out
[INFO] Downloading from test.release: https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-sdks-java-maven-archetypes-examples/2.61.0-SNAPSHOT/beam-sdks-java-maven-archetypes-examples-2.61.0-SNAPSHOT.jar
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
Issue Failure
Failure: Test is flaky
Issue Priority
Priority: 1 (unhealthy code / failing or flaky postcommit so we cannot be sure the product is healthy)
Issue Components
- [ ] Component: Python SDK
- [X] Component: Java SDK
- [ ] Component: Go SDK
- [ ] Component: Typescript SDK
- [ ] Component: IO connector
- [ ] Component: Beam YAML
- [ ] Component: Beam examples
- [ ] Component: Beam playground
- [ ] Component: Beam katas
- [ ] Component: Website
- [ ] Component: Infrastructure
- [ ] Component: Spark Runner
- [ ] Component: Flink Runner
- [ ] Component: Samza Runner
- [ ] Component: Twister2 Runner
- [ ] Component: Hazelcast Jet Runner
- [ ] Component: Google Cloud Dataflow Runner
cc: @Abacn
This warning message explains the cause:
Warning: org.apache.beam:beam-runners-portability-java:2.61.0-SNAPSHOT/maven-metadata.xml failed to transfer from
https://repository.apache.org/content/repositories/snapshots during a previous attempt. This failure was cached in the
local repository and resolution will not be reattempted until the update interval of test.release has elapsed or updates are
forced. Original error: Could not transfer metadata org.apache.beam:beam-runners-portability-java:2.61.0-SNAPSHOT/maven-metadata.xml
from/to test.release (https://repository.apache.org/content/repositories/snapshots): Connect to
repository.apache.org:443 [repository.apache.org/65.109.119.155] failed: connect timed out
In particular,
This failure was cached in the local repository and resolution will not be reattempted until the update interval of test.release has elapsed or updates are forced.
I did some searching: https://stackoverflow.com/questions/4856307/when-maven-says-resolution-will-not-be-reattempted-until-the-update-interval-of
What appears to happen is that one run cached a failed resolution in the local repository, and the same artifact then stays bad for the next few days, until the "update interval" elapses.
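For illustration, a minimal sketch of clearing such cached failures from the local repository by hand. It assumes Maven records failed attempts in "*.lastUpdated" files (artifacts) and "resolver-status.properties" files (metadata); that matches the usual Maven Resolver behavior but is an assumption here, not something taken from the logs above. The function name is hypothetical.

```python
from pathlib import Path

# Assumed marker names: "<file>.lastUpdated" for artifacts and
# "resolver-status.properties" for metadata. Verify against the local
# repository before relying on this.
MARKER_PATTERNS = ("*.lastUpdated", "resolver-status.properties")


def purge_cached_failures(group_dir: Path) -> list[Path]:
    """Delete cached-resolution markers under group_dir; return the removed paths."""
    removed = []
    for pattern in MARKER_PATTERNS:
        for marker in sorted(group_dir.rglob(pattern)):
            if marker.is_file():
                marker.unlink()
                removed.append(marker)
    return removed
```

With the default local repository location this would be called as `purge_cached_failures(Path.home() / ".m2" / "repository" / "org" / "apache" / "beam")`. Only the marker files are removed; downloaded JARs and POMs are left in place.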
I noticed this also happened 3 weeks ago, and it recovered on its own after 6 runs.
One possible fix is to add the "-U" flag to the mvn invocation.
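A rough sketch of what adding the flag could look like, assuming the mvn command is assembled as an argument list before being executed (how the Beam build actually constructs it is not shown in this thread). `with_forced_updates` is a hypothetical helper; `-U` / `--update-snapshots` is the standard Maven option that forces a check for updated snapshots on remote repositories.

```python
def with_forced_updates(mvn_args: list[str]) -> list[str]:
    """Return a copy of mvn_args with -U placed right after the executable.

    The list is returned unchanged (as a copy) if -U or its long form
    --update-snapshots is already present.
    """
    if not mvn_args:
        raise ValueError("mvn_args must contain at least the executable name")
    if "-U" in mvn_args or "--update-snapshots" in mvn_args:
        return list(mvn_args)
    return [mvn_args[0], "-U", *mvn_args[1:]]


# Illustrative arguments only; the real invocation may differ.
cmd = with_forced_updates(
    ["mvn", "archetype:generate", "-DarchetypeVersion=2.61.0-SNAPSHOT"]
)
```

Note that -U only bypasses the cached failure; it still fails if the snapshot repo itself is unreachable at that moment.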
Seems like this is still broken after https://github.com/apache/beam/pull/32841.
@Abacn can you check?
The error is (from the most recent run):
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-archetype-plugin:3.3.0:generate (default-cli) on project standalone-pom: The desired archetype does not exist (org.apache.beam:beam-sdks-java-maven-archetypes-gcp-bom-examples:2.61.0-SNAPSHOT) -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
https://ge.apache.org/s/73edbo72xzrt4/console-log/task/:runners:google-cloud-dataflow-java:runMobileGamingJavaDataflowBom?page=1#L90
The number of failed tasks has improved, but the job is still very flaky. The latest run failed one task.
This error occurs because the client failed to fetch maven-metadata.xml for beam-sdks-java-maven-archetypes-gcp-bom-examples:2.61.0-SNAPSHOT from the snapshot repo, so it fell back to Maven Central, where 2.61.0-SNAPSHOT does not exist.
In general, it seems network conditions changed recently and the availability of the Maven snapshot repo has become worse than before. #32841 mitigated the issue (originally 8/8 tasks failed), but the job is still highly flaky (1-5 out of 8 tasks fail).
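One generic way to reduce flakiness from intermittent connect timeouts is to retry the whole command with a growing delay. The sketch below is only an illustration of that idea and is not necessarily what #32841 does; `run_with_retries` and its parameters are hypothetical names.

```python
import subprocess
import time
from typing import Callable, Sequence


def run_with_retries(
    cmd: Sequence[str],
    attempts: int = 3,
    base_delay_s: float = 10.0,
    run: Callable[[Sequence[str]], int] = lambda c: subprocess.run(c).returncode,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run cmd until it exits with 0 or attempts are exhausted.

    Returns the exit code of the last attempt. The delay doubles after each
    failure (base_delay_s, 2 * base_delay_s, ...), and there is no sleep
    after the final attempt.
    """
    code = 1
    for attempt in range(attempts):
        code = run(cmd)
        if code == 0:
            return 0
        if attempt < attempts - 1:
            sleep(base_delay_s * 2 ** attempt)
    return code
```

The `run` and `sleep` parameters exist only so the behavior can be exercised without invoking Maven or actually waiting. Combined with forced updates, a retry should re-contact the snapshot repo instead of reusing a cached failure.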
Is it possible to push SNAPSHOTs somewhere else? For example, to a GCP Artifact Registry repository.
Might be good to check with Apache INFRA first regarding why the SNAPSHOT repo became unstable.
Opened https://issues.apache.org/jira/projects/INFRA/issues/INFRA-26230?filter=allopenissues
Re: "For example, to a GCP Artifact Registry repository."
In theory we can; we would just need to invest in the migration.
This was caused by a Maven service-side issue, which is resolved for now.