[Failing Test]: PostRelease Nightly Snapshot is perma-red due to "connect timed out"

Open chamikaramj opened this issue 1 year ago • 2 comments

What happened?

It seems that jar file resolution against the Maven SNAPSHOT repo is somehow incorrect.

Successful run: https://github.com/apache/beam/actions/runs/11307102323/job/31448448902

[INFO] Archetype repository not defined. Using the one from [org.apache.beam:beam-sdks-java-maven-archetypes-examples:2.24.0] found in catalog remote
[INFO] Downloading from test.release: https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-sdks-java-maven-archetypes-examples/2.61.0-SNAPSHOT/maven-metadata.xml
[INFO] Downloaded from test.release: https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-sdks-java-maven-archetypes-examples/2.61.0-SNAPSHOT/maven-metadata.xml (1.7 kB at 3.3 kB/s)
[INFO] Downloading from test.release: https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-sdks-java-maven-archetypes-examples/2.61.0-SNAPSHOT/beam-sdks-java-maven-archetypes-examples-2.61.0-20241012.122627-12.jar
[INFO] Downloaded from test.release: https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-sdks-java-maven-archetypes-examples/2.61.0-SNAPSHOT/beam-sdks-java-maven-archetypes-examples-2.61.0-20241012.122627-12.jar (340 kB at 435 kB/s)

Failed run: https://github.com/apache/beam/actions/runs/11369820828/job/31628153369

[INFO] Archetype repository not defined. Using the one from [org.apache.beam:beam-sdks-java-maven-archetypes-examples:2.24.0] found in catalog remote
[INFO] Downloading from test.release: https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-sdks-java-maven-archetypes-examples/2.61.0-SNAPSHOT/maven-metadata.xml
Warning:  Could not transfer metadata org.apache.beam:beam-sdks-java-maven-archetypes-examples:2.61.0-SNAPSHOT/maven-metadata.xml from/to test.release (https://repository.apache.org/content/repositories/snapshots): Connect to repository.apache.org:443 [repository.apache.org/65.109.119.155] failed: connect timed out
[INFO] Downloading from test.release: https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-sdks-java-maven-archetypes-examples/2.61.0-SNAPSHOT/beam-sdks-java-maven-archetypes-examples-2.61.0-SNAPSHOT.jar
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------

Issue Failure

Failure: Test is flaky

Issue Priority

Priority: 1 (unhealthy code / failing or flaky postcommit so we cannot be sure the product is healthy)

Issue Components

  • [ ] Component: Python SDK
  • [X] Component: Java SDK
  • [ ] Component: Go SDK
  • [ ] Component: Typescript SDK
  • [ ] Component: IO connector
  • [ ] Component: Beam YAML
  • [ ] Component: Beam examples
  • [ ] Component: Beam playground
  • [ ] Component: Beam katas
  • [ ] Component: Website
  • [ ] Component: Infrastructure
  • [ ] Component: Spark Runner
  • [ ] Component: Flink Runner
  • [ ] Component: Samza Runner
  • [ ] Component: Twister2 Runner
  • [ ] Component: Hazelcast Jet Runner
  • [ ] Component: Google Cloud Dataflow Runner

chamikaramj avatar Oct 16 '24 21:10 chamikaramj

cc: @Abacn

chamikaramj avatar Oct 16 '24 21:10 chamikaramj

This warning message explains the reason:

Warning:  org.apache.beam:beam-runners-portability-java:2.61.0-SNAPSHOT/maven-metadata.xml failed to transfer from
https://repository.apache.org/content/repositories/snapshots during a previous attempt. This failure was cached in the
local repository and resolution will not be reattempted until the update interval of test.release has elapsed or updates are
forced. Original error: Could not transfer metadata org.apache.beam:beam-runners-portability-java:2.61.0-SNAPSHOT/maven-metadata.xml
from/to test.release (https://repository.apache.org/content/repositories/snapshots): Connect to
repository.apache.org:443 [repository.apache.org/65.109.119.155] failed: connect timed out

In particular,

This failure was cached in the local repository and resolution will not be reattempted until the update interval of test.release has elapsed or updates are forced.

I did some searching: https://stackoverflow.com/questions/4856307/when-maven-says-resolution-will-not-be-reattempted-until-the-update-interval-of

What appears to have happened: one run introduced a bad local artifact, and over the next few days the same artifact stays bad until the "update interval" has elapsed.

I noticed this also happened 3 weeks ago, and it recovered by itself after 6 runs.

One possible action is to add the "-U" flag to the mvn invocation.
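
As a rough illustration of what adding "-U" could look like, here is a minimal Python sketch that assembles an archetype:generate command line. It is not the actual Beam release script: the helper name build_archetype_command and the generated project's groupId/artifactId are hypothetical, and repository settings (such as the test.release repo seen in the logs) are left out. Only the "-U" flag and the archetype coordinates come from this thread.

```python
from typing import List


def build_archetype_command(
    archetype_artifact_id: str,
    archetype_version: str,
    force_update: bool = True,
) -> List[str]:
    """Build a `mvn archetype:generate` command as an argument list."""
    command = ["mvn"]
    if force_update:
        # -U (--update-snapshots) forces Maven to re-check remote repositories
        # instead of relying on a failure cached in the local repository.
        command.append("-U")
    command += [
        "archetype:generate",
        "-DarchetypeGroupId=org.apache.beam",
        f"-DarchetypeArtifactId={archetype_artifact_id}",
        f"-DarchetypeVersion={archetype_version}",
        # Hypothetical coordinates for the generated project.
        "-DgroupId=org.example",
        "-DartifactId=word-count-beam",
        "-DinteractiveMode=false",
    ]
    return command


if __name__ == "__main__":
    print(" ".join(build_archetype_command(
        "beam-sdks-java-maven-archetypes-examples", "2.61.0-SNAPSHOT")))
```

Returning a list keeps the command ready for subprocess.run without shell quoting concerns. Note that "-U" only skips the cached failure; it does not help if the snapshot repo itself is unreachable at that moment.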

Abacn avatar Oct 17 '24 14:10 Abacn

Seems like this is still broken after https://github.com/apache/beam/pull/32841.

@Abacn can you check?

chamikaramj avatar Oct 21 '24 22:10 chamikaramj

The error is (from the most recent run):

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-archetype-plugin:3.3.0:generate (default-cli) on project standalone-pom: The desired archetype does not exist (org.apache.beam:beam-sdks-java-maven-archetypes-gcp-bom-examples:2.61.0-SNAPSHOT) -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]

https://ge.apache.org/s/73edbo72xzrt4/console-log/task/:runners:google-cloud-dataflow-java:runMobileGamingJavaDataflowBom?page=1#L90

chamikaramj avatar Oct 21 '24 23:10 chamikaramj

The number of failed tasks has decreased, but the test is still very flaky. The latest run failed one task.

This error occurs because the client failed to fetch maven-metadata.xml for beam-sdks-java-maven-archetypes-gcp-bom-examples:2.61.0-SNAPSHOT from the snapshot repo, so it fell back to Maven Central, where 2.61.0-SNAPSHOT does not exist.

In general, it seems network conditions changed recently and the availability of the Maven snapshot repo has become worse than before. #32841 mitigated the issue (originally 8 of 8 tasks failed), but the test is still highly flaky (1-5 of 8 tasks fail).
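
To make the failure chain in this thread easier to spot in a console log (metadata timeout, cached failure, then a missing archetype after the fallback), here is a small Python sketch that labels a single log line. The function name classify_maven_line and the label strings are made up for illustration; the patterns are copied from the log lines quoted above and may not cover other Maven versions or wordings.

```python
import re
from typing import Optional

# Patterns taken from log lines quoted in this thread, checked in order.
# The cached-failure warning also embeds the original timeout text, so it
# must be tested before the plain timeout pattern.
_SYMPTOMS = [
    ("cached_failure", re.compile(r"resolution will not be reattempted")),
    ("connect_timeout", re.compile(r"Could not transfer metadata .*connect timed out")),
    ("missing_archetype", re.compile(r"The desired archetype does not exist")),
]


def classify_maven_line(line: str) -> Optional[str]:
    """Return a symptom label for one log line, or None if nothing matches."""
    for label, pattern in _SYMPTOMS:
        if pattern.search(line):
            return label
    return None


if __name__ == "__main__":
    sample = (
        "[ERROR] Failed to execute goal ...: The desired archetype does not "
        "exist (org.apache.beam:beam-sdks-java-maven-archetypes-gcp-bom-examples:"
        "2.61.0-SNAPSHOT) -> [Help 1]"
    )
    print(classify_maven_line(sample))
```

A "missing_archetype" line that follows a "connect_timeout" or "cached_failure" line for the same artifact would point at snapshot repo availability rather than a genuinely unpublished archetype.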

Abacn avatar Oct 22 '24 02:10 Abacn

Is it possible to push SNAPSHOTs somewhere else? For example, to a GCP Artifact Registry repository.

chamikaramj avatar Oct 22 '24 06:10 chamikaramj

Might be good to check with Apache INFRA first regarding why the SNAPSHOT repo became unstable.

chamikaramj avatar Oct 22 '24 17:10 chamikaramj

Opened https://issues.apache.org/jira/projects/INFRA/issues/INFRA-26230?filter=allopenissues

For example, to a GCP Artifact Registry repository.

In theory we can; we would just need to invest in the migration.

Abacn avatar Oct 22 '24 18:10 Abacn

It was caused by a Maven service-side issue, which is resolved for now.

Abacn avatar Oct 28 '24 18:10 Abacn