The PostCommit XVR Direct job is flaky

Open github-actions[bot] opened this issue 1 year ago • 1 comments

The PostCommit XVR Direct job is failing over 50% of the time. Please visit https://github.com/apache/beam/actions/workflows/beam_PostCommit_XVR_Direct.yml?query=is%3Afailure+branch%3Amaster to see the logs.

github-actions[bot] avatar Mar 05 '24 18:03 github-actions[bot]

Reopening since the workflow is still flaky

github-actions[bot] avatar Aug 21 '24 09:08 github-actions[bot]

Seems to have been perma-red for a while.

2024-08-29T17:46:16.8649631Z System Go installation: /usr/local/go/bin/go is go version go1.21.0 linux/amd64; Preparing to use /home/runner/go/bin/go1.22.5
2024-08-29T17:46:17.0648275Z go1.22.5: already downloaded in /home/runner/sdk/go1.22.5
2024-08-29T17:46:17.0665947Z /home/runner/go/bin/go1.22.5 test -v ./test/integration/xlang ./test/integration/io/xlang/... -p 3 -v -timeout 3h --runner=portable --project=apache-beam-testing --region=us-central1 --environment_type=DOCKER --environment_config=apache/beam_go_sdk:dev --staging_location=gs://temp-storage-for-end-to-end-tests/staging-validatesrunner-test/test10288 --temp_location=gs://temp-storage-for-end-to-end-tests/temp-validatesrunner-test/test10288 --endpoint=localhost:34069 --kafka_jar=/runner/_work/beam/beam/sdks/java/testing/kafka-service/build/libs/beam-sdks-java-testing-kafka-service-testKafkaService-2.60.0-SNAPSHOT.jar --expansion_jar=io:/runner/_work/beam/beam/sdks/java/io/expansion-service/build/libs/beam-sdks-java-io-expansion-service-2.60.0-SNAPSHOT.jar --expansion_jar=schemaio:/runner/_work/beam/beam/sdks/java/extensions/schemaio-expansion-service/build/libs/beam-sdks-java-extensions-schemaio-expansion-service-2.60.0-SNAPSHOT.jar --expansion_jar=debeziumio:/runner/_work/beam/beam/sdks/java/io/debezium/expansion-service/build/libs/beam-sdks-java-io-debezium-expansion-service-2.60.0-SNAPSHOT.jar --expansion_jar=gcpio:/runner/_work/beam/beam/sdks/java/io/google-cloud-platform/expansion-service/build/libs/beam-sdks-java-io-google-cloud-platform-expansion-service-2.60.0-SNAPSHOT.jar --bq_dataset=apache-beam-testing.beam_bigquery_io_test_temp --bt_instance=projects/apache-beam-testing/instances/beam-test --expansion_addr=test:localhost:39707
2024-08-29T17:46:17.0689704Z go: downloading cloud.google.com/go/bigtable v1.29.0
2024-08-29T17:46:17.0691189Z go: downloading github.com/lib/pq v1.10.9
2024-08-29T17:46:17.0693048Z go: downloading github.com/go-sql-driver/mysql v1.8.1
2024-08-29T17:46:17.1648532Z go: downloading filippo.io/edwards25519 v1.1.0
2024-08-29T17:46:17.2648855Z go: downloading go.opentelemetry.io/otel/sdk/metric v1.24.0
2024-08-29T17:46:17.2650985Z go: downloading cloud.google.com/go/monitoring v1.20.3
2024-08-29T17:46:17.2652702Z go: downloading go.opentelemetry.io/otel/sdk v1.24.0
2024-08-29T19:32:10.7892423Z ##[error]The operation was canceled.
2024-08-29T19:32:10.8228117Z ##[group]Run actions/upload-artifact@v4
2024-08-29T19:32:10.8229144Z with:
2024-08-29T19:32:10.8230291Z   name: JUnit Test Results
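
As a rough check on how long the step sat silent before it was canceled, the gap between the last `go: downloading` line and the `##[error]The operation was canceled.` line can be computed from the two timestamps in the log above. This is only a minimal sketch: `parse_ts` and `gap` are illustrative helper names, not part of Beam or GitHub Actions, and the result measures the silent stretch in this excerpt, not the total job runtime.

```python
from datetime import datetime, timedelta


def parse_ts(ts: str) -> datetime:
    """Parse a log timestamp such as 2024-08-29T17:46:17.2652702Z."""
    # The log carries 7 fractional digits; strptime's %f accepts at most 6,
    # so the fraction is trimmed to microseconds before parsing.
    head, frac = ts.rstrip("Z").split(".")
    return datetime.strptime(f"{head}.{frac[:6]}", "%Y-%m-%dT%H:%M:%S.%f")


def gap(start: str, end: str) -> timedelta:
    """Return the elapsed time between two log timestamps."""
    return parse_ts(end) - parse_ts(start)


# Timestamps copied from the log excerpt above.
last_download = "2024-08-29T17:46:17.2652702Z"
canceled = "2024-08-29T19:32:10.7892423Z"

elapsed = gap(last_download, canceled)
print(elapsed)
```

For this excerpt the gap comes out to a little under 1 hour 46 minutes with no output at all, which fits a hang or timeout more than an ordinary test failure.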

tvalentyn avatar Aug 29 '24 22:08 tvalentyn

Looks like we have an xlang test that runs with a 3 hr time limit: it passes on 3.12 but fails on 3.8, timing out after 2.5 hrs.

tvalentyn avatar Aug 29 '24 22:08 tvalentyn

The failing test is the GoUsingJava xlang suite, which does not use Python. The test passes on Python 3.12 because the 3.12 suite excludes the GoUsingJava xlang variant, since we only need to run it for one Python version. It appears that the GoUsingJava xlang scenario not working on some runners is a known issue. cc: @Abacn @lostluck, who can correct me if they disagree with this assessment.

tvalentyn avatar Sep 10 '24 18:09 tvalentyn

It's a known issue, and it's also not a release blocker. The fact is we have spent very little time making xlang for Go robust, and the people tasked with that have moved on. This is also not something that would be common for users, since they'd need to manually spin up the Python Portable runner.

lostluck avatar Sep 10 '24 19:09 lostluck

Last time I checked, this was a few failing xlang tests; now it's timing out. New issues have likely accumulated, which is unfortunately common for tests that stay perma-red for a long time.

For the same reason, I agree we should disable the GoUsingJava part of the test so that the other tasks can still be monitored.

Abacn avatar Sep 11 '24 00:09 Abacn