[Do not merge] Beam 2.56.0rc2 validation
The Dataplex unit tests fail to compile because https://github.com/googleapis/java-bigquery/pull/3130 removed EmptyTableResult.
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 40.55%. Comparing base (499954c) to head (7154d4f). Report is 2 commits behind head on main.
@@ Coverage Diff @@
## main #1487 +/- ##
============================================
+ Coverage 40.33% 40.55% +0.21%
- Complexity 2729 2775 +46
============================================
Files 727 735 +8
Lines 41531 41759 +228
Branches 4466 4501 +35
============================================
+ Hits 16751 16934 +183
- Misses 23319 23349 +30
- Partials 1461 1476 +15
| Components | Coverage Δ | |
|---|---|---|
| spanner-templates | 56.14% <ø> (+0.43%) | :arrow_up: |
| spanner-import-export | 65.61% <ø> (+0.02%) | :arrow_up: |
| spanner-live-forward-migration | 61.23% <ø> (+0.03%) | :arrow_up: |
| spanner-live-reverse-replication | 42.53% <ø> (+0.02%) | :arrow_up: |
| spanner-bulk-migration | 70.55% <ø> (+1.12%) | :arrow_up: |
The YAML template job launch fails with:
docker: Error response from daemon: manifest for gcr.io/cloud-teleport-testing/2024-04-30-14-32-07_it/yaml-template:latest not found: manifest unknown: Failed to fetch "latest" from request "/v2/cloud-teleport-testing/2024-04-30-14-32-07_it/yaml-template/manifests/latest"
2024-04-30 10:38:08.101 EDT cloudservice.service: Start request repeated too quickly.
2024-04-30 10:38:08.101 EDT cloudservice.service: Failed with result 'exit-code'.
2024-04-30 10:38:08.101 EDT Failed to start Template launcher Docker container.
Also, it looks like retries may not be configured correctly?
The Python UDF tests fail with:
"PullImage from image service failed" err="rpc error: code = Unknown desc = Error response from daemon: manifest for gcr.io/cloud-dataflow/v1beta3/beam_python3.11_sdk:2.56.0rc2 not found: manifest unknown: Failed to fetch \"2.56.0rc2\" from request \"/v2/cloud-dataflow/v1beta3/beam_python3.11_sdk/manifests/2.56.0rc2\"." image="gcr.io/cloud-dataflow/v1beta3/beam_python3.11_sdk:2.56.0rc2"
This is because I set the RC Python version in requirements.txt in f452b6a. I'll check whether removing the "rc2" suffix resolves the issue; otherwise, the Python UDF tests currently cannot run against an RC.
After removing the "rc2" suffix, the Python UDF integration tests now fail with the same vague message as the YAML template integration tests (https://github.com/GoogleCloudPlatform/DataflowTemplates/pull/1487#issuecomment-2086566584). This implies that the YAML template test also fails because it cannot install "apache-beam==2.56.0". Apart from the YAML and Python UDF tests, all tests passed on the latest run.
In summary, the current issues are:
For the Python UDF integration tests:
If I set the Beam Python version to 2.56.0rc2, the Dataflow worker image cannot be found because the gcr.io image tags do not include "rc", so the pipeline gets stuck initializing the harness. If I set it to 2.56.0, installing apache-beam from PyPI fails and the pipeline fails to launch.
For the YAML template integration test: I assume the test requests Beam Python SDK version 2.56.0, which is not yet available on PyPI.
For these reasons, I wasn't able to validate the Beam release candidate with the YAML and Python UDF integration tests.
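The dilemma in the summary above can be sketched in a few lines of Python. This is a minimal illustration, not code from the repository: the helper names are hypothetical, the image path pattern is copied from the PullImage error earlier in this thread, and the assumption (suggested by that error) is that the worker image tag is taken directly from the pinned SDK version.

```python
import re

# Image path pattern copied from the PullImage error above; the Python
# minor version (3.11) is fixed here for illustration only.
WORKER_IMAGE = "gcr.io/cloud-dataflow/v1beta3/beam_python3.11_sdk:{tag}"

# Matches a release-candidate suffix such as "rc2" at the end of a version.
RC_SUFFIX = re.compile(r"rc\d+$")


def pip_requirement(beam_version: str) -> str:
    """Line that would go into requirements.txt (hypothetical helper)."""
    return f"apache-beam=={beam_version}"


def worker_image(beam_version: str) -> str:
    """Worker image reference, assuming the tag equals the SDK version."""
    return WORKER_IMAGE.format(tag=beam_version)


def base_version(beam_version: str) -> str:
    """Drop an RC suffix, e.g. "2.56.0rc2" becomes "2.56.0"."""
    return RC_SUFFIX.sub("", beam_version)


# Print both candidate pairings: the RC pin and the RC-stripped pin.
for version in ("2.56.0rc2", base_version("2.56.0rc2")):
    print(pip_requirement(version), "|", worker_image(version))
```

Per the logs in this thread, the first pairing fails at the image pull because no 2.56.0rc2 tag exists on gcr.io, and the second fails at pip install because 2.56.0 is not yet published on PyPI.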
cc: @Polber @fozzie15
YamlTemplate is a bit tricky to test (hopefully I can keep improving the process). You need to change three things:
- Change the Beam version in YamlDockerfileGenerator (https://github.com/GoogleCloudPlatform/DataflowTemplates/blob/dd5a13f99c2e9a4b048dffdf555cb9622dff2ef5/plugins/core-plugin/src/main/java/com/google/cloud/teleport/plugin/YamlDockerfileGenerator.java#L56):
  parameters.put("beamVersion", "2.56.0rc2");
- Since the xlang jars are in a custom Maven repository, that repository needs to be passed as a provider in the YAML file used by YamlTemplateIT (https://github.com/GoogleCloudPlatform/DataflowTemplates/blob/dd5a13f99c2e9a4b048dffdf555cb9622dff2ef5/python/src/test/resources/YamlTemplateIT.yaml), i.e.:
providers:
  - type: mavenJar
    config:
      artifact_id: beam-sdks-java-extensions-schemaio-expansion-service
      group_id: org.apache.beam
      version: 2.56.0
      repository: "https://repository.apache.org/content/repositories/orgapachebeam-1377"
    transforms:
      'MapToFieldsCustom': 'beam:schematransform:org.apache.beam:yaml:map_to_fields-java:v1'
      'WriteToJsonCustom': 'beam:schematransform:org.apache.beam:json_write:v1'
- Change the associated transforms (https://github.com/GoogleCloudPlatform/DataflowTemplates/blob/dd5a13f99c2e9a4b048dffdf555cb9622dff2ef5/python/src/test/resources/YamlTemplateIT.yaml#L35-L54) to use the custom-provided transforms:
- type: MapToFieldsCustom
  name: Sum
  input: Filter
  config:
    language: java
    append: true
    drop: [str]
    fields:
      sum:
        expression: num + inverse
- type: WriteToJsonCustom
  name: WriteGoodFiles
  input: Sum
  config:
    path: "OUTPUT_PATH/good"
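Because the provider config points at a staging repository rather than Maven Central, it can help to confirm the staged jar exists before launching YamlTemplateIT. Below is a minimal sketch, assuming the standard Maven repository layout and no classifier; maven_jar_url is a hypothetical helper, not a Beam API.

```python
def maven_jar_url(repository: str, group_id: str, artifact_id: str, version: str) -> str:
    """Build a jar URL using the standard Maven repository layout (hypothetical helper).

    Layout: <repo>/<group path>/<artifact>/<version>/<artifact>-<version>.jar,
    where the group path is group_id with dots replaced by slashes.
    Classifiers are not handled in this sketch.
    """
    group_path = group_id.replace(".", "/")
    jar_name = f"{artifact_id}-{version}.jar"
    return "/".join([repository.rstrip("/"), group_path, artifact_id, version, jar_name])


# Coordinates copied from the mavenJar provider config above.
url = maven_jar_url(
    repository="https://repository.apache.org/content/repositories/orgapachebeam-1377",
    group_id="org.apache.beam",
    artifact_id="beam-sdks-java-extensions-schemaio-expansion-service",
    version="2.56.0",
)
print(url)
```

Opening the printed URL is a quick way to confirm the artifact was staged; whether Beam's mavenJar provider resolves exactly this path (for example, with or without a classifier) is an assumption here.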
I should have been on top of validating the release anyway, so going forward, definitely feel free to tag me to take this on. For now, I was able to validate using the steps above, so YamlTemplate works as expected with 2.56.0.