DataflowTemplates

[Do not merge] Beam 2.56.0rc2 validation

Open Abacn opened this issue 1 year ago • 7 comments

Abacn avatar Apr 29 '24 21:04 Abacn

The dataplex unit tests fail to compile because https://github.com/googleapis/java-bigquery/pull/3130 removed EmptyTableResult.

Abacn avatar Apr 30 '24 13:04 Abacn

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 40.55%. Comparing base (499954c) to head (7154d4f). Report is 2 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1487      +/-   ##
============================================
+ Coverage     40.33%   40.55%   +0.21%     
- Complexity     2729     2775      +46     
============================================
  Files           727      735       +8     
  Lines         41531    41759     +228     
  Branches       4466     4501      +35     
============================================
+ Hits          16751    16934     +183     
- Misses        23319    23349      +30     
- Partials       1461     1476      +15     
Components Coverage Δ
spanner-templates 56.14% <ø> (+0.43%) :arrow_up:
spanner-import-export 65.61% <ø> (+0.02%) :arrow_up:
spanner-live-forward-migration 61.23% <ø> (+0.03%) :arrow_up:
spanner-live-reverse-replication 42.53% <ø> (+0.02%) :arrow_up:
spanner-bulk-migration 70.55% <ø> (+1.12%) :arrow_up:

see 18 files with indirect coverage changes

codecov[bot] avatar Apr 30 '24 14:04 codecov[bot]

The YAML template job launch is failing with:

docker: Error response from daemon: manifest for gcr.io/cloud-teleport-testing/2024-04-30-14-32-07_it/yaml-template:latest not found: manifest unknown: Failed to fetch "latest" from request "/v2/cloud-teleport-testing/2024-04-30-14-32-07_it/yaml-template/manifests/latest"

2024-04-30 10:38:08.101 EDT
cloudservice.service: Start request repeated too quickly.
2024-04-30 10:38:08.101 EDT
cloudservice.service: Failed with result 'exit-code'.
2024-04-30 10:38:08.101 EDT
Failed to start Template launcher Docker container.

Also, retry does not seem to be configured correctly?

Abacn avatar Apr 30 '24 18:04 Abacn

The Python UDF tests are failing with:

"PullImage from image service failed" err="rpc error: code = Unknown desc = Error response from daemon: manifest for gcr.io/cloud-dataflow/v1beta3/beam_python3.11_sdk:2.56.0rc2 not found: manifest unknown: Failed to fetch \"2.56.0rc2\" from request \"/v2/cloud-dataflow/v1beta3/beam_python3.11_sdk/manifests/2.56.0rc2\"." image="gcr.io/cloud-dataflow/v1beta3/beam_python3.11_sdk:2.56.0rc2"

This is because I set the RC Python version in requirements.txt in f452b6a. I will check whether removing "rc2" resolves the issue; otherwise the PythonUDF tests currently cannot run on an RC.

Abacn avatar Apr 30 '24 22:04 Abacn

After removing the "rc2" suffix, the Python UDF integration tests now fail with the same vague message as the Yaml template integration tests (https://github.com/GoogleCloudPlatform/DataflowTemplates/pull/1487#issuecomment-2086566584). This implies the Yaml template test is also failing because it cannot install "apache-beam==2.56.0". Other than the Yaml and Python UDF tests, all tests passed on the latest run.

Abacn avatar May 01 '24 01:05 Abacn

In summary, the current issues are:

For Python UDF integration tests:

If I set the Beam Python version to 2.56.0rc2, it cannot find the Dataflow worker image because the gcr.io image tag does not have "rc", and the pipeline gets stuck at initializing the harness. If I set the Beam Python version to 2.56.0, it fails to install apache-beam from PyPI, and the pipeline launch fails.

For Yaml Template integration test: I assume the test requests Beam Python SDK version 2.56.0, which is not available on PyPI yet.

For these reasons I wasn't able to validate the Beam release candidate for the Yaml and Python UDF integration tests.
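
The mismatch described above (the pip requirement needs the "rc" suffix while the worker image tag lacks it) can be sketched as follows. This is only an illustration: `split_rc_version` is a hypothetical helper, not part of DataflowTemplates or Beam, and it assumes an image tagged with the base version actually exists.

```python
import re

# Hypothetical helper for illustration; not an existing Beam or DataflowTemplates API.
_RC_SUFFIX = re.compile(r"rc\d+$")


def split_rc_version(version: str) -> tuple[str, str]:
    """Split a version such as '2.56.0rc2' into ('2.56.0', 'rc2').

    A final version such as '2.56.0' is returned with an empty suffix.
    """
    match = _RC_SUFFIX.search(version)
    if match is None:
        return version, ""
    return version[: match.start()], match.group(0)


rc_version = "2.56.0rc2"
base_version, rc_suffix = split_rc_version(rc_version)

# The release candidate is published on PyPI under the full rc version.
pip_requirement = f"apache-beam=={rc_version}"

# Assumption: the worker image is tagged with the base version only.
image = f"gcr.io/cloud-dataflow/v1beta3/beam_python3.11_sdk:{base_version}"
```

The point is that the two version strings have to be tracked separately during RC validation; a single value in requirements.txt cannot satisfy both lookups.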

cc: @Polber @fozzie15

Abacn avatar May 01 '24 01:05 Abacn

> For Yaml Template integration test: I assume the test requests Beam Python SDK version 2.56.0, which is not available on PyPI yet.

YamlTemplate is a bit tricky to test (hopefully I can continue to improve the process). You need to change 3 things...

  1. Change the beam version in YamlDockerfileGenerator, https://github.com/GoogleCloudPlatform/DataflowTemplates/blob/dd5a13f99c2e9a4b048dffdf555cb9622dff2ef5/plugins/core-plugin/src/main/java/com/google/cloud/teleport/plugin/YamlDockerfileGenerator.java#L56, to: parameters.put("beamVersion", "2.56.0rc2");

  2. Since the xlang jars are in a custom Maven repo, the repo needs to be passed as a provider in the YAML file used by YamlTemplateIT, https://github.com/GoogleCloudPlatform/DataflowTemplates/blob/dd5a13f99c2e9a4b048dffdf555cb9622dff2ef5/python/src/test/resources/YamlTemplateIT.yaml, i.e.

providers:
  - type: mavenJar
    config:
      artifact_id: beam-sdks-java-extensions-schemaio-expansion-service
      group_id: org.apache.beam
      version: 2.56.0
      repository: "https://repository.apache.org/content/repositories/orgapachebeam-1377"
    transforms:
       'MapToFieldsCustom': 'beam:schematransform:org.apache.beam:yaml:map_to_fields-java:v1'
       'WriteToJsonCustom': 'beam:schematransform:org.apache.beam:json_write:v1'

  3. Replace the associated transforms, https://github.com/GoogleCloudPlatform/DataflowTemplates/blob/dd5a13f99c2e9a4b048dffdf555cb9622dff2ef5/python/src/test/resources/YamlTemplateIT.yaml#L35-L54, with the custom-provided transforms:

    - type: MapToFieldsCustom
      name: Sum
      input: Filter
      config:
        language: java
        append: true
        drop: [str]
        fields:
          sum:
            expression: num + inverse
    - type: WriteToJsonCustom
      name: WriteGoodFiles
      input: Sum
      config:
        path: "OUTPUT_PATH/good"
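
Since the staging repository URL changes with every release candidate, the provider entry from the steps above could be generated rather than edited by hand. The following is a rough sketch only: `maven_jar_provider` is a hypothetical helper (it does not exist in DataflowTemplates), and it prints JSON on the assumption that the result is pasted into the YAML file, which should work because YAML 1.2 accepts JSON syntax.

```python
import json


def maven_jar_provider(version, staging_repo, transforms):
    """Build a mavenJar provider entry shaped like the one shown in the thread.

    Hypothetical helper for illustration; field names are copied from the
    YAML snippet above, nothing else is assumed about the schema.
    """
    return {
        "type": "mavenJar",
        "config": {
            "artifact_id": "beam-sdks-java-extensions-schemaio-expansion-service",
            "group_id": "org.apache.beam",
            "version": version,
            "repository": staging_repo,
        },
        "transforms": dict(transforms),
    }


provider = maven_jar_provider(
    version="2.56.0",
    staging_repo="https://repository.apache.org/content/repositories/orgapachebeam-1377",
    transforms={
        "MapToFieldsCustom": "beam:schematransform:org.apache.beam:yaml:map_to_fields-java:v1",
        "WriteToJsonCustom": "beam:schematransform:org.apache.beam:json_write:v1",
    },
)

# Emit the block in a form that can be pasted into the test YAML file.
print(json.dumps({"providers": [provider]}, indent=2))
```

Only the version and the staging repository would need to change for the next release candidate; the transform URNs stay the same as in the snippet above.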

I should have been on top of validating the release anyway, so moving forward, definitely feel free to tag me to take this on. For now, I was able to validate using the steps above, so YamlTemplate works as expected with 2.56.0.

Polber avatar May 01 '24 02:05 Polber