
[Bug]: Iceberg managed connector upgrade failed in GCP Dataflow

Open • marianobrc opened this issue 3 months ago • 17 comments

What happened?

We are getting an error running our Apache Beam pipeline on Dataflow (Runner v2) using the managed Iceberg transform from Python (cross-language).

The Dataflow job fails to start with the following error:

Dataflow is unable to process the managed transform(s) ref_AppliedPTransform_WriteOutput-SchemaAwareExternalTransform-beam-transform-managed-v1_10(beam:schematransform:org.apache.beam:iceberg_write:v1) due to an internal error. Because the upgrade failed, the new job or the update request has been rejected. To troubleshoot, see https://cloud.google.com/dataflow/docs/guides/common-errors#managed-transforms-upgrade.

We are using Python 3.12 and Apache Beam SDK 2.67.0. The Iceberg connector we are using is: https://cloud.google.com/dataflow/docs/guides/managed-io-iceberg

We are using the Java implementation from Python through the managed write transform, like this:

beam.managed.Write(beam.managed.ICEBERG, config=iceberg_config)

The pipeline runs fine without the connector, but with it added it fails early, at the job setup stage. No pipeline graph or worker logs are generated at this point.
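For context, the full wiring looks roughly like this; a minimal sketch, where the table, bucket, and element names are simplified placeholders (the actual config is shared further down in this thread):

import apache_beam as beam

# Placeholder config; see the real values later in the thread.
iceberg_config = {
    "table": "my_catalog.warehouse.my_table",
    "catalog_name": "my_catalog",
    "catalog_properties": {
        "warehouse": "gs://my-bucket/iceberg-warehouse/",
        "catalog-impl": "org.apache.iceberg.hadoop.HadoopCatalog",
    },
}

with beam.Pipeline() as pipeline:
    (
        pipeline
        # The managed Iceberg write expects schema'd rows.
        | "CreateRows" >> beam.Create([beam.Row(id=1, name="a"), beam.Row(id=2, name="b")])
        | "WriteOutput" >> beam.managed.Write(beam.managed.ICEBERG, config=iceberg_config)
    )

Since this is a cross-language transform, the Write expands to the Java implementation; Dataflow handles that expansion at submission time.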

Issue Priority

Priority: 2 (default / most bugs should be filed as P2)

Issue Components

  • [x] Component: Python SDK
  • [ ] Component: Java SDK
  • [ ] Component: Go SDK
  • [ ] Component: Typescript SDK
  • [ ] Component: IO connector
  • [ ] Component: Beam YAML
  • [ ] Component: Beam examples
  • [ ] Component: Beam playground
  • [ ] Component: Beam katas
  • [ ] Component: Website
  • [ ] Component: Infrastructure
  • [ ] Component: Spark Runner
  • [ ] Component: Flink Runner
  • [ ] Component: Samza Runner
  • [ ] Component: Twister2 Runner
  • [ ] Component: Hazelcast Jet Runner
  • [x] Component: Google Cloud Dataflow Runner

marianobrc commented Sep 30 '25

[screenshot]

marianobrc commented Sep 30 '25

@alxmrs @TheNeuralBit my colleague @marianobrc is hitting the above error trying to write to Iceberg using the x-language managed connector. This seems like it should be supported and we're a bit at a loss for next steps. Tagging you in case you happen to know, or know someone who knows, what may be happening here. 🙏

cisaacstern commented Sep 30 '25

@ahmedabu98 @chamikaramj

liferoad commented Oct 01 '25

Thanks for filing this.

Can you check whether the Dataflow Log Explorer logs managed-transforms-worker and managed-transforms-worker-startup mentioned in [1] have additional information regarding the error?

Also, can you share your Iceberg config?

If you can file a Google Cloud support ticket, the support team can take a look at your exact job.

[1] https://cloud.google.com/dataflow/docs/guides/common-errors#managed-transforms-upgrade

chamikaramj commented Oct 01 '25

> Thanks for filing this.
>
> Can you check whether the Dataflow Log Explorer logs managed-transforms-worker and managed-transforms-worker-startup mentioned in [1] have additional information regarding the error?
>
> Also, can you share your Iceberg config?
>
> If you can file a Google Cloud support ticket, the support team can take a look at your exact job.
>
> [1] https://cloud.google.com/dataflow/docs/guides/common-errors#managed-transforms-upgrade

Thanks for looking into this @chamikaramj. The Iceberg config is:

iceberg_config = {
    "table": "earthranger_catalog.warehouse.observations",
    "catalog_name": "earthranger_catalog",
    "catalog_properties": {
        "warehouse": "gs://dwh_dev/iceberg-warehouse/",
        "catalog-impl": "org.apache.iceberg.hadoop.HadoopCatalog",
    },
    "triggering_frequency_seconds": 60
}

There aren't any logs from worker or worker-startup in the Log Explorer, nor from harness or harness-startup; the only ones are the job-message entries I shared. Unfortunately, we cannot file a Google Cloud support ticket since we don't have a GCP support plan.

marianobrc commented Oct 01 '25

Lemme try to run a job using a similar config.

BTW the logs you have to check are managed-transforms-worker and managed-transforms-worker-startup, not the worker etc. you mentioned. These have to be explicitly selected in the Log Explorer UI with the correct time range. It might also help to unselect other logs. See the attached screenshot below.

[screenshot: selecting the managed-transforms-worker log names in the Log Explorer]
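If they still don't show up in the UI, you can also try reading them directly, e.g. with gcloud (a sketch; I'm assuming the log names follow the usual dataflow.googleapis.com%2F<log-name> pattern, hence the substring match):

gcloud logging read '
  resource.type="dataflow_step"
  resource.labels.job_id="YOUR_JOB_ID"
  logName:"managed-transforms-worker"
' --limit=100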

chamikaramj commented Oct 01 '25

Can you share the job ID? @marianobrc

liferoad commented Oct 01 '25

> Lemme try to run a job using a similar config.
>
> BTW the logs you have to check are managed-transforms-worker and managed-transforms-worker-startup, not the worker etc. you mentioned. These have to be explicitly selected in the Log Explorer UI with the correct time range. It might also help to unselect other logs. See the attached screenshot below.

Thank you for your quick response. For some reason, the managed- ones don't show up as an option for me:

[screenshot: the managed-transforms log names are not available to select]

If I clear all the filters (All log names), then those job logs are all I can see:

[screenshot: only job-message logs appear]

@liferoad here are the job ID and other details in case they are useful:

Job name: beamapp-dev-0929124506-405713-ih4cgl5j
Job ID: 2025-09-29_05_47_09-6071847335088109426
Job type: Streaming
Job status: Failed
SDK version: Apache Beam Python 3.12 SDK 2.67.0

Job region: us-west1
Service zones: us-west1-b
Worker location: us-west1

Thank you

marianobrc commented Oct 01 '25

Thanks. We are looking into this.

As a workaround, you can specify the following pipeline option when you start the job, which prevents managed transforms expansion.

--experiments=use_pipeline_version_for_managed_transforms="beam:schematransform:org.apache.beam:iceberg_write:v1"
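If it's easier, the same experiment can be set when constructing the pipeline options in Python; a sketch, assuming the double quotes in the flag above are shell quoting rather than part of the value:

from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions([
    # ...your usual Dataflow options (--runner, --project, --region, ...)
    '--experiments=use_pipeline_version_for_managed_transforms='
    'beam:schematransform:org.apache.beam:iceberg_write:v1',
])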

chamikaramj commented Oct 01 '25

> Thanks. We are looking into this.
>
> As a workaround, you can specify the following pipeline option when you start the job, which prevents managed transforms expansion.
>
> --experiments=use_pipeline_version_for_managed_transforms="beam:schematransform:org.apache.beam:iceberg_write:v1"

@chamikaramj this workaround worked great, and our pipeline runs OK now. Would it be safe for us to use this workaround in prod, or do you foresee any potential issues with doing that?

Thanks

marianobrc commented Oct 02 '25

It should be OK to use it in prod, but you will not get the Dataflow managed I/O service features: https://cloud.google.com/dataflow/docs/guides/managed-io

I'll let you know here when you can try without the experiment.

chamikaramj commented Oct 02 '25

@chamikaramj quick question: does the Iceberg connector (through the managed I/O service or not) support passing any configs to enable upsert mode? We'd like rows with the same ID to be updated instead of inserted as new rows.

Thanks

marianobrc commented Oct 02 '25

> @chamikaramj quick question: does the Iceberg connector (through the managed I/O service or not) support passing any configs to enable upsert mode? We'd like rows with the same ID to be updated instead of inserted as new rows.
>
> Thanks

I don't think this is currently supported, but we can probably support it in the future.

cc: @ahmedabu98 @liferoad

chamikaramj commented Oct 02 '25

We did several updates related to this (and more improvements are coming). Can you let us know if you are still running into this issue?

chamikaramj commented Dec 03 '25

> We did several updates related to this (and more improvements are coming). Can you let us know if you are still running into this issue?

Thank you @chamikaramj. We've switched to our own Iceberg writer implementation, since upsert support was key for our use case. Do you know if that is supported in the connector now?

marianobrc commented Dec 04 '25

> > We did several updates related to this (and more improvements are coming). Can you let us know if you are still running into this issue?
>
> Thank you @chamikaramj. We've switched to our own Iceberg writer implementation, since upsert support was key for our use case. Do you know if that is supported in the connector now?

Not yet, but I can post an update here when Iceberg upsert mode is available.

cc: @damccorm @ahmedabu98 @rezarokni

chamikaramj commented Dec 04 '25

@marianobrc Upserts aren't supported yet. We're working on a few other things first, but it's on the roadmap (we might get to it around mid-2026).

ahmedabu98 commented Dec 06 '25