[Bug]: Iceberg managed connector upgrade failed in GCP Dataflow
What happened?
We are getting an error running our Apache Beam pipeline in Dataflow (Runner v2) using the managed transform for Iceberg in Python (cross-language).
The Dataflow job fails to start with the following error:
Dataflow is unable to process the managed transform(s) ref_AppliedPTransform_WriteOutput-SchemaAwareExternalTransform-beam-transform-managed-v1_10(beam:schematransform:org.apache.beam:iceberg_write:v1) due to an internal error. Because the upgrade failed, the new job or the update request has been rejected. To troubleshoot, see https://cloud.google.com/dataflow/docs/guides/common-errors#managed-transforms-upgrade.
We are using Python 3.12 and Apache Beam SDK 2.67.0. The Iceberg connector we are using is: https://cloud.google.com/dataflow/docs/guides/managed-io-iceberg
We are using the Java implementation from Python through the managed write transform, like this:
apache_beam.managed.Write(beam.managed.ICEBERG, config=iceberg_config)
The pipeline runs fine without the connector, but fails early, at the job setup stage, when the connector is added. No pipeline graph or worker logs are generated at this stage.
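For context, here is a minimal sketch of roughly how the transform is wired up in our pipeline; the table, catalog, and source below are placeholders rather than our real values (the actual config is shared further down in this thread):

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Placeholder config; the real values are shared later in this thread.
iceberg_config = {
    "table": "my_catalog.my_db.my_table",
    "catalog_name": "my_catalog",
    "catalog_properties": {"warehouse": "gs://my-bucket/warehouse/"},
}

options = PipelineOptions()  # plus the usual Dataflow options (runner, project, region, ...)
with beam.Pipeline(options=options) as p:
    (
        p
        | "Source" >> beam.Create([beam.Row(id="1", value="example")])  # stand-in for the real source
        | "WriteOutput" >> beam.managed.Write(beam.managed.ICEBERG, config=iceberg_config)
    )
```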
Issue Priority
Priority: 2 (default / most bugs should be filed as P2)
Issue Components
- [x] Component: Python SDK
- [ ] Component: Java SDK
- [ ] Component: Go SDK
- [ ] Component: Typescript SDK
- [ ] Component: IO connector
- [ ] Component: Beam YAML
- [ ] Component: Beam examples
- [ ] Component: Beam playground
- [ ] Component: Beam katas
- [ ] Component: Website
- [ ] Component: Infrastructure
- [ ] Component: Spark Runner
- [ ] Component: Flink Runner
- [ ] Component: Samza Runner
- [ ] Component: Twister2 Runner
- [ ] Component: Hazelcast Jet Runner
- [x] Component: Google Cloud Dataflow Runner
@alxmrs @TheNeuralBit my colleague @marianobrc is hitting the above error trying to write to Iceberg using the x-language managed connector. This seems like it should be supported and we're a bit at a loss for next steps. Tagging you in case you happen to know, or know someone who knows, what may be happening here. 🙏
@ahmedabu98 @chamikaramj
Thanks for filing this.
Can you check whether the Dataflow Log Explorer logs managed-transforms-worker and managed-transforms-worker-startup mentioned in [1] have additional information regarding the error?
Also, can you share your Iceberg config?
If you can file a Google Cloud support ticket, the support team can take a look at your exact job.
[1] https://cloud.google.com/dataflow/docs/guides/common-errors#managed-transforms-upgrade
Thanks for looking into this, @chamikaramj. The Iceberg config is:
```python
iceberg_config = {
    "table": "earthranger_catalog.warehouse.observations",
    "catalog_name": "earthranger_catalog",
    "catalog_properties": {
        "warehouse": "gs://dwh_dev/iceberg-warehouse/",
        "catalog-impl": "org.apache.iceberg.hadoop.HadoopCatalog",
    },
    "triggering_frequency_seconds": 60,
}
```
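In case it helps narrow things down, this is roughly how that config is used in the pipeline. The input elements are schema'd Beam Rows (the field names below are only illustrative, not our actual observation schema), and our understanding is that triggering_frequency_seconds controls how often the streaming write produces snapshots:

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms.periodicsequence import PeriodicImpulse

options = PipelineOptions(streaming=True)
with beam.Pipeline(options=options) as p:
    (
        p
        | PeriodicImpulse(fire_interval=10)  # stand-in for our real streaming source
        # The managed Iceberg write expects schema'd Rows; field names here are placeholders.
        | beam.Map(lambda ts: beam.Row(id=str(ts), payload="example"))
        | "WriteOutput" >> beam.managed.Write(beam.managed.ICEBERG, config=iceberg_config)  # config from above
    )
```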
There aren't any logs from worker or worker-startup in the Logs Explorer, nor from harness or harness-startup; only the ones I shared from job-message.
Unfortunately, we cannot file a Google Cloud support ticket since we don't have a GCP Support plan.
Lemme try to run a job using a similar config.
BTW, the logs you have to check are managed-transforms-worker and managed-transforms-worker-startup, not the worker etc. that you mentioned. These have to be explicitly selected in the Log Explorer UI, with the correct time range. It might also help to unselect other logs. See the attached screenshot below.
Can you share the job id? @marianobrc
Thank you for your quick response. For some reason, the managed- ones don't show up as an option for me:
If I clear all the filters (All log names), then those job logs are all I can see:
@liferoad here is the job ID and other details in case they are useful:
Job name: beamapp-dev-0929124506-405713-ih4cgl5j
Job ID: 2025-09-29_05_47_09-6071847335088109426
Job type: Streaming
Job status: Failed
SDK version: Apache Beam Python 3.12 SDK 2.67.0
Job region: us-west1
Service zones: us-west1-b
Worker location: us-west1
Thank you
Thanks. We are looking into this.
As a workaround, you can specify the following pipeline option when you start the job; it makes Dataflow use the pipeline's version of the managed transform instead of attempting the upgrade.
--experiments=use_pipeline_version_for_managed_transforms="beam:schematransform:org.apache.beam:iceberg_write:v1"
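If you are launching the job from Python rather than the command line, one way to pass the same experiment programmatically is sketched below (the project/region/bucket values are placeholders; the double quotes in the command-line form above are shell quoting, so they are omitted from the value here):

```python
from apache_beam.options.pipeline_options import PipelineOptions

# Equivalent to the --experiments flag above, passed as a pipeline option.
options = PipelineOptions(
    runner="DataflowRunner",
    streaming=True,
    project="my-project",                # placeholder
    region="us-west1",
    temp_location="gs://my-bucket/tmp",  # placeholder
    experiments=[
        "use_pipeline_version_for_managed_transforms="
        "beam:schematransform:org.apache.beam:iceberg_write:v1"
    ],
)
```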
@chamikaramj this workaround worked great, and our pipeline runs ok now. Would it be safe for us to use this workaround in prod or do you foresee any potential issues in doing that?
Thanks
It should be OK to use it in prod, but you will not be getting the Dataflow managed I/O service features: https://cloud.google.com/dataflow/docs/guides/managed-io
I'll let you know here when you can try without the experiment.
@chamikaramj quick question: does the Iceberg connector (through the managed I/O service or not) support passing any configs to enable upsert mode? We'd like data to be updated when processing rows with the same ID (instead of inserting new rows).
Thanks
I don't think this is currently supported but we can probably support this in the future.
cc: @ahmedabu98 @liferoad
We did several updates related to this (and more improvements are coming). Can you let us know if you are still running into this issue?
Thank you @chamikaramj. We've switched to our own implementation of an Iceberg writer since the upsert mode support was key for our use case. Do you know if that is supported in the connector now?
Not yet, but I can mention here when the Iceberg upsert mode is available.
cc: @damccorm @ahmedabu98 @rezarokni
@marianobrc Upserts aren't supported yet. We're working on a few things but it's on the roadmap (might get to it around mid-2026)
