[Bug]: Add SchemaFieldNumber annotations to Iceberg to prevent potential update compatibility issues

Open chamikaramj opened this issue 1 month ago • 3 comments

What happened?

Iceberg uses SchemaCoder in several locations.

https://github.com/apache/beam/blob/6d24c3dec3bfab3d49c32ed2ef2fe3d0a8d803ef/sdks/java/io/iceberg/src/main/java/org/apache/beam/sdk/io/iceberg/SnapshotInfo.java#L153

https://github.com/apache/beam/blob/c236996a4550f92388f8688afae144fa402de171/sdks/java/io/iceberg/src/main/java/org/apache/beam/sdk/io/iceberg/ReadTask.java#L42

https://github.com/apache/beam/blob/c236996a4550f92388f8688afae144fa402de171/sdks/java/io/iceberg/src/main/java/org/apache/beam/sdk/io/iceberg/ReadTaskDescriptor.java#L37

Using SchemaCoder (as opposed to RowCoder, for example) can result in update compatibility issues for Dataflow Runner v2. These can be addressed by adding SchemaFieldNumber annotations similar to https://github.com/apache/beam/pull/36295.
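To illustrate why explicit field numbers help, here is a minimal, stdlib-only sketch. It is not Beam's implementation: `FieldNumber` is a hypothetical stand-in for Beam's `SchemaFieldNumber` (which, as far as I know, takes a string index such as `@SchemaFieldNumber("0")` on a getter), and `SnapshotLike` is a made-up class, not the real `SnapshotInfo`. The assumption behind it is that a field order inferred by reflection is not guaranteed to be stable (`Class.getDeclaredMethods()` documents no particular order), whereas an order pinned by annotation is.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class FieldNumberSketch {

  // Hypothetical stand-in for Beam's SchemaFieldNumber; NOT the real annotation.
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.METHOD)
  public @interface FieldNumber {
    String value();
  }

  // Made-up value class for illustration; not Beam's SnapshotInfo.
  public abstract static class SnapshotLike {
    @FieldNumber("0")
    abstract long getSequenceNumber();

    @FieldNumber("1")
    abstract long getSnapshotId();

    @FieldNumber("2")
    abstract String getOperation();
  }

  // Returns getter names sorted by their explicit field number, so the
  // result does not depend on the order in which reflection lists methods.
  public static List<String> orderedFieldNames(Class<?> clazz) {
    List<Method> annotated = new ArrayList<>();
    for (Method m : clazz.getDeclaredMethods()) {
      if (m.isAnnotationPresent(FieldNumber.class)) {
        annotated.add(m);
      }
    }
    annotated.sort(
        Comparator.comparingInt(
            (Method m) -> Integer.parseInt(m.getAnnotation(FieldNumber.class).value())));
    List<String> names = new ArrayList<>();
    for (Method m : annotated) {
      names.add(m.getName());
    }
    return names;
  }

  public static void main(String[] args) {
    System.out.println(orderedFieldNames(SnapshotLike.class));
  }
}
```

With the numbers pinned this way, reordering or adding getters in a later version of the class would not silently shift the position of existing fields, which is the property an update-compatible encoding needs.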

Issue Priority

Priority: 2 (default / most bugs should be filed as P2)

Issue Components

  • [ ] Component: Python SDK
  • [ ] Component: Java SDK
  • [ ] Component: Go SDK
  • [ ] Component: Typescript SDK
  • [ ] Component: IO connector
  • [ ] Component: Beam YAML
  • [ ] Component: Beam examples
  • [ ] Component: Beam playground
  • [ ] Component: Beam katas
  • [ ] Component: Website
  • [ ] Component: Infrastructure
  • [ ] Component: Spark Runner
  • [ ] Component: Flink Runner
  • [ ] Component: Samza Runner
  • [ ] Component: Twister2 Runner
  • [ ] Component: Hazelcast Jet Runner
  • [ ] Component: Google Cloud Dataflow Runner

chamikaramj avatar Nov 10 '25 20:11 chamikaramj

There are several more places where we need to add SchemaFieldNumber annotations to fix update compatibility for IcebergIO. Opened https://github.com/apache/beam/pull/37055 for this.

cc: @shunping in case this can be cherry-picked to the ongoing release. (should be a relatively safe change)

chamikaramj avatar Dec 09 '25 07:12 chamikaramj

RC2 is already built; we can add this to RC3.

Amar3tto avatar Dec 09 '25 10:12 Amar3tto

Thanks! Opened https://github.com/apache/beam/pull/37065 for RC3.

chamikaramj avatar Dec 09 '25 16:12 chamikaramj