beam icon indicating copy to clipboard operation
beam copied to clipboard

[Bug]: Dataflow Runner V2 does not support BigueryIO STORAGE_WRITE_API when configured withAutoSharding

Open damnMeddlingKid opened this issue 1 year ago • 0 comments

What happened?

We are attempting to use the STORAGE_WRITE_API with exactly-once guarantees in our pipelines running on Runner V2. Our configuration uses dynamic destinations and auto sharding, as detailed below:

BigQueryIO
          .write[TableRowWithTableId]
          .to(new DynamicDestinationImpl())
          .optimizedWrites()
          .withFormatFunction(_.tableRow)
          .withMethod(BigQueryIO.Write.Method.STORAGE_WRITE_API)
          .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
          .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
          .withFailedInsertRetryPolicy(InsertRetryPolicy.retryTransientErrors())
          .withExtendedErrorInfo()
          .withAutoSharding()
          .withTriggeringFrequency(Duration.standardMinutes(15))

Issue Encountered

When we run our pipeline on runner V2 with the above BigQueryIO configuration we get the following error

Error translating pipeline. Runner V2 doesn't support the following SDK features: [Use STORAGE_WRITE_API].

The pipeline executes successfully when we modify the configuration to use a static number of write streams (withNumStorageWriteApiStreams(40)) instead of auto sharding.

While looking for references on this issue I found https://partnerissuetracker.corp.google.com/issues/271105510 which claims that auto sharding should work on Runner V2.

Questions

  1. Is it safe to use a static number of write streams as a work around to using the STORAGE_WRITE_API on runner V2
  2. What is the current state of STORAGE_WRITE_API support on runner V2 ?, im struggling to find an issue or documentation on this.
  3. Is it possible to support auto sharding for BigQueryIO.

Issue Priority

Priority: 2 (default / most bugs should be filed as P2)

Issue Components

  • [ ] Component: Python SDK
  • [X] Component: Java SDK
  • [ ] Component: Go SDK
  • [ ] Component: Typescript SDK
  • [ ] Component: IO connector
  • [ ] Component: Beam YAML
  • [ ] Component: Beam examples
  • [ ] Component: Beam playground
  • [ ] Component: Beam katas
  • [ ] Component: Website
  • [ ] Component: Spark Runner
  • [ ] Component: Flink Runner
  • [ ] Component: Samza Runner
  • [ ] Component: Twister2 Runner
  • [ ] Component: Hazelcast Jet Runner
  • [ ] Component: Google Cloud Dataflow Runner

damnMeddlingKid avatar May 22 '24 15:05 damnMeddlingKid