beam
beam copied to clipboard
[Bug]: Dataflow Runner V2 does not support BigueryIO STORAGE_WRITE_API when configured withAutoSharding
What happened?
We are attempting to use the STORAGE_WRITE_API with exactly-once guarantees in our pipelines running on Runner V2. Our configuration uses dynamic destinations and auto sharding, as detailed below:
BigQueryIO
.write[TableRowWithTableId]
.to(new DynamicDestinationImpl())
.optimizedWrites()
.withFormatFunction(_.tableRow)
.withMethod(BigQueryIO.Write.Method.STORAGE_WRITE_API)
.withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
.withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
.withFailedInsertRetryPolicy(InsertRetryPolicy.retryTransientErrors())
.withExtendedErrorInfo()
.withAutoSharding()
.withTriggeringFrequency(Duration.standardMinutes(15))
Issue Encountered
When we run our pipeline on runner V2 with the above BigQueryIO configuration we get the following error
Error translating pipeline. Runner V2 doesn't support the following SDK features: [Use STORAGE_WRITE_API].
The pipeline executes successfully when we modify the configuration to use a static number of write streams (withNumStorageWriteApiStreams(40)) instead of auto sharding.
While looking for references on this issue I found https://partnerissuetracker.corp.google.com/issues/271105510 which claims that auto sharding should work on Runner V2.
Questions
- Is it safe to use a static number of write streams as a work around to using the
STORAGE_WRITE_APIon runner V2 - What is the current state of
STORAGE_WRITE_APIsupport on runner V2 ?, im struggling to find an issue or documentation on this. - Is it possible to support auto sharding for BigQueryIO.
Issue Priority
Priority: 2 (default / most bugs should be filed as P2)
Issue Components
- [ ] Component: Python SDK
- [X] Component: Java SDK
- [ ] Component: Go SDK
- [ ] Component: Typescript SDK
- [ ] Component: IO connector
- [ ] Component: Beam YAML
- [ ] Component: Beam examples
- [ ] Component: Beam playground
- [ ] Component: Beam katas
- [ ] Component: Website
- [ ] Component: Spark Runner
- [ ] Component: Flink Runner
- [ ] Component: Samza Runner
- [ ] Component: Twister2 Runner
- [ ] Component: Hazelcast Jet Runner
- [ ] Component: Google Cloud Dataflow Runner