[Bug]: [Python] Respect BigQuery insert byte size limit when writing batched rows
What happened?
Similar to #24979, but handling the with_batched_input case.
Users can batch their BQ rows themselves and write a PCollection of batches to BigQuery. These batches can exceed the byte size limits for BigQuery streaming inserts [1]. Handle this case by splitting oversized batches into smaller ones and flushing each separately (a rough sketch follows below).
[1] https://cloud.google.com/bigquery/quotas#streaming_inserts
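For illustration only, a minimal sketch of the intended splitting behavior. The helper `split_batch_by_byte_size` and the constant `MAX_INSERT_PAYLOAD_BYTES` are hypothetical names, not part of the connector, and the size accounting here is only an approximation of the request payload; the real limit is governed by the quota page above.

```python
import json

# Hypothetical headroom below BigQuery's per-request streaming insert limit.
MAX_INSERT_PAYLOAD_BYTES = 9 * 1024 * 1024


def split_batch_by_byte_size(rows, max_bytes=MAX_INSERT_PAYLOAD_BYTES):
    """Yield sub-batches whose approximate serialized size stays under max_bytes."""
    current, current_size = [], 0
    for row in rows:
        # Approximate the row's contribution to the insert payload.
        row_size = len(json.dumps(row, default=str).encode("utf-8"))
        if current and current_size + row_size > max_bytes:
            yield current
            current, current_size = [], 0
        current.append(row)
        current_size += row_size
    if current:
        yield current


# Each sub-batch would then be flushed to BigQuery as its own insert request
# instead of sending the user-provided batch in a single call.
```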
Issue Priority
Priority: 3 (minor)
Issue Components
- [X] Component: Python SDK
- [ ] Component: Java SDK
- [ ] Component: Go SDK
- [ ] Component: Typescript SDK
- [X] Component: IO connector
- [ ] Component: Beam examples
- [ ] Component: Beam playground
- [ ] Component: Beam katas
- [ ] Component: Website
- [ ] Component: Spark Runner
- [ ] Component: Flink Runner
- [ ] Component: Samza Runner
- [ ] Component: Twister2 Runner
- [ ] Component: Hazelcast Jet Runner
- [ ] Component: Google Cloud Dataflow Runner
+1, this is impacting our ability to turn on autosharding, since the check only happens when autosharding is disabled.
Upgrading to P2 as we now have a known failing use case.
Hi, I made a PR to solve this: https://github.com/apache/beam/pull/35212