
[Bug]: [Python] Respect BigQuery insert byte size limit when writing batched rows

Open ahmedabu98 opened this issue 2 years ago • 3 comments

What happened?

Similar to #24979, but for the `with_batched_input` case. Users can batch their BigQuery rows themselves and write a PCollection of batches to BigQuery. These batches can exceed BigQuery's insert byte size limits [1]. This should be handled by splitting large batches into smaller ones and flushing them separately (a rough sketch follows the reference below).

[1] https://cloud.google.com/bigquery/quotas#streaming_inserts
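Not part of the issue, but roughly what "splitting large batches into smaller ones" could look like. The function name, the payload-size estimate via `json.dumps`, and the 9 MB threshold are illustrative assumptions, not the SDK's actual implementation; BigQuery's documented limit is 10 MB per streaming insert request.

```python
import json

# Assumed threshold kept below BigQuery's documented 10 MB per-request limit.
MAX_INSERT_PAYLOAD_BYTES = 9 * 1024 * 1024


def split_batch_by_byte_size(batch, max_bytes=MAX_INSERT_PAYLOAD_BYTES):
    """Split a user-provided batch of BQ rows (dicts) into sub-batches whose
    estimated JSON payload stays under max_bytes, so each sub-batch can be
    flushed as a separate insert request."""
    current, current_size = [], 0
    for row in batch:
        row_size = len(json.dumps(row).encode("utf-8"))
        # Start a new sub-batch if adding this row would exceed the cap.
        if current and current_size + row_size > max_bytes:
            yield current
            current, current_size = [], 0
        current.append(row)
        current_size += row_size
    if current:
        yield current
```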

Issue Priority

Priority: 3 (minor)

Issue Components

  • [X] Component: Python SDK
  • [ ] Component: Java SDK
  • [ ] Component: Go SDK
  • [ ] Component: Typescript SDK
  • [X] Component: IO connector
  • [ ] Component: Beam examples
  • [ ] Component: Beam playground
  • [ ] Component: Beam katas
  • [ ] Component: Website
  • [ ] Component: Spark Runner
  • [ ] Component: Flink Runner
  • [ ] Component: Samza Runner
  • [ ] Component: Twister2 Runner
  • [ ] Component: Hazelcast Jet Runner
  • [ ] Component: Google Cloud Dataflow Runner

ahmedabu98 · Jul 05 '23 18:07

+1, this is impacting our ability to turn on autosharding since the check only happens when autosharding is disabled.

dmills-spotify · Dec 07 '23 14:12

Upgrading to P2, as we now have a known failing use case.

johnjcasey · Dec 08 '23 15:12

Hi, I made a PR to solve this: https://github.com/apache/beam/pull/35212

quentin-sommer · Jun 11 '25 01:06