[Feature Request]: Specify output_type in ReadFromBigQuery Beam YAML transform
What would you like to happen?
It would be awesome to have the ability to specify `output_type` for the ReadFromBigQuery Apache Beam YAML transform when using `query`.
Currently, an attempt to query a BigQuery table with this transform (for example, with a pipeline like the one sketched below) fails with a `ValueError`: "Invalid transform specification at "Read from BigQuery" at line 3: Both a query and an output type of 'BEAM_ROW' were specified. 'BEAM_ROW' is not currently supported with queries."
https://github.com/apache/beam/blob/c0a589534704cbdf8c43f0d56275332d99820cdf/sdks/python/apache_beam/io/gcp/bigquery.py#L2973-L2977
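For reference, a minimal pipeline along these lines reproduces the error (the query and the public dataset are only for illustration):

```yaml
pipeline:
  transforms:
    - type: ReadFromBigQuery
      name: Read from BigQuery
      config:
        # Any query triggers the error, since the YAML transform
        # requests 'BEAM_ROW' output under the hood.
        query: |
          SELECT corpus, SUM(word_count) AS total_words
          FROM `bigquery-public-data.samples.shakespeare`
          GROUP BY corpus
    - type: LogForTesting
      input: Read from BigQuery
```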
The workaround is to use a combination of the `table`, `fields`, and `row_restriction` config parameters (sketched below), but this does not allow for any aggregation, meaning that in some cases users must read a lot of data into the pipeline instead of having BigQuery take care of it.
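The workaround looks roughly like this (the table, fields, and the exact shape of the `Combine` step are illustrative); note that the aggregation BigQuery could have performed in the query now has to run inside the pipeline:

```yaml
pipeline:
  transforms:
    - type: ReadFromBigQuery
      name: Read from BigQuery
      config:
        table: bigquery-public-data.samples.shakespeare
        fields: [word, word_count, corpus]
        row_restriction: "corpus = 'hamlet'"
    # The GROUP BY / SUM must now be done by the pipeline itself,
    # over all rows matching the restriction.
    - type: Combine
      input: Read from BigQuery
      config:
        group_by: word
        combine:
          total_count:
            value: word_count
            fn: sum
```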
Issue Priority
Priority: 2 (default / most feature requests should be filed as P2)
Issue Components
- [x] Component: Python SDK
- [ ] Component: Java SDK
- [ ] Component: Go SDK
- [ ] Component: Typescript SDK
- [ ] Component: IO connector
- [x] Component: Beam YAML
- [ ] Component: Beam examples
- [ ] Component: Beam playground
- [ ] Component: Beam katas
- [ ] Component: Website
- [ ] Component: Infrastructure
- [ ] Component: Spark Runner
- [ ] Component: Flink Runner
- [ ] Component: Samza Runner
- [ ] Component: Twister2 Runner
- [ ] Component: Hazelcast Jet Runner
- [ ] Component: Google Cloud Dataflow Runner
@svetakvsundhar is this feasible? I see you added the beam_row option.
I am guessing that since no table exists for a given query, we can't derive a Beam schema from the TableSchema. Can we add the ability for a user to pass a schema, e.g. something like the sketch below?
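Purely as an illustration of that idea (the `schema` config field here is hypothetical and does not exist today):

```yaml
pipeline:
  transforms:
    - type: ReadFromBigQuery
      name: Read from BigQuery
      config:
        query: |
          SELECT corpus, SUM(word_count) AS total_words
          FROM `bigquery-public-data.samples.shakespeare`
          GROUP BY corpus
        # Hypothetical: a user-declared schema of the query result, so
        # the output can be converted to Beam Rows without a table to
        # infer the schema from.
        schema:
          corpus: STRING
          total_words: INT64
```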