[Feature Request]: Specify output_type in ReadFromBigQuery Beam YAML transform
What would you like to happen?
It would be awesome to have the ability to specify `output_type` for the ReadFromBigQuery Apache Beam YAML transform when using `query`.
Currently, an attempt to query a BigQuery table with this transform (for example, with a pipeline like the one sketched below) fails with a `ValueError`: "Invalid transform specification at "Read from BigQuery" at line 3: Both a query and an output type of 'BEAM_ROW' were specified. 'BEAM_ROW' is not currently supported with queries."
https://github.com/apache/beam/blob/c0a589534704cbdf8c43f0d56275332d99820cdf/sdks/python/apache_beam/io/gcp/bigquery.py#L2973-L2977
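For reference, a minimal pipeline along these lines reproduces the error (the query and the public dataset are only for illustration):

```yaml
pipeline:
  transforms:
    - type: ReadFromBigQuery
      name: Read from BigQuery
      config:
        # Any query triggers the error, since the YAML transform
        # requests 'BEAM_ROW' output under the hood.
        query: |
          SELECT corpus, SUM(word_count) AS total_words
          FROM `bigquery-public-data.samples.shakespeare`
          GROUP BY corpus
    - type: LogForTesting
      input: Read from BigQuery
```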
The workaround is to use a combination of the `table`, `fields`, and `row_restriction` config parameters (sketched below), but this does not allow for any aggregation, meaning that in some cases users must read a lot of data into the pipeline instead of having BigQuery take care of it.
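The workaround looks roughly like this (the table, fields, and the exact shape of the `Combine` step are illustrative); note that the aggregation BigQuery could have performed in the query now has to run inside the pipeline:

```yaml
pipeline:
  transforms:
    - type: ReadFromBigQuery
      name: Read from BigQuery
      config:
        table: bigquery-public-data.samples.shakespeare
        fields: [word, word_count, corpus]
        row_restriction: "corpus = 'hamlet'"
    # The GROUP BY / SUM must now be done by the pipeline itself,
    # over all rows matching the restriction.
    - type: Combine
      input: Read from BigQuery
      config:
        group_by: word
        combine:
          total_count:
            value: word_count
            fn: sum
```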
Issue Priority
Priority: 2 (default / most feature requests should be filed as P2)
Issue Components
- [x] Component: Python SDK
- [ ] Component: Java SDK
- [ ] Component: Go SDK
- [ ] Component: Typescript SDK
- [ ] Component: IO connector
- [x] Component: Beam YAML
- [ ] Component: Beam examples
- [ ] Component: Beam playground
- [ ] Component: Beam katas
- [ ] Component: Website
- [ ] Component: Infrastructure
- [ ] Component: Spark Runner
- [ ] Component: Flink Runner
- [ ] Component: Samza Runner
- [ ] Component: Twister2 Runner
- [ ] Component: Hazelcast Jet Runner
- [ ] Component: Google Cloud Dataflow Runner
@svetakvsundhar is this feasible? I see you added the beam_row option.
I am guessing that since no table exists for a given query, we can't derive a Beam schema from the TableSchema. Can we add the ability for a user to pass a schema, e.g. something like the sketch below?
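Purely as an illustration of that idea (the `schema` config field here is hypothetical and does not exist today):

```yaml
pipeline:
  transforms:
    - type: ReadFromBigQuery
      name: Read from BigQuery
      config:
        query: |
          SELECT corpus, SUM(word_count) AS total_words
          FROM `bigquery-public-data.samples.shakespeare`
          GROUP BY corpus
        # Hypothetical: a user-declared schema of the query result, so
        # the output can be converted to Beam Rows without a table to
        # infer the schema from.
        schema:
          corpus: STRING
          total_words: INT64
```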