python-bigquery-pandas
`read_gbq(table_id, columns=[list of columns])` should actually limit which columns are downloaded from the API
Is your feature request related to a problem? Please describe.
Currently, the `columns` parameter is only used to reorder the columns, and it must exactly match the columns returned by the query or table. See this TODO:
https://github.com/googleapis/python-bigquery-pandas/blob/912b615b6d8d0ff11451c247fb65e9a293b06490/pandas_gbq/gbq.py#L939-L944
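For illustration, the reorder-only behavior described above can be approximated as follows. This is a hedged sketch, not pandas-gbq's actual code: `reorder_columns` and `InvalidColumnOrderError` are hypothetical names, and the check only compares column names after every column has already been downloaded.

```python
from typing import List, Sequence


class InvalidColumnOrderError(ValueError):
    """Hypothetical error raised when `columns` does not match the result."""


def reorder_columns(result_columns: Sequence[str], columns: Sequence[str]) -> List[str]:
    """Approximate the current reorder-only handling of `columns`.

    All columns have already been downloaded at this point; `columns` can
    only change their order, so it must contain exactly the same names.
    """
    if sorted(columns) != sorted(result_columns):
        raise InvalidColumnOrderError(
            "columns must match the downloaded columns exactly"
        )
    return list(columns)
```

With a result containing `a`, `b`, and `c`, passing `["c", "a", "b"]` reorders the columns, while passing only `["a"]` raises instead of downloading a single column.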
Describe the solution you'd like
Only download the selected columns when the user passes a list of columns to `read_gbq`.
For queries:
Maybe these still need the columns to match exactly, since the columns can already be selected in SQL? I don't see a `selected_fields` option in https://cloud.google.com/python/docs/reference/bigquery/latest/google.cloud.bigquery.client.Client#google_cloud_bigquery_client_Client_query_and_wait
For table IDs:
Pass the list of columns through as `selected_fields` to https://cloud.google.com/python/docs/reference/bigquery/latest/google.cloud.bigquery.client.Client#google_cloud_bigquery_client_Client_list_rows
The change would start here: https://github.com/googleapis/python-bigquery-pandas/blob/912b615b6d8d0ff11451c247fb65e9a293b06490/pandas_gbq/gbq.py#L914-L919 and continue through to https://github.com/googleapis/python-bigquery-pandas/blob/912b615b6d8d0ff11451c247fb65e9a293b06490/pandas_gbq/gbq.py#L396
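A minimal sketch of the table-ID path, assuming `client` is a `google.cloud.bigquery.Client`: fetch the table schema with `get_table`, keep only the requested `SchemaField`s, and pass them as `selected_fields` to `list_rows`. The helper names `select_schema_fields` and `read_table_columns` are hypothetical and not part of pandas-gbq.

```python
from typing import Any, Iterable, List, Sequence


def select_schema_fields(schema: Iterable[Any], columns: Sequence[str]) -> List[Any]:
    """Return the schema fields named in `columns`, in table-schema order.

    `schema` is assumed to be a list of google.cloud.bigquery.SchemaField
    objects; anything with a `.name` attribute works.
    """
    wanted = set(columns)
    fields = [field for field in schema if field.name in wanted]
    missing = wanted - {field.name for field in fields}
    if missing:
        raise ValueError(f"columns not found in table: {sorted(missing)}")
    return fields


def read_table_columns(client: Any, table_id: str, columns: Sequence[str]) -> Any:
    """Download only `columns` from `table_id` as a pandas DataFrame.

    `client` is assumed to be a google.cloud.bigquery.Client (hypothetical
    helper; pandas-gbq's real code path differs).
    """
    # The full schema is needed so each selected field carries its type.
    table = client.get_table(table_id)
    fields = select_schema_fields(table.schema, columns)
    # Only the selected fields are requested from the API.
    rows = client.list_rows(table, selected_fields=fields)
    # Reorder afterwards so the caller's column order is honored.
    return rows.to_dataframe()[list(columns)]
```

The fields are kept in table-schema order and the DataFrame is reordered afterwards, so the caller's order is honored without relying on how the API orders selected fields. Name matching here is case-sensitive, while BigQuery column names are case-insensitive; a real implementation would likely need to account for that.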
Additional context
Aside: https://googleapis.dev/python/pandas-gbq/latest/reading.html does not mention that a table ID is supported. We should add a sample there.