
add `configuration` argument to `to_gbq`

tswast opened this issue 4 years ago

The `to_gbq` function should take a `configuration` argument representing a BigQuery `JobConfiguration` REST API resource.

This would make it consistent with the `read_gbq` function.
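
For reference, a rough sketch of what that could look like from the caller's side. The `configuration` argument below is hypothetical (it does not exist on `to_gbq` yet); the nested keys follow the load section of the BigQuery `JobConfiguration` REST resource, mirroring how `read_gbq` already accepts a query job configuration:

import pandas as pd
import pandas_gbq

df = pd.DataFrame({"date": pd.to_datetime(["2021-11-17"]), "value": [1]})

# Hypothetical usage of the proposed argument; the nested keys are the
# REST field names from JobConfigurationLoad.
pandas_gbq.to_gbq(
    df,
    "my_dataset.my_table",
    project_id="my-project",
    configuration={
        "load": {
            "timePartitioning": {"type": "DAY", "field": "date"},
            "clustering": {"fields": ["value"]},
        }
    },
)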

Context

Options for table creation / schema updates

  • Partitioning and Clustering: https://github.com/googleapis/python-bigquery-pandas/issues/395
  • Schema update options: https://github.com/googleapis/python-bigquery-pandas/issues/107
  • Partition expiration time: https://github.com/googleapis/google-cloud-python/issues/14493

I believe these would require table creation to be done by the load job instead of a separate create-table step (partitioning in particular, since it must be configured at creation time). TBD what this would look like if we add support for the BigQuery Storage Write API or the (legacy) streaming API.
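
As a point of comparison, a minimal sketch of that load-job approach with the google-cloud-bigquery client (table and column names are placeholders): the load job itself creates the partitioned, clustered table, with no separate create_table call.

import pandas as pd
from google.cloud import bigquery

client = bigquery.Client()
df = pd.DataFrame({"date": pd.to_datetime(["2021-11-17"]), "customer_id": ["a"]})

# The load job creates the table with partitioning and clustering applied,
# so no separate create_table step is needed.
job_config = bigquery.LoadJobConfig(
    time_partitioning=bigquery.TimePartitioning(
        type_=bigquery.TimePartitioningType.DAY,
        field="date",
    ),
    clustering_fields=["customer_id"],
)
client.load_table_from_dataframe(
    df, "my-project.my_dataset.my_table", job_config=job_config
).result()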

Options for file loading

  • Custom NULL marker: https://github.com/googleapis/python-bigquery-pandas/issues/366 -- I believe this would also require an update to the pandas CSV write configuration, though (see the sketch below).
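
A minimal sketch of that pairing with the google-cloud-bigquery client today, assuming a placeholder marker and table name; the CSV written by pandas and the load job have to agree on the marker, which is why the CSV write configuration also needs updating:

import io

import pandas as pd
from google.cloud import bigquery

client = bigquery.Client()
df = pd.DataFrame({"name": ["alice", None]})

# Write NULLs with an explicit marker and tell the load job about it.
csv_file = io.BytesIO(df.to_csv(index=False, na_rep="\\N").encode("utf-8"))

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    null_marker="\\N",
    autodetect=True,
)
client.load_table_from_file(
    csv_file, "my-project.my_dataset.my_table", job_config=job_config
).result()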

tswast commented on Nov 17, 2021

Hello @tswast @chalmerlowe,

would you accept external contributions for this feature? My use case mainly involves clustering / partitioning, but I'm happy to help with this PR.

Currently, I use this function to create a table:

from google.cloud import bigquery


def create_partitioned_table(client, project_id, dataset_name, table_name, schema):
    # Set table_id to the ID of the table to create.
    table_id = "{}.{}.{}".format(project_id, dataset_name, table_name)

    # Construct a full Table object to send to the API with partitioning.
    table = bigquery.Table(table_id, schema=schema)

    # Partition by day on the 'date' field.
    table.time_partitioning = bigquery.TimePartitioning(
        type_=bigquery.TimePartitioningType.DAY,
        field="date",
    )

    # Send the table to the API for creation.
    table = client.create_table(table)  # Make an API request.

    print(
        "Created partitioned table {}.{}.{} on column 'date'".format(
            table.project, table.dataset_id, table.table_id
        )
    )
    return table

And then I load data using the pandas_gbq library. It would be nice to have everything built in.
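
For completeness, the second step with pandas_gbq currently looks roughly like this (placeholder names; the table above must already exist so the append picks up its partitioning):

import pandas as pd
import pandas_gbq

df = pd.DataFrame({"date": pd.to_datetime(["2025-08-30"]), "value": [42]})

# Append into the table created above; to_gbq itself cannot set
# partitioning or clustering today, which is the gap this issue covers.
pandas_gbq.to_gbq(
    df,
    "my_dataset.my_table",
    project_id="my-project",
    if_exists="append",
)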

Let me know if you'd be open to it and I can work on the PR in October.

Thank you!

antoineeripret commented on Aug 30, 2025

Would love a contribution. For clustering columns in particular, let's mimic the interface that `bigframes.pandas.DataFrame.to_gbq` came up with:

clustering_columns: typing.Union[
        pandas.core.indexes.base.Index, typing.Iterable[typing.Hashable]
    ] = (),

clustering_columns Union[pd.Index, Iterable[Hashable]], default ()

Specifies the columns for clustering in the BigQuery table. The order of columns in this list is significant for clustering hierarchy. Index columns may be included in clustering if the index parameter is set to True, and their names are specified in this. These index columns, if included, precede DataFrame columns in the clustering order. The clustering order within the Index/DataFrame columns follows the order specified in clustering_columns.

https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes.dataframe.DataFrame#bigframes_dataframe_DataFrame_to_gbq
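
So on the pandas-gbq side, the call could eventually look something like this (hypothetical: a clustering_columns parameter does not exist on pandas_gbq.to_gbq yet; it is what the proposed change would add):

import pandas as pd
import pandas_gbq

df = pd.DataFrame(
    {"customer_id": ["a", "b"], "region": ["eu", "us"], "value": [1, 2]}
)

# Hypothetical parameter mirroring bigframes.pandas.DataFrame.to_gbq;
# column order defines the clustering hierarchy.
pandas_gbq.to_gbq(
    df,
    "my_dataset.my_table",
    project_id="my-project",
    clustering_columns=["customer_id", "region"],
)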

tswast commented on Aug 31, 2025

@tswast, I'll have a look in a couple of weeks then. Thanks!

antoineeripret commented on Aug 31, 2025

@tswast: I've worked on a draft PR here.

This would be my first contribution to one of Google's repos, so I'm quite new to how you operate. Any input is more than welcome to ensure that it meets your criteria.

Thanks!

antoineeripret commented on Sep 22, 2025