python-bigquery-pandas
Option to use Streaming API
I believe that the core BQ Client now allows Streaming from a DataFrame.
I was wondering if we could make that an option in pandas-gbq, so that users can work around some of the load-job limitations (the 1000 loads limit, etc.).
Yes, this is much easier to implement now that the BigQuery client has an insert_rows_from_dataframe method.
Perhaps a use_streaming_api parameter that defaults to False? Or, if we want to support other upload mechanisms in the future (I anticipate the BQ Storage API adding a mechanism at some point), maybe an upload_type parameter that accepts 'load' or 'streaming'.
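A rough sketch of how such a dispatch could look, for discussion only. The function name upload_dataframe and its return convention are hypothetical and not part of pandas-gbq. The only BigQuery client calls it relies on are load_table_from_dataframe and insert_rows_from_dataframe, and it assumes the destination table already exists with a schema, which the streaming path needs.

```python
def upload_dataframe(client, dataframe, table, upload_type="load"):
    """Hypothetical helper: send a DataFrame to BigQuery by the chosen mechanism.

    `client` is expected to behave like google.cloud.bigquery.Client and
    `table` like a Table object that carries a schema.
    Returns a flat list of row-level errors (always empty for load jobs).
    """
    if upload_type == "load":
        # Load jobs are asynchronous; result() blocks until the job finishes
        # and raises if the job failed.
        job = client.load_table_from_dataframe(dataframe, table)
        job.result()
        return []

    if upload_type == "streaming":
        # The streaming call returns one list of errors per chunk of rows.
        chunk_errors = client.insert_rows_from_dataframe(table, dataframe)
        return [error for chunk in chunk_errors for error in chunk]

    raise ValueError(
        "upload_type must be 'load' or 'streaming', got {!r}".format(upload_type)
    )
```

Keeping the mechanism behind a single string argument means a later option (for example one backed by the BQ Storage API) would only need another branch, not another boolean flag.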
Agreed, I think an upload_type or upload_mechanism parameter is a more scalable long-term approach to this.
One thing to note: pandas-gbq (to_gbq) and the Python BQ client (insert_rows_from_dataframe) differ in how they handle records and other nested structures, and in which data structures they support. We should make sure that uploading the same DataFrame through to_gbq yields the same outcome whichever upload mechanism is used.