python-bigquery-pandas

Option to use Streaming API

Open harir91 opened this issue 5 years ago • 3 comments

I believe the core BigQuery client now supports streaming from a DataFrame.

Could we make that an option in pandas-gbq? It would let users work around some load-job limitations, such as the limit of 1,000 loads.

harir91 • Dec 05 '19 22:12

Yes, this is much easier to implement now that the BigQuery client has an insert_rows_from_dataframe method.

Perhaps a use_streaming_api parameter that defaults to False? Or, if we want to support other upload mechanisms in the future (I anticipate the BQ Storage API adding one at some point), maybe an upload_type option that accepts 'load' or 'streaming'.
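A minimal sketch of how such an option could dispatch between the two mechanisms. The function name upload_dataframe and its return convention are hypothetical and not part of pandas-gbq; only the two client methods, load_table_from_dataframe and insert_rows_from_dataframe, come from the BigQuery client library. The client argument is assumed to be a google.cloud.bigquery.Client (or anything exposing the same two methods).

```python
def upload_dataframe(client, dataframe, table, upload_type="load", chunk_size=500):
    """Hypothetical helper: upload a DataFrame via a load job or streaming inserts.

    Returns a flat list of row-level insert errors (always empty for "load",
    where failures surface as exceptions from job.result()).
    """
    if upload_type == "load":
        # Load jobs are batch operations and count against load-job quotas.
        job = client.load_table_from_dataframe(dataframe, table)
        job.result()  # block until the load job finishes; raises on failure
        return []

    if upload_type == "streaming":
        # Streaming inserts need the table schema, so `table` should be a
        # Table object fetched with client.get_table(...), not a bare reference.
        errors_per_chunk = client.insert_rows_from_dataframe(
            table, dataframe, chunk_size=chunk_size
        )
        # The client returns one list of errors per chunk; flatten them.
        return [error for chunk in errors_per_chunk for error in chunk]

    raise ValueError(
        f"upload_type must be 'load' or 'streaming', got {upload_type!r}"
    )
```

Keeping the mechanism as a string rather than a boolean leaves room for a third value later without changing the signature.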

tswast • Dec 06 '19 01:12

Correct, I think an upload_type or upload_mechanism parameter is a more scalable long-term approach to this.

harir91 • Dec 06 '19 06:12

One thing to note: pandas-gbq (to_gbq) and the Python BQ client (insert_rows_from_dataframe) differ in how they handle records and other nested structures, and in which data structures they support. We should ensure that the same DataFrame yields the same outcome through to_gbq regardless of which upload mechanism is used.

harir91 • Dec 06 '19 06:12