
read_gbq: add mechanism to ensure BQ storage API usage

Open calbach opened this issue 10 months ago • 4 comments

Assumption: there is currently no supported way to force read_gbq to use the BQ storage API. I'd be happy to be corrected if I missed something!

Is your feature request related to a problem? Please describe.

I have cases where read_gbq's heuristic chooses the JSON API when I want the Storage API. This is most noticeable for me on medium-sized tables, which might take 5-20 seconds to load via the JSON API but load much faster via the Storage API. For many of my use cases, especially interactive ones, I am very willing to pay the additional Storage API cost to make them more bearable.

Describe the solution you'd like

A parameter to read_gbq that forces the use of the BQ Storage API (including raising an error if the necessary dependencies are not available to do so). I won't try to be prescriptive about the details, though I'll note that the desired behavior I've described is what I expected from use_bqstorage_api, based on the name. Given the current behavior as I understand it, allow_bqstorage_api might be the more accurate name; a sketch of the current call pattern is below.
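For reference, this is roughly how the existing flag is invoked today; per this thread, setting it to True only *permits* the Storage API rather than forcing it. The project and table names below are hypothetical:

```python
import pandas_gbq

# With current behavior (as described in this issue), use_bqstorage_api=True
# only allows the BQ Storage API; the query_and_wait heuristics may still
# return results over the JSON API for small and medium-sized tables.
df = pandas_gbq.read_gbq(
    "SELECT * FROM `my-project.my_dataset.my_table`",  # hypothetical table
    project_id="my-project",  # hypothetical project
    use_bqstorage_api=True,
)
```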

calbach avatar Feb 28 '25 22:02 calbach

I think you are correct. Basically, we are now calling query_and_wait, which might not create a destination table that we can read from with the BQ Storage API. Such a flag would have to force the use of query from google-cloud-bigquery.

tswast avatar Mar 19 '25 17:03 tswast
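For context, here is a minimal sketch of the two code paths described above, using google-cloud-bigquery directly; the project and table names are hypothetical:

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # hypothetical project
sql = "SELECT * FROM `my-project.my_dataset.my_table`"  # hypothetical table

# query() creates a full query job whose results land in a destination
# table, which the BQ Storage API can then read.
rows = client.query(sql).result()
df_fast = rows.to_dataframe(create_bqstorage_client=True)

# query_and_wait() may take the stateless jobs.query path; when it does,
# there is no destination table and results come back over the JSON API.
rows = client.query_and_wait(sql)
df_json = rows.to_dataframe(create_bqstorage_client=False)
```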

It sounds like we need three values for use_bqstorage_api (sketched after this list):

  • True: always use the Storage API (call query)
  • False: never use it (currently works, I believe) (call query_and_wait and disable BQ Storage client creation)
  • "default" (or None): choose based on the heuristics in query_and_wait

tswast avatar Mar 19 '25 17:03 tswast