ci(bigquery): BigQuery CI is very slow
This CI run took nearly an hour and a half: https://github.com/ibis-project/ibis/actions/runs/8697121662/job/23851753722.
Is there something we can do to speed this up a bit?
cc @tswast
Created a notebook in a gist showing the issue: https://gist.github.com/cpcloud/b019ed898312d422190152b02b029377.
Not sure why the variability has increased. One thing we might want to try is the new query_and_wait method (added in google-cloud-bigquery 3.14.0, https://github.com/googleapis/python-bigquery/pull/1722), which is optimized for queries that return small (< 100 MB) results.
Note: It falls back to the existing BQ Storage Read API implementation for larger results.
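A minimal sketch of the difference, assuming `client` is a `google.cloud.bigquery.Client` from google-cloud-bigquery >= 3.14.0. `query_and_wait` returns a `RowIterator` directly instead of a `QueryJob`, so there is no separate `.result()` step. Calling `to_arrow()` on that iterator is where the fallback to the BigQuery Storage Read API can happen for larger results. The two function names are hypothetical and are not ibis's actual code.

```python
from typing import Any


def fetch_arrow_two_step(client: Any, sql: str) -> Any:
    # Existing pattern: create a QueryJob, block on .result(), then download.
    job = client.query(sql)
    return job.result().to_arrow()


def fetch_arrow_query_and_wait(client: Any, sql: str) -> Any:
    # query_and_wait starts the query and waits for it in one call,
    # returning a RowIterator rather than a QueryJob.
    rows = client.query_and_wait(sql)
    # For large results, to_arrow() may download the rows via the BigQuery
    # Storage Read API (when google-cloud-bigquery-storage is installed).
    return rows.to_arrow()
```

Both helpers produce an Arrow table; only the number of round trips before the first rows arrive is expected to differ.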
Copying from an email here for easier reference.
I added Client.query_and_wait in google-cloud-bigquery 3.14.0 late last year (https://github.com/googleapis/python-bigquery/blob/main/CHANGELOG.md#3140-2023-12-08). Since then there have been a few fixes and optimizations, but I think that should be safe as the minimum version for what ibis is doing. For small queries with small results (< 500 KB or so) that can save anywhere from a few hundred milliseconds to 3 seconds.
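If 3.14.0 becomes the floor, it would normally be declared in ibis's package metadata. For illustration only, a hypothetical runtime guard that checks whether the installed google-cloud-bigquery is new enough to have `query_and_wait` could look like this. The helper names are invented here, and the simple parser ignores pre-release tags (`packaging.version` would handle full PEP 440 ordering).

```python
import re
from importlib.metadata import PackageNotFoundError, version

# First google-cloud-bigquery release that ships Client.query_and_wait.
MIN_QUERY_AND_WAIT = (3, 14, 0)


def parse_release(ver: str) -> tuple[int, ...]:
    # Keep only the leading numeric release segment, e.g. "3.14.0rc1" -> (3, 14, 0).
    # Pre-release suffixes are dropped, so this is a simplification of PEP 440.
    match = re.match(r"\d+(?:\.\d+)*", ver)
    if match is None:
        raise ValueError(f"unrecognized version string: {ver!r}")
    return tuple(int(part) for part in match.group().split("."))


def has_query_and_wait(ver: str) -> bool:
    # Pad short versions such as "4.0" so they compare as (4, 0, 0).
    release = parse_release(ver)
    release = release + (0,) * (len(MIN_QUERY_AND_WAIT) - len(release))
    return release >= MIN_QUERY_AND_WAIT


def installed_has_query_and_wait() -> bool:
    # Look up the installed distribution; treat "not installed" as unsupported.
    try:
        return has_query_and_wait(version("google-cloud-bigquery"))
    except PackageNotFoundError:
        return False
```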
Looking at where .query() is currently called in ibis
https://github.com/ibis-project/ibis/blob/560ddf6ca24e0d29fdd565aa59a22f3e7a32e959/ibis/backends/bigquery/__init__.py#L628-L631
the switch won't be entirely trivial. For example, the pattern of waiting until .result() to set the page size
https://github.com/ibis-project/ibis/blob/560ddf6ca24e0d29fdd565aa59a22f3e7a32e959/ibis/backends/bigquery/__init__.py#L810
won't work, because query_and_wait has no separate .result() step: the page size has to be passed to query_and_wait itself.
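A minimal before/after sketch of moving `page_size`, assuming `client` is a `google.cloud.bigquery.Client` (>= 3.14.0), where both `QueryJob.result` and `Client.query_and_wait` accept a `page_size` keyword. The function names are hypothetical and do not mirror ibis's real code paths.

```python
from typing import Any, Optional


def fetch_pages_before(client: Any, sql: str, page_size: Optional[int] = None) -> Any:
    # Current pattern: the page size is only supplied once the job is waited on.
    job = client.query(sql)
    return job.result(page_size=page_size)


def fetch_pages_after(client: Any, sql: str, page_size: Optional[int] = None) -> Any:
    # With query_and_wait there is no later .result() call, so the page size
    # must be passed up front, together with the query itself.
    return client.query_and_wait(sql, page_size=page_size)
```

Either helper returns a `RowIterator`; the practical consequence is that any code deciding on a page size has to run before the query is issued rather than after.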
Yep, working through it now!
Thanks for the reference!