python-bigquery-pandas
Implement initial "waiting" logs with tqdm?
Currently the initial logs are emitted roughly every second. Could we instead implement this as a tqdm "progress bar", albeit one without progress? That would be more elegant.
We could also have a hierarchical progress bar, with each of the two steps being a descendant of the parent. This would make the final log messages unnecessary, since tqdm would leave the total time on screen.
INFO:pandas_gbq.gbq: Elapsed 6.71 s. Waiting...
INFO:pandas_gbq.gbq: Elapsed 7.88 s. Waiting...
INFO:pandas_gbq.gbq: Elapsed 9.05 s. Waiting...
INFO:pandas_gbq.gbq: Elapsed 10.23 s. Waiting...
INFO:pandas_gbq.gbq: Elapsed 11.42 s. Waiting...
INFO:pandas_gbq.gbq: Elapsed 12.6 s. Waiting...
INFO:pandas_gbq.gbq: Elapsed 13.6 s. Waiting...
INFO:pandas_gbq.gbq: Elapsed 14.61 s. Waiting...
INFO:pandas_gbq.gbq: Elapsed 15.78 s. Waiting...
INFO:pandas_gbq.gbq: Elapsed 16.95 s. Waiting...
INFO:pandas_gbq.gbq: Elapsed 18.11 s. Waiting...
INFO:pandas_gbq.gbq: Elapsed 19.28 s. Waiting...
INFO:pandas_gbq.gbq: Elapsed 20.44 s. Waiting...
INFO:pandas_gbq.gbq: Elapsed 21.61 s. Waiting...
INFO:pandas_gbq.gbq: Elapsed 22.76 s. Waiting...
INFO:pandas_gbq.gbq: Elapsed 23.91 s. Waiting...
INFO:pandas_gbq.gbq: Elapsed 25.1 s. Waiting...
INFO:pandas_gbq.gbq: Elapsed 26.25 s. Waiting...
INFO:pandas_gbq.gbq: Elapsed 27.41 s. Waiting...
INFO:pandas_gbq.gbq: Elapsed 28.6 s. Waiting...
INFO:pandas_gbq.gbq: Elapsed 29.8 s. Waiting...
INFO:pandas_gbq.gbq: Elapsed 30.99 s. Waiting...
INFO:pandas_gbq.gbq: Elapsed 32.01 s. Waiting...
INFO:pandas_gbq.gbq: Elapsed 33.2 s. Waiting...
INFO:pandas_gbq.gbq: Elapsed 34.36 s. Waiting...
Downloading: 100%|████████████████████████████████████████████| 2373289/2373289 [00:29<00:00, 79844.08rows/s]
INFO:pandas_gbq.gbq:Total time taken 66.53 s.
Finished at 2020-09-11 14:17:54.
We don't know ahead of time how long a query will take, so the current tqdm logic won't work.
Looks like there are a couple of open issues for indefinite progress bars at: https://github.com/tqdm/tqdm/issues/427 https://github.com/tqdm/tqdm/issues/925
A couple of options:
- Add "spinner" feature to tqdm and use that.
- Do some kind of exponential backoff on "waiting..." logging: 1/s at first, slowing to 1/min?
- Use print statements with the right console codes to rewrite lines in-place to update the elapsed time instead of logging.
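To make the last two options concrete, here is a minimal sketch using only the standard library. It is not pandas-gbq code: the names `backoff_intervals`, `format_status`, and `wait_until` are hypothetical, and the 1 s start, doubling factor, and 60 s cap are assumptions taken from the "1/s at first, 1/min" idea above. Note that this sketch slows the polling as well as the status output, which may not be what we want for short queries.

```python
import sys
import time


def backoff_intervals(initial=1.0, factor=2.0, maximum=60.0):
    """Yield sleep intervals that grow geometrically, capped at `maximum`."""
    interval = initial
    while True:
        yield min(interval, maximum)
        interval = min(interval * factor, maximum)


def format_status(elapsed_seconds):
    """Build a status line; the leading carriage return overwrites the previous line."""
    return f"\rElapsed {elapsed_seconds:.2f} s. Waiting..."


def wait_until(is_done, sleep=time.sleep, clock=time.monotonic, out=sys.stderr):
    """Poll `is_done` with backoff, rewriting a single status line in place."""
    start = clock()
    for interval in backoff_intervals():
        if is_done():
            break
        out.write(format_status(clock() - start))
        out.flush()
        sleep(interval)
    out.write("\n")  # Move off the status line once the wait is over.
    return clock() - start
```

Something like `wait_until(lambda: job.done())` would be the call site, assuming the job object exposes a `done()` check. The carriage-return trick only works on a real terminal; when output is redirected to a file, each update would end up on the same long line.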
Good point @tswast
I just looked back on some code I wrote a few years ago and found this (coincidentally, for waiting on jobs from Google Cloud!). Now that I see the links above, though, maybe I should be more cautious about suggesting it's easy to have an indefinite progress bar...
import datetime
import time

from tqdm import tqdm


def wait_for_job(job, timeout_in_seconds=None):
    # https://github.com/GoogleCloudPlatform/python-docs-samples/blob/master/bigquery/cloud-client/snippets.py
    if timeout_in_seconds:
        start = datetime.datetime.now()
        timeout = start + datetime.timedelta(0, timeout_in_seconds)

    # bar_format omits {bar}, so only the description and elapsed time are shown.
    with tqdm(
        bar_format="Waiting for {desc} Elapsed: {elapsed}", total=10000
    ) as progress:
        while True:
            job.reload()  # Refreshes the state via a GET request.
            progress.set_description(str(job))
            if job.state == "DONE":
                if job.error_result:
                    raise RuntimeError(job.errors)
                progress.bar_format = "Completed {desc}. Elapsed: {elapsed}"
                return
            if timeout_in_seconds:
                if datetime.datetime.now() > timeout:
                    raise SystemError(f"Timed out after {timeout_in_seconds} seconds")
            time.sleep(1)
I had some more thoughts about this. The UI uses the job statistics to show progression through the various stages. Filed https://github.com/googleapis/python-bigquery/issues/343 to see if we can implement this in google-cloud-bigquery, since it'll be relevant for the %%bigquery magics, too.
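As a rough illustration of the stage-based idea, here is a sketch of turning per-stage statuses into a "completed/total" count that a progress bar could display. The `Stage` class, its fields, and the status strings are hypothetical stand-ins for whatever the job statistics actually expose; this is not the google-cloud-bigquery API.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    """Hypothetical stand-in for one stage reported in a query job's statistics."""

    name: str
    status: str  # Assumed values: "PENDING", "RUNNING", or "COMPLETE".


def stage_progress(stages):
    """Return (completed, total) so a bar can show stage-level progress."""
    completed = sum(1 for stage in stages if stage.status == "COMPLETE")
    return completed, len(stages)


def describe(stages):
    """Format the stage counts as a short progress description."""
    completed, total = stage_progress(stages)
    return f"Query stages complete: {completed}/{total}"
```

Unlike elapsed time alone, a count like this gives the bar a meaningful total, although the number of stages may itself change while the query is being planned.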
I just merged https://github.com/googleapis/python-bigquery/pull/352, which will be available in google-cloud-bigquery 2.4.0 (see https://github.com/googleapis/python-bigquery/pull/381).