python-bigquery-pandas

Implement initial "waiting" logs with tqdm?

Open max-sixty opened this issue 5 years ago • 4 comments

Currently the initial "waiting" logs are emitted roughly every second. Could we instead implement this as a tqdm progress bar, albeit one without progress? That would be more elegant.

We could also have a hierarchical progress bar, with each of the two steps (waiting for the query and downloading the results) as a child of the parent. This would make the final log messages redundant, since tqdm would leave the total elapsed time on screen.

INFO:pandas_gbq.gbq:  Elapsed 6.71 s. Waiting...
INFO:pandas_gbq.gbq:  Elapsed 7.88 s. Waiting...
INFO:pandas_gbq.gbq:  Elapsed 9.05 s. Waiting...
INFO:pandas_gbq.gbq:  Elapsed 10.23 s. Waiting...
INFO:pandas_gbq.gbq:  Elapsed 11.42 s. Waiting...
INFO:pandas_gbq.gbq:  Elapsed 12.6 s. Waiting...
INFO:pandas_gbq.gbq:  Elapsed 13.6 s. Waiting...
INFO:pandas_gbq.gbq:  Elapsed 14.61 s. Waiting...
INFO:pandas_gbq.gbq:  Elapsed 15.78 s. Waiting...
INFO:pandas_gbq.gbq:  Elapsed 16.95 s. Waiting...
INFO:pandas_gbq.gbq:  Elapsed 18.11 s. Waiting...
INFO:pandas_gbq.gbq:  Elapsed 19.28 s. Waiting...
INFO:pandas_gbq.gbq:  Elapsed 20.44 s. Waiting...
INFO:pandas_gbq.gbq:  Elapsed 21.61 s. Waiting...
INFO:pandas_gbq.gbq:  Elapsed 22.76 s. Waiting...
INFO:pandas_gbq.gbq:  Elapsed 23.91 s. Waiting...
INFO:pandas_gbq.gbq:  Elapsed 25.1 s. Waiting...
INFO:pandas_gbq.gbq:  Elapsed 26.25 s. Waiting...
INFO:pandas_gbq.gbq:  Elapsed 27.41 s. Waiting...
INFO:pandas_gbq.gbq:  Elapsed 28.6 s. Waiting...
INFO:pandas_gbq.gbq:  Elapsed 29.8 s. Waiting...
INFO:pandas_gbq.gbq:  Elapsed 30.99 s. Waiting...
INFO:pandas_gbq.gbq:  Elapsed 32.01 s. Waiting...
INFO:pandas_gbq.gbq:  Elapsed 33.2 s. Waiting...
INFO:pandas_gbq.gbq:  Elapsed 34.36 s. Waiting...
Downloading: 100%|████████████████████████████████████████████| 2373289/2373289 [00:29<00:00, 79844.08rows/s]
INFO:pandas_gbq.gbq:Total time taken 66.53 s.
Finished at 2020-09-11 14:17:54.
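Instead of emitting one log record per poll, as in the output above, the elapsed time can be redrawn on a single line. The following is a minimal, dependency-free sketch (no tqdm), assuming a terminal that honors carriage returns; `wait_with_status` and its parameters are hypothetical names, not part of pandas-gbq. The `clock` and `sleep` parameters exist only so the loop can be exercised without real waiting.

```python
import sys
import time


def wait_with_status(is_done, poll_interval=1.0, stream=None,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll is_done() and redraw one 'Elapsed ... Waiting...' line in place.

    Hypothetical helper (not part of pandas-gbq). clock/sleep are injectable
    so the loop can be exercised without real waiting.
    """
    stream = sys.stderr if stream is None else stream
    start = clock()
    width = 0  # length of the longest line drawn so far

    def redraw(text):
        nonlocal width
        width = max(width, len(text))
        # "\r" returns the cursor to column 0; ljust() pads with spaces so a
        # shorter line fully covers a longer one drawn earlier.
        stream.write("\r" + text.ljust(width))
        stream.flush()

    while not is_done():
        redraw(f"Elapsed {clock() - start:.2f} s. Waiting...")
        sleep(poll_interval)

    elapsed = clock() - start
    redraw(f"Done after {elapsed:.2f} s.")
    stream.write("\n")  # leave the final line on screen
    stream.flush()
    return elapsed
```

With google-cloud-bigquery, `is_done` could be a job's `done` method, e.g. `wait_with_status(job.done)`; the rest of the terminal output is untouched because nothing goes through `logging`.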

max-sixty, Sep 11 '20 21:09

We don't know ahead of time how long a query will take, so the current tqdm logic, which needs a known total to draw a bar, won't work.

Looks like there are a couple of open tqdm issues about indefinite progress bars: https://github.com/tqdm/tqdm/issues/427 and https://github.com/tqdm/tqdm/issues/925

A few options:

  • Add a "spinner" feature to tqdm and use that.
  • Do some kind of exponential backoff on the "waiting..." logging: log once per second at first, backing off to once per minute?
  • Use print statements with the right terminal control codes (e.g. a carriage return) to rewrite the line in place with the updated elapsed time, instead of logging.
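The exponential-backoff option might look roughly like this. It is a sketch under assumptions: the helper names, the doubling factor and the 60-second cap are illustrative choices, not existing pandas-gbq behavior. Note that the job is still polled at a constant rate; only the "Waiting..." messages become less frequent.

```python
import logging
import time

logger = logging.getLogger(__name__)


def backoff_intervals(initial=1.0, factor=2.0, maximum=60.0):
    """Yield gaps between successive log messages: 1, 2, 4, ... capped at maximum."""
    interval = initial
    while True:
        yield interval
        interval = min(interval * factor, maximum)


def wait_with_backoff_logging(is_done, poll_interval=1.0, clock=time.monotonic,
                              sleep=time.sleep, log=logger.info):
    """Poll is_done() every poll_interval seconds, logging less and less often.

    Hypothetical helper: the polling rate stays constant; only the
    "Waiting..." messages back off, from once per second to once per minute.
    """
    start = clock()
    gaps = backoff_intervals()
    next_log = start + next(gaps)  # first message after ~1 s
    while not is_done():
        now = clock()
        if now >= next_log:
            log("Elapsed %.2f s. Waiting...", now - start)
            next_log = now + next(gaps)  # widen the gap before the next message
        sleep(poll_interval)
    return clock() - start
```

For a 20-second wait this logs at roughly 1, 3, 7 and 15 seconds instead of twenty times, while a very long query settles at one message per minute.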

tswast, Sep 11 '20 22:09

Good point @tswast

I just looked back at some code I wrote a few years ago and found this (coincidentally, for waiting on jobs from Google Cloud!). Now that I see the links above, though, maybe I should be more cautious about suggesting that an indefinite progress bar is easy...


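import datetime
import time

from tqdm import tqdm


def wait_for_job(job, timeout_in_seconds=None):
    # https://github.com/GoogleCloudPlatform/python-docs-samples/blob/master/bigquery/cloud-client/snippets.py
    # Poll a google-cloud-bigquery job until it finishes, showing the elapsed time.
    timeout = None
    if timeout_in_seconds:
        start = datetime.datetime.now()
        timeout = start + datetime.timedelta(seconds=timeout_in_seconds)

    # bar_format has no {bar} field, so tqdm only prints the description and
    # the elapsed time; total is arbitrary because no bar is ever drawn.
    with tqdm(
        bar_format="Waiting for {desc} Elapsed: {elapsed}", total=10000
    ) as progress:
        while True:
            job.reload()  # Refreshes the state via a GET request.
            # set_description_str() does not append ": " and redraws the line.
            progress.set_description_str(str(job))
            if job.state == "DONE":
                if job.error_result:
                    raise RuntimeError(job.errors)
                # Used for the final redraw when the `with` block closes the bar.
                progress.bar_format = "Completed {desc}. Elapsed: {elapsed}"
                return
            if timeout is not None and datetime.datetime.now() > timeout:
                raise TimeoutError(f"Timed out after {timeout_in_seconds} seconds")
            time.sleep(1)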

max-sixty, Sep 12 '20 01:09

I had some more thoughts about this. The BigQuery UI uses the job statistics to show progress through the various query stages. I filed https://github.com/googleapis/python-bigquery/issues/343 to see if we can implement this in google-cloud-bigquery, since it'll be relevant for the %%bigquery magics, too.
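A stage-by-stage status line could be derived from the job's query plan. This is a minimal sketch, assuming stage objects with `name`, `status`, `completed_parallel_inputs` and `parallel_inputs` attributes (these mirror fields on google-cloud-bigquery's `QueryPlanEntry`), and assuming `"COMPLETE"` is the status string for a finished stage. `Stage` and `describe_progress` are hypothetical names; this is not the implementation that was eventually merged.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Stage:
    """Hypothetical stand-in for one query-plan stage (assumed field names)."""
    name: str
    status: str
    completed_parallel_inputs: int
    parallel_inputs: int


def describe_progress(stages: Sequence[Stage]) -> str:
    """Summarize a query plan as 'N/M stages complete' plus the current stage."""
    if not stages:
        return "Query is running (no plan yet)"
    done = sum(1 for s in stages if s.status == "COMPLETE")
    # Treat the first stage that is not complete as the "current" one.
    current: Optional[Stage] = next(
        (s for s in stages if s.status != "COMPLETE"), None
    )
    summary = f"{done}/{len(stages)} stages complete"
    if current is None:
        return summary
    # Fraction of the current stage's parallel inputs already processed.
    pct = 100.0 * current.completed_parallel_inputs / max(current.parallel_inputs, 1)
    return f"{summary}; {current.name} ({current.status}) {pct:.0f}%"
```

With a real job, the list might come from `job.query_plan` after a `job.reload()`; treat that wiring, and the exact status strings, as assumptions to verify against the library.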

tswast, Oct 26 '20 14:10

I just merged https://github.com/googleapis/python-bigquery/pull/352, which will be available in google-cloud-bigquery 2.4.0 (see https://github.com/googleapis/python-bigquery/pull/381).

tswast, Nov 16 '20 16:11