php-docs-samples icon indicating copy to clipboard operation
php-docs-samples copied to clipboard

BigQuery - Import From File, documentation doesn't match code

Open williamjulianvicary opened this issue 7 years ago • 5 comments

The documentation for import_from_file executes runJob and then adds an exponential backoff and polls the job until it is complete.

Looking at the runJob method on the Table this already has that logic via a trait which is called by waitUntilComplete().

I assume this code is not needed:

    $backoff = new ExponentialBackoff(10);
    $backoff->execute(function () use ($job) {
        printf('Waiting for job to complete' . PHP_EOL);
        $job->reload();
        if (!$job->isComplete()) {
            throw new Exception('Job has not yet completed', 500);
        }
    });

File reference: https://github.com/GoogleCloudPlatform/php-docs-samples/blob/eb3ba494847431b3e52553815acd4ebf2b4d69d6/bigquery/api/src/functions/import_from_file.php#L59

williamjulianvicary avatar Apr 06 '18 10:04 williamjulianvicary

The waitUntilComplete function can be used as well, but it does not have an exponential backoff, and so hammers the API a lot more than when using the ExponentialBackoff class, which is recommended.

bshaffer avatar Jun 13 '19 22:06 bshaffer

@bshaffer I do see that, however on this line, within the runJob() function on the Table, there is already the waitUntilComplete being used: https://github.com/googleapis/google-cloud-php-bigquery/blob/master/src/Table.php#L283

Which references the following on the Job class: https://github.com/googleapis/google-cloud-php-bigquery/blob/master/src/Job.php#L224

This eventually gets an exponential backoff here: https://github.com/googleapis/google-cloud-php-bigquery/blob/master/src/JobWaitTrait.php#L58

So, correct me if I'm wrong, I believe the runJob() on the table is already waiting until complete, with an exponential backoff baked into the code.

Even if the ExpotentialBackoff wasn't in place via the JobWaitTrait it would be impossible to circumvent this as the runJob already invokes a retry loop.

Let me know if you need any further details!

williamjulianvicary avatar Jul 16 '19 08:07 williamjulianvicary

Thank you for your thorough report of the issue! Now that you say it, I believe this was put into place because there is an eventual consistency issue between a job being reported as completed and the status of the job updating as completed, and this for-loop is meant to accommodate this (although the comment says otherwise)

bshaffer avatar Jul 16 '19 18:07 bshaffer

@jdpedrie Could you take a look and see if (a) the code sample makes sense and still works and (b) if we should update the comment as @bshaffer notes?

meredithslota avatar Feb 05 '21 18:02 meredithslota

Thanks for raising this @williamjulianvicary . I think we need to update the relevant BigQuery samples and remove the exponential backoff.

I am leaving this issue as open for now, will adjust into team backlog. ~13 samples could be affected (upon searching ExponentialBackoff in ./bigquery).

bigquery/api/src/copy_table.php
bigquery/api/src/import_from_local_csv.php
bigquery/api/src/import_from_storage_csv_autodetect.php
bigquery/api/src/import_from_storage_csv_truncate.php
bigquery/api/src/import_from_storage_csv.php
bigquery/api/src/import_from_storage_json_autodetect.php
bigquery/api/src/import_from_storage_json_truncate.php
bigquery/api/src/import_from_storage_json.php
bigquery/api/src/import_from_storage_orc_truncate.php
bigquery/api/src/import_from_storage_orc.php
bigquery/api/src/import_from_storage_parquet_truncate.php
bigquery/api/src/import_from_storage_parquet.php
bigquery/api/src/run_query_as_job.php

vishwarajanand avatar Sep 29 '22 16:09 vishwarajanand