php-docs-samples
BigQuery - Import From File, documentation doesn't match code
The documented sample for import_from_file executes runJob and then adds an exponential backoff that polls the job until it is complete.
Looking at the runJob method on the Table class, it already has that logic via a trait, which is invoked through waitUntilComplete().
I assume this code is not needed:
$backoff = new ExponentialBackoff(10);
$backoff->execute(function () use ($job) {
    printf('Waiting for job to complete' . PHP_EOL);
    $job->reload();
    if (!$job->isComplete()) {
        throw new Exception('Job has not yet completed', 500);
    }
});
File reference: https://github.com/GoogleCloudPlatform/php-docs-samples/blob/eb3ba494847431b3e52553815acd4ebf2b4d69d6/bigquery/api/src/functions/import_from_file.php#L59
The waitUntilComplete function can be used as well, but it does not have an exponential backoff, so it hammers the API much more than the recommended approach of using the ExponentialBackoff class.
@bshaffer I do see that; however, on this line, within the runJob() function on the Table, waitUntilComplete is already being used:
https://github.com/googleapis/google-cloud-php-bigquery/blob/master/src/Table.php#L283
Which references the following on the Job class: https://github.com/googleapis/google-cloud-php-bigquery/blob/master/src/Job.php#L224
This eventually gets an exponential backoff here: https://github.com/googleapis/google-cloud-php-bigquery/blob/master/src/JobWaitTrait.php#L58
So, correct me if I'm wrong, but I believe runJob() on the Table is already waiting until the job is complete, with an exponential backoff baked into the code.
Even if the ExponentialBackoff weren't in place via the JobWaitTrait, it would be impossible to circumvent this wait, as runJob already invokes a retry loop.
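For illustration only, here is a minimal sketch of my own (not the sample's code) that separates the two steps runJob() performs, assuming Table::startJob() from the same library; $projectId, $datasetId, $tableId and $source are placeholders:

use Google\Cloud\BigQuery\BigQueryClient;

$bigQuery = new BigQueryClient(['projectId' => $projectId]);
$table = $bigQuery->dataset($datasetId)->table($tableId);
$loadConfig = $table->load(fopen($source, 'r'));

// startJob() returns immediately; waitUntilComplete() polls the job via
// JobWaitTrait, which already applies an exponential backoff between retries.
$job = $table->startJob($loadConfig);
$job->waitUntilComplete();

// runJob() does roughly the same as the two calls above internally, so by
// the time it returns the job has already been polled to completion.
printf('Job complete: %s' . PHP_EOL, $job->isComplete() ? 'true' : 'false');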
Let me know if you need any further details!
Thank you for your thorough report of the issue! Now that you say it, I believe this was put into place because there is an eventual consistency issue between a job being reported as completed and the job's status actually updating to completed; this retry loop is meant to accommodate that (although the comment says otherwise).
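If eventual consistency is the worry, one way to check for it without a hand-rolled loop would be to reload the job once and inspect its status. A purely illustrative sketch, assuming BigQueryClient::job() for fetching an existing job; $jobId is a placeholder:

use Google\Cloud\BigQuery\BigQueryClient;

$bigQuery = new BigQueryClient();
$job = $bigQuery->job($jobId);

// Re-fetch the job resource once so the cached status reflects the API's
// current view before it is inspected.
$job->reload();
if (!$job->isComplete()) {
    printf('Job %s is not yet reported as complete' . PHP_EOL, $job->id());
}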
@jdpedrie Could you take a look and see if (a) the code sample makes sense and still works and (b) if we should update the comment as @bshaffer notes?
Thanks for raising this, @williamjulianvicary. I think we need to update the relevant BigQuery samples and remove the exponential backoff.
I am leaving this issue open for now and will move it into the team backlog. ~13 samples could be affected (based on searching for ExponentialBackoff in ./bigquery); a sketch of what these samples could look like after the change follows the list below.
bigquery/api/src/copy_table.php
bigquery/api/src/import_from_local_csv.php
bigquery/api/src/import_from_storage_csv_autodetect.php
bigquery/api/src/import_from_storage_csv_truncate.php
bigquery/api/src/import_from_storage_csv.php
bigquery/api/src/import_from_storage_json_autodetect.php
bigquery/api/src/import_from_storage_json_truncate.php
bigquery/api/src/import_from_storage_json.php
bigquery/api/src/import_from_storage_orc_truncate.php
bigquery/api/src/import_from_storage_orc.php
bigquery/api/src/import_from_storage_parquet_truncate.php
bigquery/api/src/import_from_storage_parquet.php
bigquery/api/src/run_query_as_job.php
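For reference, a hedged sketch of what these samples might look like once the backoff block is removed, relying on runJob()'s built-in wait ($projectId, $datasetId, $tableId and $source are placeholders; the exact load configuration varies per sample):

use Google\Cloud\BigQuery\BigQueryClient;

$bigQuery = new BigQueryClient(['projectId' => $projectId]);
$table = $bigQuery->dataset($datasetId)->table($tableId);
$loadConfig = $table->load(fopen($source, 'r'));

// runJob() only returns once the job has been polled to completion,
// so the sample can go straight to checking the outcome.
$job = $table->runJob($loadConfig);

if (isset($job->info()['status']['errorResult'])) {
    $error = $job->info()['status']['errorResult']['message'];
    printf('Error running job: %s' . PHP_EOL, $error);
} else {
    print('Data imported successfully' . PHP_EOL);
}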