probe-scraper

Retry transient GCS errors

Open relud opened this issue 2 years ago • 3 comments

https://github.com/mozilla/bedrock/actions/runs/4598182706/jobs/8121878736
https://github.com/mozilla/glean/actions/runs/4609412250/jobs/8146505104?pr=2441

gsutil is failing to download some objects, reporting 404 exceptions:

Error: Command ['gsutil', '-q', '-m', 'rsync', '-r', 'gs://probe-scraper-prod-artifacts/glean/', '/tmp/tmpy4y4leon/output/glean'] returned non-zero exit status 1:
NotFoundException: 404 gs://probe-scraper-prod-artifacts/glean/reference-browser/general does not exist.
NotFoundException: 404 gs://probe-scraper-prod-artifacts/glean/reference-browser/pings does not exist.
NotFoundException: 404 gs://probe-scraper-prod-artifacts/glean/reference-browser/tags does not exist.
CommandException: 3 files/objects could not be copied/removed.

the error is transient, because the objects do exist, ~but presumably are temporarily disappearing during upload or something like that.~ edit: but they have been updated since gsutil listed them, and gsutil requests the specific version at time of listing.

we could retry the full gsutil sync on failure, or we could reimplement the gsutil sync in python and retry 404s. the latter option is probably more robust, and should be relatively short.
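A minimal sketch of the first option (retry the whole sync), assuming the transient failure can be recognized by the `NotFoundException: 404` text in stderr as in the output above. The helper name `run_with_retry` and its parameters are hypothetical, not existing probe-scraper code:

```python
import subprocess
import time


def run_with_retry(cmd, attempts=3, delay=5.0,
                   transient_marker="NotFoundException: 404"):
    """Run cmd, re-running it only when stderr looks like the transient 404.

    Hypothetical helper: the marker string is taken from the error output
    in this issue, and the attempt count and delay are arbitrary defaults.
    """
    for attempt in range(1, attempts + 1):
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return result
        transient = transient_marker in result.stderr
        if not transient or attempt == attempts:
            # Not a known transient error, or out of attempts: surface it.
            raise subprocess.CalledProcessError(
                result.returncode, cmd,
                output=result.stdout, stderr=result.stderr,
            )
        # Linear backoff before re-running the full command.
        time.sleep(delay * attempt)
```

It would be called with the same command as in the error, e.g. `run_with_retry(['gsutil', '-q', '-m', 'rsync', '-r', 'gs://probe-scraper-prod-artifacts/glean/', dest])`, where `dest` is the local output directory. Re-running the full rsync is coarse, but it forces a fresh listing, which is what the failing objects need.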

relud, Apr 04 '23 16:04

@relud should we consider adding some logging to help understand the issue first, e.g. what's in GoogleCloudPlatform/gsutil#906?

Dexterp37, Apr 06 '23 10:04

we could add the -DD flag:

OPTIONS
  -D          Shows HTTP requests/headers and additional debug info needed
              when posting support requests, including exception stack traces.

              CAUTION: The output from using this flag includes authentication
              credentials. Before including this flag in your command, be sure
              you understand how the command's output is used, and, if
              necessary, remove or redact sensitive information.

  -DD         Same as -D, plus HTTP upstream payload.

but I wouldn't recommend it, as those headers will include auth tokens.

relud, Apr 11 '23 16:04

That said, I can confirm from running the command locally with -DD that gsutil does request a specific "generation" of objects, so if the file was rewritten between listing the object and downloading the content, I would expect it to 404.
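To illustrate the race, here is a toy in-memory model, not the GCS API: `FakeBucket`, `NotFound`, and `sync` are all hypothetical names. It assumes a download pinned to a listed generation fails once the object is overwritten, and sketches the second option from the top of the issue: retry only the failed objects after re-listing to pick up their current generation.

```python
class NotFound(Exception):
    """Stands in for a 404 on a generation that no longer exists."""


class FakeBucket:
    """Toy in-memory bucket: each name maps to (generation, data)."""

    def __init__(self):
        self._objects = {}
        self._next_generation = 1

    def upload(self, name, data):
        # Overwriting replaces the live generation.
        self._objects[name] = (self._next_generation, data)
        self._next_generation += 1

    def list(self):
        return [(name, gen) for name, (gen, _) in sorted(self._objects.items())]

    def download(self, name, generation):
        live = self._objects.get(name)
        if live is None or live[0] != generation:
            raise NotFound(f"404 {name}#{generation} does not exist")
        return live[1]


def sync(bucket, attempts=3):
    """Download every listed object, re-listing those whose generation vanished."""
    pending = dict(bucket.list())
    result = {}
    for _ in range(attempts):
        failed = []
        for name, generation in pending.items():
            try:
                result[name] = bucket.download(name, generation)
            except NotFound:
                failed.append(name)
        if not failed:
            return result
        # Refresh generations only for the objects that 404ed.
        fresh = dict(bucket.list())
        pending = {name: fresh[name] for name in failed if name in fresh}
    raise NotFound("objects kept changing during sync")
```

In this model, listing `glean/reference-browser/pings`, overwriting it, and then downloading the listed generation raises `NotFound`, while `sync` recovers on the next pass because it asks for the new generation.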

relud, Apr 11 '23 16:04