
Bionic should retry GCS uploads/downloads if they time out

Open · jqmp opened this issue 5 years ago · 1 comment

GCS file uploads (and presumably downloads) sometimes time out; a stack trace is attached below. For most of these operations we use the GCS Python API rather than gsutil, so we're probably not getting retries by default. We should add some retry logic so that a transient failure is less likely to crash the whole process.

  File "/usr/local/lib/python3.7/site-packages/bionic/cache.py", line 284, in _blob_from_file
    self._cloud.upload(file_path, blob_url)
  File "/usr/local/lib/python3.7/site-packages/bionic/cache.py", line 601, in upload
    self._tool.blob_from_url(url).upload_from_filename(str(path))
  File "/usr/local/lib/python3.7/site-packages/google/cloud/storage/blob.py", line 1320, in upload_from_filename
    predefined_acl=predefined_acl,
  File "/usr/local/lib/python3.7/site-packages/google/cloud/storage/blob.py", line 1265, in upload_from_file
    client, file_obj, content_type, size, num_retries, predefined_acl
  File "/usr/local/lib/python3.7/site-packages/google/cloud/storage/blob.py", line 1175, in _do_upload
    client, stream, content_type, size, num_retries, predefined_acl
  File "/usr/local/lib/python3.7/site-packages/google/cloud/storage/blob.py", line 1122, in _do_resumable_upload
    response = upload.transmit_next_chunk(transport)
  File "/usr/local/lib/python3.7/site-packages/google/resumable_media/requests/upload.py", line 425, in transmit_next_chunk
    retry_strategy=self._retry_strategy,
  File "/usr/local/lib/python3.7/site-packages/google/resumable_media/requests/_helpers.py", line 136, in http_request
    return _helpers.wait_and_retry(func, RequestsMixin._get_status_code, retry_strategy)
  File "/usr/local/lib/python3.7/site-packages/google/resumable_media/_helpers.py", line 150, in wait_and_retry
    response = func()
  File "/usr/local/lib/python3.7/site-packages/google/auth/transport/requests.py", line 287, in request
    **kwargs
  File "/usr/local/lib/python3.7/site-packages/google/auth/transport/requests.py", line 110, in __exit__
    raise self._timeout_error_type()
requests.exceptions.Timeout
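One possible shape for that retry logic is a small wrapper that re-invokes the failing call with exponential backoff. This is a minimal sketch, not bionic's or google-cloud-storage's code: the helper `call_with_retries` and its parameters (`retry_on`, `max_attempts`, `initial_delay`, `max_delay`, `sleep`) are hypothetical, and the attempt counts and delays are illustrative defaults.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    func: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 5,
    initial_delay: float = 1.0,
    max_delay: float = 32.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func(), retrying with exponential backoff on the given exceptions.

    Hypothetical helper for illustration; not part of bionic or any GCS library.
    Exceptions not listed in retry_on propagate immediately.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last error to the caller.
            # Wait for the current delay plus up to 10% random jitter,
            # then double the delay for the next attempt (capped at max_delay).
            sleep(delay + random.uniform(0, delay * 0.1))
            delay = min(delay * 2, max_delay)
    raise AssertionError("unreachable")  # The loop always returns or raises.
```

At the call site from the trace, this might look like `call_with_retries(lambda: blob.upload_from_filename(str(path)), retry_on=(requests.exceptions.Timeout,))`. Note that retrying this way restarts the upload from the beginning rather than resuming the interrupted chunk, which is simpler but re-sends the whole file.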

jqmp, Jan 24 '20

Just noting that we've also seen a google.resumable_media.common.DataCorruption error in the wild; however, I don't know whether a retry would fix it.

jqmp, Jun 11 '20