google-auth-library-python

Intermittent `RefreshError` with `Internal Server Error` on metadata service

Open lawrenceong opened this issue 2 years ago • 4 comments

Recently, we have been getting intermittent `RefreshError` exceptions from Python applications that use Google Cloud services. The error and stack trace are:

class: <class 'google.auth.exceptions.RefreshError'> message: (\"Failed to retrieve http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/[email protected]/token?scopes=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdevstorage.full_control%2Chttps%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdevstorage.read_only%2Chttps%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdevstorage.read_write from the Google Compute Engine metadata service. Status: 500 Response:\\nb'Internal Server Error\\\\n'\", <google.auth.transport.requests._Response object at 0x7f4b8a2320b0>)

traceback:

...<snip>...
  File "/app/src/utils/cloud_storage.py", line 13, in __init__
    self.bucket: Bucket = self.client.get_bucket(self.bucket_name)
  File "/app/.cache/pypoetry/virtualenvs/appName/lib/python3.10/site-packages/google/cloud/storage/client.py", line 772, in get_bucket
    bucket.reload(
  File "/app/.cache/pypoetry/virtualenvs/appName/lib/python3.10/site-packages/google/cloud/storage/bucket.py", line 1086, in reload
    super(Bucket, self).reload(
  File "/app/.cache/pypoetry/virtualenvs/appName/lib/python3.10/site-packages/google/cloud/storage/_helpers.py", line 246, in reload
    api_response = client._get_resource(
  File "/app/.cache/pypoetry/virtualenvs/appName/lib/python3.10/site-packages/google/cloud/storage/client.py", line 377, in _get_resource
    return self._connection.api_request(
  File "/app/.cache/pypoetry/virtualenvs/appName/lib/python3.10/site-packages/google/cloud/storage/_http.py", line 72, in api_request
    return call()
  File "/app/.cache/pypoetry/virtualenvs/appName/lib/python3.10/site-packages/google/api_core/retry.py", line 349, in retry_wrapped_func
    return retry_target(
  File "/app/.cache/pypoetry/virtualenvs/appName/lib/python3.10/site-packages/google/api_core/retry.py", line 191, in retry_target
    return target()
  File "/app/.cache/pypoetry/virtualenvs/appName/lib/python3.10/site-packages/google/cloud/_http/__init__.py", line 482, in api_request
    response = self._make_request(
  File "/app/.cache/pypoetry/virtualenvs/appName/lib/python3.10/site-packages/google/cloud/_http/__init__.py", line 341, in _make_request
    return self._do_request(
  File "/app/.cache/pypoetry/virtualenvs/appName/lib/python3.10/site-packages/google/cloud/_http/__init__.py", line 379, in _do_request
    return self.http.request(
  File "/app/.cache/pypoetry/virtualenvs/appName/lib/python3.10/site-packages/google/auth/transport/requests.py", line 545, in request
    self.credentials.before_request(auth_request, method, url, request_headers)
  File "/app/.cache/pypoetry/virtualenvs/appName/lib/python3.10/site-packages/google/auth/credentials.py", line 135, in before_request
    self.refresh(request)
  File "/app/.cache/pypoetry/virtualenvs/appName/lib/python3.10/site-packages/google/auth/compute_engine/credentials.py", line 117, in refresh
    six.raise_from(new_exc, caught_exc)
  File "<string>", line 3, in raise_from
Frame before_request in /app/.cache/pypoetry/virtualenvs/appName/lib/python3.10/site-packages/google/auth/credentials.py at line 135
    self                 = <google.auth....x7f4b8a233e50>
    request              = functools.par...>, timeout=60)
    method               = 'GET'
    url                  = 'https://stor...tyPrint=false'
    headers              = {'Accept-Encoding': 'gzip', 'User-Agent': 'gcloud-pytho....0 gccl/2.8.0', 'X-Goog-API-Client': 'gcloud-pytho...-7aa99e2dd464'}
Frame refresh in /app/.cache/pypoetry/virtualenvs/appName/lib/python3.10/site-packages/google/auth/compute_engine/credentials.py at line 117
    self                 = <google.auth....x7f4b8a233e50>
    request              = functools.par...>, timeout=60)
    scopes               = ('https://www.....full_control', 'https://www....age.read_only', 'https://www....ge.read_write')
    new_exc              = RefreshError(...f4b8a2320b0>))
Frame raise_from in <string> at line 5
    value                = None
    from_value           = TransportErro...7f4b8a2320b0>)

The Internal Server Error seems to happen only on our Python-based instances. To work around these errors, we added a tenacity retry to the function. Sample retry:

    @retry(
        retry=retry_if_exception_type(GoogleAuthError),
        stop=stop_after_attempt(3),
        wait=wait_exponential(multiplier=1, min=4, max=10),
    )
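For anyone who would rather not depend on tenacity, a similar policy can be sketched with the standard library alone. This is only an illustration under stated assumptions: `call_with_retry` and `TransientAuthError` are hypothetical names (the latter stands in for `google.auth.exceptions.RefreshError`), and the delay formula is a clamped exponential that roughly mirrors the settings above rather than reproducing tenacity exactly.

```python
import time


class TransientAuthError(Exception):
    """Hypothetical stand-in for google.auth.exceptions.RefreshError."""


def call_with_retry(func, retry_on, attempts=3, multiplier=1.0,
                    min_wait=4.0, max_wait=10.0, sleep=time.sleep):
    """Call func(), retrying on `retry_on` with clamped exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error to the caller
            # Exponential delay, clamped to the range [min_wait, max_wait].
            delay = min(max(multiplier * 2 ** (attempt - 1), min_wait), max_wait)
            sleep(delay)
```

In an application this might be called as `call_with_retry(lambda: client.get_bucket(BUCKET_NAME), RefreshError)`. The `sleep` parameter is injectable only so the backoff can be exercised in tests without waiting.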

We also access Google Cloud Storage from Node.js and do not see anything similar there.

Is there any reason why we would get "intermittent" Internal Server Error when trying to refresh a service account's token?

Environment details

  • OS: GKE - containerd - v1.25.7-gke.1000 - with python:3.10-slim image - based on debian bullseye
  • Python version: 3.10.11
  • pip version: 23.1
  • google-auth version: 2.17.3

Steps to reproduce

Initialise the bucket in code; the issue then occurs intermittently (roughly once a week). Code sample for the case where the error occurs at pod startup:

from google.cloud import storage

BUCKET_NAME = ".............."

client = storage.Client()
bucket = client.get_bucket(BUCKET_NAME)
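To check whether the 500s come from the metadata server itself rather than from google-auth, one option is to poll the token endpoint directly and tally the status codes. The sketch below is a rough diagnostic, not part of the original report: `fetch_status` and `count_statuses` are hypothetical helper names, and it assumes the `default` service account alias and the `Metadata-Flavor: Google` header that the metadata server expects.

```python
import urllib.error
import urllib.request

# Token endpoint for the instance's default service account (assumed alias).
TOKEN_URL = ("http://metadata.google.internal/computeMetadata/v1/"
             "instance/service-accounts/default/token")


def fetch_status(url=TOKEN_URL, timeout=5):
    """Return the HTTP status of one token request to the metadata server."""
    request = urllib.request.Request(url, headers={"Metadata-Flavor": "Google"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 500 for "Internal Server Error"


def count_statuses(fetch, attempts):
    """Call fetch() `attempts` times and tally the status codes seen."""
    tally = {}
    for _ in range(attempts):
        status = fetch()
        tally[status] = tally.get(status, 0) + 1
    return tally
```

Running `count_statuses(fetch_status, 100)` inside an affected pod would show whether any 500s appear without google-auth in the path. Connection-level failures (`urllib.error.URLError`) are deliberately not caught here, so they surface as exceptions.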

May 04 '23 11:05 lawrenceong

Hi @lawrenceong,

We are working on adding these retries into the client layer. Once that is complete these errors will be retried automatically.

Your workaround will work until then.

We are not planning to add any more retries to this codebase, so that the client layer remains the single source of retries.

Thanks!

May 04 '23 21:05 clundin25

We are using Pub/Sub as well and encountered a similar error. It does not seem to affect usability, so we did not have to add a retry. However, we get the following in the logs, which triggers alarms via Error Reporting:

Traceback (most recent call last):
  File "/app/.cache/pypoetry/virtualenvs/APP_NAME/lib/python3.10/site-packages/grpc/_plugin_wrapping.py", line 95, in __call__
    self._metadata_plugin(
  File "/app/.cache/pypoetry/virtualenvs/APP_NAME/lib/python3.10/site-packages/google/auth/transport/grpc.py", line 101, in __call__
    callback(self._get_authorization_headers(context), None)
  File "/app/.cache/pypoetry/virtualenvs/APP_NAME/lib/python3.10/site-packages/google/auth/transport/grpc.py", line 87, in _get_authorization_headers
    self._credentials.before_request(
  File "/app/.cache/pypoetry/virtualenvs/APP_NAME/lib/python3.10/site-packages/google/auth/credentials.py", line 135, in before_request
    self.refresh(request)
  File "/app/.cache/pypoetry/virtualenvs/APP_NAME/lib/python3.10/site-packages/google/auth/compute_engine/credentials.py", line 117, in refresh
    six.raise_from(new_exc, caught_exc)
  File "<string>", line 3, in raise_from
google.auth.exceptions.RefreshError: ("Failed to retrieve http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/[email protected]/token?scopes=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcloud-platform from the Google Compute Engine metadata service. Status: 500 Response:\nb'Internal Server Error\\n'", <google.auth.transport.requests._Response object at 0x7fa29a86ed40>)
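If the only impact is noisy alerting, one possible stopgap is a logging filter that drops these records before they reach error reporting. This is a sketch under assumptions: it assumes grpc emits the traceback through the logger named `grpc._plugin_wrapping` (the module shown in the traceback), `DropRefreshErrors` is a hypothetical class name, and the exception is matched by class name so the snippet has no google-auth dependency.

```python
import logging


class DropRefreshErrors(logging.Filter):
    """Suppress log records whose attached exception is a RefreshError."""

    def filter(self, record):
        exc = record.exc_info[1] if record.exc_info else None
        # Match by class name so the sketch does not need to import google-auth.
        return type(exc).__name__ != "RefreshError"


# Assumption: grpc logs the traceback via the logger named after its module.
logging.getLogger("grpc._plugin_wrapping").addFilter(DropRefreshErrors())
```

Dropping the records hides every `RefreshError` logged there, including persistent ones, so it may be safer to downgrade or rate-limit them instead of suppressing them outright.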

May 04 '23 23:05 lawrenceong

I also see many `google.auth.exceptions.RefreshError` exceptions with 500 Internal Server Error responses in the logs. Is there a way to prevent these from happening?

May 17 '23 20:05 mdzigurski