
Anonymous access of GCP bucket fails with `ValueError: Anonymous credentials cannot be refreshed.`

Open ionicsolutions opened this issue 2 years ago • 10 comments

Affects modelstore 0.0.74.

To reproduce:

# create a new environment (Python 3.8)
python -m venv env
source env/bin/activate

# install modelstore and the google-cloud-storage client library
pip install modelstore google-cloud-storage


python
Python 3.8.8 (default, Apr  4 2021, 16:02:17) 
[GCC 10.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from modelstore import ModelStore
>>> model_store = ModelStore.from_gcloud(bucket_name="xai-demo-models")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/kilian/Documents/Ulm/GitHub/modelstorereplica/env/lib/python3.8/site-packages/modelstore/model_store.py", line 90, in from_gcloud
    return ModelStore(
  File "<string>", line 4, in __init__
  File "/home/kilian/Documents/Ulm/GitHub/modelstorereplica/env/lib/python3.8/site-packages/modelstore/model_store.py", line 105, in __post_init__
    if not self.storage.validate():
  File "/home/kilian/Documents/Ulm/GitHub/modelstorereplica/env/lib/python3.8/site-packages/modelstore/storage/gcloud.py", line 128, in validate
    if not self.bucket.exists():
  File "/home/kilian/Documents/Ulm/GitHub/modelstorereplica/env/lib/python3.8/site-packages/google/cloud/storage/bucket.py", line 843, in exists
    client._get_resource(
  File "/home/kilian/Documents/Ulm/GitHub/modelstorereplica/env/lib/python3.8/site-packages/google/cloud/storage/client.py", line 366, in _get_resource
    return self._connection.api_request(
  File "/home/kilian/Documents/Ulm/GitHub/modelstorereplica/env/lib/python3.8/site-packages/google/cloud/storage/_http.py", line 73, in api_request
    return call()
  File "/home/kilian/Documents/Ulm/GitHub/modelstorereplica/env/lib/python3.8/site-packages/google/api_core/retry.py", line 283, in retry_wrapped_func
    return retry_target(
  File "/home/kilian/Documents/Ulm/GitHub/modelstorereplica/env/lib/python3.8/site-packages/google/api_core/retry.py", line 190, in retry_target
    return target()
  File "/home/kilian/Documents/Ulm/GitHub/modelstorereplica/env/lib/python3.8/site-packages/google/cloud/_http/__init__.py", line 482, in api_request
    response = self._make_request(
  File "/home/kilian/Documents/Ulm/GitHub/modelstorereplica/env/lib/python3.8/site-packages/google/cloud/_http/__init__.py", line 341, in _make_request
    return self._do_request(
  File "/home/kilian/Documents/Ulm/GitHub/modelstorereplica/env/lib/python3.8/site-packages/google/cloud/_http/__init__.py", line 379, in _do_request
    return self.http.request(
  File "/home/kilian/Documents/Ulm/GitHub/modelstorereplica/env/lib/python3.8/site-packages/google/auth/transport/requests.py", line 526, in request
    self.credentials.refresh(auth_request)
  File "/home/kilian/Documents/Ulm/GitHub/modelstorereplica/env/lib/python3.8/site-packages/google/auth/credentials.py", line 173, in refresh
    raise ValueError("Anonymous credentials cannot be refreshed.")
ValueError: Anonymous credentials cannot be refreshed.

I remember encountering and resolving this issue while working on #142. We should have a look at the changes introduced by #161.

Output of pip freeze:

cachetools==5.0.0
certifi==2021.10.8
charset-normalizer==2.0.12
click==8.1.3
gitdb==4.0.9
GitPython==3.1.27
google-api-core==2.7.3
google-auth==2.6.6
google-cloud-core==2.3.0
google-cloud-storage==2.3.0
google-crc32c==1.3.0
google-resumable-media==2.3.2
googleapis-common-protos==1.56.0
idna==3.3
joblib==1.1.0
modelstore==0.0.74
numpy==1.22.3
protobuf==3.20.1
pyasn1==0.4.8
pyasn1-modules==0.2.8
requests==2.27.1
rsa==4.8
six==1.16.0
smmap==5.0.0
tqdm==4.64.0
urllib3==1.26.9

ionicsolutions avatar May 12 '22 11:05 ionicsolutions

Thanks for reporting! I'll try to investigate soon, but I'm away at the moment. If you spot anything in the second PR you mentioned, please let me know.

nlathia avatar May 14 '22 22:05 nlathia

@ionicsolutions Can I use the "xai-demo-models" bucket for testing as well? I'm going to re-run your code above. Otherwise I'll create a testing-only public GCS container.

nlathia avatar May 16 '22 13:05 nlathia

@nlathia Sure, go ahead and use it for now! It contains one model in one domain.

ionicsolutions avatar May 16 '22 14:05 ionicsolutions

Just to log my investigation --

When trying to replicate this, the first error I ran into occurred because I have some environment variables set for GCP (which modelstore retrieves here), and this led to a slightly different exception:

    raise exceptions.from_http_response(response)
google.api_core.exceptions.Forbidden: 403 GET https://storage.googleapis.com/storage/v1/b/xai-demo-models?projection=noAcl&prettyPrint=false: <service-account-name> does not have storage.buckets.get access to the Google Cloud Storage bucket.

But when I removed those environment variables, I was able to replicate this:

    raise ValueError("Anonymous credentials cannot be refreshed.")
ValueError: Anonymous credentials cannot be refreshed.
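One way to keep ambient credentials from masking the anonymous-access error is to unset the relevant variables only for the duration of the reproduction. A minimal sketch, assuming the variable in question is `GOOGLE_APPLICATION_CREDENTIALS` (the comment above does not say which variables were set) and using a hypothetical `without_env` helper:

```python
import os
from contextlib import contextmanager


@contextmanager
def without_env(*names):
    """Temporarily remove the given environment variables."""
    saved = {name: os.environ.pop(name) for name in names if name in os.environ}
    try:
        yield
    finally:
        # Restore whatever was removed, even if the body raised.
        os.environ.update(saved)


# Assumed variable name; substitute whichever GCP variables are set locally.
with without_env("GOOGLE_APPLICATION_CREDENTIALS"):
    print("GOOGLE_APPLICATION_CREDENTIALS" in os.environ)
```

Credentials stored by the gcloud CLI can also be picked up by the Google auth libraries, so clearing environment variables alone may not be enough on every machine.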

Similar errors have been reported here:

  • https://github.com/mlflow/mlflow/issues/2925
  • https://github.com/googleapis/python-storage/issues/102

nlathia avatar May 17 '22 12:05 nlathia

I've managed to reproduce this error without modelstore. It is triggered when bucket.exists() is called, which is what modelstore's validate() uses to check that the GCP storage can be used.

Python 3.8.12 (default, Mar 24 2022, 23:17:02) 
[Clang 13.0.0 (clang-1300.0.29.30)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from google.cloud import storage
>>> bucket_name = "xai-demo-models"
>>> client = storage.Client.create_anonymous_client()
>>> bucket = client.bucket(bucket_name=bucket_name)
>>> bucket.exists()
[...]
  File "/Users/neallathia/.pyenv/versions/modelstore-dev-3-8-12/lib/python3.8/site-packages/google/auth/credentials.py", line 173, in refresh
    raise ValueError("Anonymous credentials cannot be refreshed.")
ValueError: Anonymous credentials cannot be refreshed.

nlathia avatar May 17 '22 13:05 nlathia

I believe the problem is that the bucket.exists() function is not enabled for anonymous clients. From the docs:

Such a client has only limited access to “public” buckets: listing their contents and downloading their blobs.

And I don't get any errors there:

>>> iterator = client.list_blobs(bucket_name)
>>> for i in iterator:
...     print(i.name)
... 
operatorai-model-store/domains/visual-inspection.json
operatorai-model-store/visual-inspection/2022/03/04/15:01:29/artifacts.tar.gz
operatorai-model-store/visual-inspection/versions/212ec479-f565-4440-aad2-c5f8d2b7d4f1.json

This is also the big difference between the first PR, where I suggested using exists(), and the second PR, where I changed the validate function to use exists().

nlathia avatar May 17 '22 13:05 nlathia

Update: the exists() function does appear to work for bucket names that don't exist:

>>> bucket_name = "a-bucket-that-does-not-exist"
>>> client = storage.Client.create_anonymous_client()
>>> bucket = client.bucket(bucket_name=bucket_name)
>>> bucket.exists()
False

nlathia avatar May 17 '22 14:05 nlathia

Okay, I think that this PR has the fix (based on the above):

  • https://github.com/operatorai/modelstore/pull/176

Comments welcome & thanks for raising this again @ionicsolutions.

In short: I try exists(); if that fails with a ValueError, I try list_blobs(); if that fails with NotFound, the validation fails.
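That fallback could be sketched roughly as follows. This is not the code from the PR: `validate_bucket` is a hypothetical helper name, and the local `NotFound` stand-in is only there so the sketch runs without the GCP libraries installed. `Client.bucket()`, `Bucket.exists()`, and `Client.list_blobs()` are the google-cloud-storage calls already used earlier in this thread.

```python
try:
    from google.api_core.exceptions import NotFound
except ImportError:
    # Stand-in (assumption) so the sketch runs without the GCP libraries.
    class NotFound(Exception):
        pass


def validate_bucket(client, bucket_name):
    """Return True if the bucket looks usable with this client.

    Hypothetical helper, not modelstore's actual validate().
    """
    bucket = client.bucket(bucket_name)
    try:
        # Works for authenticated clients, and returned False for a
        # missing bucket with an anonymous client earlier in this thread.
        return bucket.exists()
    except ValueError:
        # Anonymous client hitting an existing bucket:
        # "Anonymous credentials cannot be refreshed."
        pass
    try:
        # list_blobs() returns a lazy iterator, so pull one item to
        # force the request to be sent.
        next(iter(client.list_blobs(bucket_name, max_results=1)), None)
        return True
    except NotFound:
        return False
```

With an anonymous client, this would be called as `validate_bucket(storage.Client.create_anonymous_client(), "xai-demo-models")`.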

nlathia avatar May 17 '22 16:05 nlathia

Just to confirm, this is how it looks for me now!

modelstore-dev-3-8-12 ❯ python
Python 3.8.12 (default, Mar 24 2022, 23:17:02) 
[Clang 13.0.0 (clang-1300.0.29.30)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from modelstore import ModelStore
>>> model_store = ModelStore.from_gcloud(bucket_name="xai-demo-models")
IPython could not be loaded!
pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
>>> model_store.list_domains()
['visual-inspection']
>>> model_store.list_models("visual-inspection")
['212ec479-f565-4440-aad2-c5f8d2b7d4f1']
>>> model_store.get_model_info("visual-inspection", "212ec479-f565-4440-aad2-c5f8d2b7d4f1")
{'model': {'domain': 'visual-inspection', 'model_id': '212ec479-f565-4440-aad2-c5f8d2b7d4f1', 'model_type': {'library': 'tensorflow', ...

nlathia avatar May 18 '22 12:05 nlathia

Thanks for solving this issue so quickly! I can confirm that it works with the latest main :-)

ionicsolutions avatar May 18 '22 13:05 ionicsolutions

✅ This was released as part of modelstore==0.0.75

  • https://github.com/operatorai/modelstore/pull/201
  • https://pypi.org/project/modelstore/0.0.75/

Let me know if you see any other issues!

nlathia avatar Sep 08 '22 14:09 nlathia