
Passing a str Enum to `from_pretrained` gives OSError


System Info

Python 3.8, transformers==4.28.1, Ubuntu

Who can help?

No response

Information

  • [ ] The official example scripts
  • [X] My own modified scripts

Tasks

  • [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [X] My own task or dataset (give details below)

Reproduction

When a str Enum (as described in the Python enum docs) is passed to AutoTokenizer.from_pretrained, the model name that gets searched is not the Enum member's value but, judging by the URL in the traceback, its str() form (Tmp.BERT). Example to reproduce:

from enum import Enum
from transformers import AutoTokenizer

class Tmp(str, Enum):
    # str mixin so members can be used anywhere a plain string is expected
    BERT = 'bert-base-uncased'

t = AutoTokenizer.from_pretrained(Tmp.BERT)  # raises OSError, traceback below

Error:

Traceback (most recent call last):
  File "/home/ubuntu/test_env/lib/python3.8/site-packages/huggingface_hub/utils/_errors.py", line 259, in hf_raise_for_status
    response.raise_for_status()
  File "/home/ubuntu/test_env/lib/python3.8/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/Tmp.BERT/resolve/main/tokenizer_config.json

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ubuntu/test_env/lib/python3.8/site-packages/transformers/utils/hub.py", line 409, in cached_file
    resolved_file = hf_hub_download(
  File "/home/ubuntu/test_env/lib/python3.8/site-packages/huggingface_hub/utils/_validators.py", line 120, in _inner_fn
    return fn(*args, **kwargs)
  File "/home/ubuntu/test_env/lib/python3.8/site-packages/huggingface_hub/file_download.py", line 1195, in hf_hub_download
    metadata = get_hf_file_metadata(
  File "/home/ubuntu/test_env/lib/python3.8/site-packages/huggingface_hub/utils/_validators.py", line 120, in _inner_fn
    return fn(*args, **kwargs)
  File "/home/ubuntu/test_env/lib/python3.8/site-packages/huggingface_hub/file_download.py", line 1541, in get_hf_file_metadata
    hf_raise_for_status(r)
  File "/home/ubuntu/test_env/lib/python3.8/site-packages/huggingface_hub/utils/_errors.py", line 291, in hf_raise_for_status
    raise RepositoryNotFoundError(message, response) from e
huggingface_hub.utils._errors.RepositoryNotFoundError: 401 Client Error. (Request ID: Root=1-644c4a27-5bd929b32085d52d1a1b4b30)

Repository Not Found for url: https://huggingface.co/Tmp.BERT/resolve/main/tokenizer_config.json.
Please make sure you specified the correct `repo_id` and `repo_type`.
If you are trying to access a private or gated repo, make sure you are authenticated.
Invalid username or password.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ubuntu/test_env/lib/python3.8/site-packages/transformers/models/auto/tokenization_auto.py", line 642, in from_pretrained
    tokenizer_config = get_tokenizer_config(pretrained_model_name_or_path, **kwargs)
  File "/home/ubuntu/test_env/lib/python3.8/site-packages/transformers/models/auto/tokenization_auto.py", line 486, in get_tokenizer_config
    resolved_config_file = cached_file(
  File "/home/ubuntu/test_env/lib/python3.8/site-packages/transformers/utils/hub.py", line 424, in cached_file
    raise EnvironmentError(
OSError: Tmp.BERT is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo with `use_auth_token` or log in with `huggingface-cli login` and pass `use_auth_token=True`.
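For context on where the Tmp.BERT in the URLs above comes from: on Python 3.8 (through 3.10) a str-mixin Enum still uses Enum.__str__, so str(Tmp.BERT) is 'Tmp.BERT' rather than the member's value, which matches the repo id the Hub was asked for. A minimal check, independent of transformers:

from enum import Enum

class Tmp(str, Enum):
    BERT = 'bert-base-uncased'

# On Python 3.8-3.10 the str mixin does not override Enum.__str__,
# so str() returns 'ClassName.MEMBER' instead of the underlying value.
print(str(Tmp.BERT))                    # Tmp.BERT
print(Tmp.BERT.value)                   # bert-base-uncased
print(Tmp.BERT == 'bert-base-uncased')  # True: the member *is* that string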

Expected behavior

The lookup should use the string value of the Enum member ('bert-base-uncased'). Instead, the URL in the traceback shows that str(Tmp.BERT), i.e. 'Tmp.BERT', is what gets requested.
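As a workaround, passing the member's .value (a plain str) works today; overriding __str__ on the Enum should also make the member usable directly, since the lookup appears to go through str(). A sketch, assuming the same Tmp enum as in the repro:

from enum import Enum
from transformers import AutoTokenizer

class Tmp(str, Enum):
    BERT = 'bert-base-uncased'

    def __str__(self) -> str:
        # Return the underlying string, so str(Tmp.BERT) == 'bert-base-uncased'
        # (this mirrors what Python 3.11's StrEnum does).
        return str.__str__(self)

t1 = AutoTokenizer.from_pretrained(Tmp.BERT.value)  # plain string, works
t2 = AutoTokenizer.from_pretrained(Tmp.BERT)        # should resolve correctly with __str__ overridden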

rsmith49 · Apr 28 '23, 22:04