transformers
transformers copied to clipboard
Passing a str Enum to `from_pretrained` gives OSError
System Info
Python version 3.8
transformers==4.28.1
Ubuntu
Who can help?
No response
Information
- [ ] The official example scripts
- [X] My own modified scripts
Tasks
- [ ] An officially supported task in the
examples
folder (such as GLUE/SQuAD, ...) - [X] My own task or dataset (give details below)
Reproduction
When using a str Enum (as specified here in the python docs) as input to AutoTokenizer.from_pretrained
, the model name that gets searched is different from the member value of the Enum. Example to repro:
from enum import Enum
from transformers import AutoTokenizer
class Tmp(str, Enum):
BERT = 'bert-base-uncased'
t = AutoTokenizer.from_pretrained(Tmp.BERT)
Error:
Traceback (most recent call last):
File "/home/ubuntu/test_env/lib/python3.8/site-packages/huggingface_hub/utils/_errors.py", line 259, in hf_raise_for_status
response.raise_for_status()
File "/home/ubuntu/test_env/lib/python3.8/site-packages/requests/models.py", line 1021, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/Tmp.BERT/resolve/main/tokenizer_config.json
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/ubuntu/test_env/lib/python3.8/site-packages/transformers/utils/hub.py", line 409, in cached_file
resolved_file = hf_hub_download(
File "/home/ubuntu/test_env/lib/python3.8/site-packages/huggingface_hub/utils/_validators.py", line 120, in _inner_fn
return fn(*args, **kwargs)
File "/home/ubuntu/test_env/lib/python3.8/site-packages/huggingface_hub/file_download.py", line 1195, in hf_hub_download
metadata = get_hf_file_metadata(
File "/home/ubuntu/test_env/lib/python3.8/site-packages/huggingface_hub/utils/_validators.py", line 120, in _inner_fn
return fn(*args, **kwargs)
File "/home/ubuntu/test_env/lib/python3.8/site-packages/huggingface_hub/file_download.py", line 1541, in get_hf_file_metadata
hf_raise_for_status(r)
File "/home/ubuntu/test_env/lib/python3.8/site-packages/huggingface_hub/utils/_errors.py", line 291, in hf_raise_for_status
raise RepositoryNotFoundError(message, response) from e
huggingface_hub.utils._errors.RepositoryNotFoundError: 401 Client Error. (Request ID: Root=1-644c4a27-5bd929b32085d52d1a1b4b30)
Repository Not Found for url: https://huggingface.co/Tmp.BERT/resolve/main/tokenizer_config.json.
Please make sure you specified the correct `repo_id` and `repo_type`.
If you are trying to access a private or gated repo, make sure you are authenticated.
Invalid username or password.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/ubuntu/test_env/lib/python3.8/site-packages/transformers/models/auto/tokenization_auto.py", line 642, in from_pretrained
tokenizer_config = get_tokenizer_config(pretrained_model_name_or_path, **kwargs)
File "/home/ubuntu/test_env/lib/python3.8/site-packages/transformers/models/auto/tokenization_auto.py", line 486, in get_tokenizer_config
resolved_config_file = cached_file(
File "/home/ubuntu/test_env/lib/python3.8/site-packages/transformers/utils/hub.py", line 424, in cached_file
raise EnvironmentError(
OSError: Tmp.BERT is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo with `use_auth_token` or log in with `huggingface-cli login` and pass `use_auth_token=True`.
Expected behavior
We should see the model being searched for use the string value of the Enum member, instead of a different value (I haven't dug in to see what is being used instead).