Using `auto_map` in `tokenizer_config.json` gives `TypeError: argument of type 'NoneType' is not iterable`
System Info
certifi==2022.12.7
charset-normalizer==3.1.0
cmake==3.26.3
filelock==3.12.0
fsspec==2023.4.0
huggingface-hub==0.14.0
idna==3.4
Jinja2==3.1.2
lit==16.0.2
MarkupSafe==2.1.2
mpmath==1.3.0
networkx==3.1
numpy==1.24.3
nvidia-cublas-cu11==11.10.3.66
nvidia-cuda-cupti-cu11==11.7.101
nvidia-cuda-nvrtc-cu11==11.7.99
nvidia-cuda-runtime-cu11==11.7.99
nvidia-cudnn-cu11==8.5.0.96
nvidia-cufft-cu11==10.9.0.58
nvidia-curand-cu11==10.2.10.91
nvidia-cusolver-cu11==11.4.0.1
nvidia-cusparse-cu11==11.7.4.91
nvidia-nccl-cu11==2.14.3
nvidia-nvtx-cu11==11.7.91
packaging==23.1
PyYAML==6.0
regex==2023.3.23
requests==2.28.2
sentencepiece==0.1.98
sympy==1.11.1
tokenizers==0.13.3
torch==2.0.0
tqdm==4.65.0
-e git+https://github.com/huggingface/transformers.git@073baf7f2289dbbf99e29f375e40c3e270ba6e85#egg=transformers
triton==2.0.0
typing-extensions==4.5.0
urllib3==1.26.15
Who can help?
@ArthurZucker
Information
- [ ] The official example scripts
- [X] My own modified scripts
Tasks
- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [X] My own task or dataset (give details below)
Reproduction
Running the following...
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained("THUDM/glm-10b-chinese", trust_remote_code=True)
Gave the error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/jovyan/transformers/src/transformers/models/auto/tokenization_auto.py", line 692, in from_pretrained
return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
File "/home/jovyan/transformers/src/transformers/tokenization_utils_base.py", line 1812, in from_pretrained
return cls._from_pretrained(
File "/home/jovyan/transformers/src/transformers/tokenization_utils_base.py", line 1878, in _from_pretrained
init_kwargs["auto_map"] = add_model_info_to_auto_map(
File "/home/jovyan/transformers/src/transformers/utils/generic.py", line 563, in add_model_info_to_auto_map
auto_map[key] = [f"{repo_id}--{v}" if "--" not in v else v for v in value]
File "/home/jovyan/transformers/src/transformers/utils/generic.py", line 563, in <listcomp>
auto_map[key] = [f"{repo_id}--{v}" if "--" not in v else v for v in value]
TypeError: argument of type 'NoneType' is not iterable
Expected behavior
Load tokenizer without errors.
Analysis
- I suspect it has to do with the `auto_map` in `tokenizer_config.json` here; see the snippet below.
- The tokenizer loads fine with transformers version 4.27.0.
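For context, here is a minimal way to trigger the same TypeError outside of `from_pretrained`. The shape of the `auto_map` dict is an assumption based on the traceback (the slow tokenizer class name is illustrative, and the fast-tokenizer slot is left as `null` in the config, which loads as `None`):

from transformers.utils.generic import add_model_info_to_auto_map

# Assumed shape of the auto_map read from the repo's tokenizer_config.json:
# a slow tokenizer class plus a null fast-tokenizer slot (loaded as None).
auto_map = {"AutoTokenizer": ["tokenization_glm.GLMChineseTokenizer", None]}

# Raises TypeError: argument of type 'NoneType' is not iterable,
# because the list comprehension evaluates `"--" not in v` with v = None.
add_model_info_to_auto_map(auto_map, "THUDM/glm-10b-chinese")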
cc @sgugger, it seems like #22814 added:
if "auto_map" in init_kwargs and not _is_local:
    # For backward compatibility with old format.
if isinstance(init_kwargs["auto_map"], (tuple, list)):
init_kwargs["auto_map"] = {"AutoTokenizer": init_kwargs["auto_map"]}
init_kwargs["auto_map"] = add_model_info_to_auto_map(
init_kwargs["auto_map"], pretrained_model_name_or_path
)
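Not the actual patch from the linked PR, just a minimal sketch of how `add_model_info_to_auto_map` in `utils/generic.py` could tolerate the `None` entry by skipping it instead of testing membership on it:

def add_model_info_to_auto_map(auto_map, repo_id):
    """Adds the information of the repo_id to a given auto map."""
    for key, value in auto_map.items():
        if isinstance(value, (tuple, list)):
            # Leave None entries (e.g. a missing fast tokenizer class) untouched
            # instead of evaluating `"--" not in None`.
            auto_map[key] = [
                f"{repo_id}--{v}" if v is not None and "--" not in v else v for v in value
            ]
        elif value is not None and "--" not in value:
            auto_map[key] = f"{repo_id}--{value}"
    return auto_map

The non-list branch already guards against `None`, so this just mirrors that check inside the list comprehension.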
I can take this on, but you are more familiar with the changes.
Thanks for flagging! The PR linked above should fix this.