ValueError: Tokenizer class LLaMATokenizer does not exist or is not currently imported.
System Info
transformers version: 4.27.1
Who can help?
No response
Information
- [ ] The official example scripts
- [ ] My own modified scripts
Tasks
- [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
I tested LLaMA in Colab. Here is my code and output:
```python
!pip install git+https://github.com/huggingface/transformers
!pip install sentencepiece

import torch
from transformers import pipeline, LlamaTokenizer, LlamaForCausalLM

device = "cuda:0" if torch.cuda.is_available() else "cpu"
print(device)

tokenizer = LlamaTokenizer.from_pretrained("decapoda-research/llama-7b-hf")
model = LlamaForCausalLM.from_pretrained("decapoda-research/llama-7b-hf")

generator = pipeline(model="decapoda-research/llama-7b-hf", device=device)
generator("I can't believe you did such a ")
```
```
ValueError                                Traceback (most recent call last)
1 frames
/usr/local/lib/python3.9/dist-packages/transformers/models/auto/tokenization_auto.py in from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs)
    675
    676         if tokenizer_class is None:
--> 677             raise ValueError(
    678                 f"Tokenizer class {tokenizer_class_candidate} does not exist or is not currently imported."
    679             )

ValueError: Tokenizer class LLaMATokenizer does not exist or is not currently imported.
```
Expected behavior
Expected the pipeline to generate output text.
I'm facing the same issue.
Hi @candowu, thanks for raising this issue. This is arising because the tokenizer in the config on the hub points to `LLaMATokenizer`. However, the tokenizer in the library is `LlamaTokenizer`.
This is likely due to the configuration files being created before the final PR was merged in.
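For anyone who wants to verify the mismatch themselves, here is a minimal sketch, assuming the `huggingface_hub` package is installed (the repo id is taken from the traceback above; `tokenizer_class` is the key tokenizer_config.json uses for this):

```python
# Download only the tokenizer config from the hub and compare the class name
# it declares against what the transformers library actually exports.
import json
import transformers
from huggingface_hub import hf_hub_download

config_path = hf_hub_download("decapoda-research/llama-7b-hf", "tokenizer_config.json")
with open(config_path) as f:
    candidate = json.load(f)["tokenizer_class"]

print(candidate)                                 # LLaMATokenizer (from the hub config)
print(hasattr(transformers, candidate))          # False: no such class in the library
print(hasattr(transformers, "LlamaTokenizer"))   # True on versions with LLaMA support
```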
I cloned the repo and changed the tokenizer in the config file to `LlamaTokenizer`, but I got `ValueError: Tokenizer class LlamaTokenizer does not exist or is not currently imported.`
For anybody interested: I was able to load an earlier saved model with the same issue using my fork with the capitalization restored. That being said, for the future it's probably better to try to find or save a new model with the new naming.
@yhifny Are you able to import the tokenizer directly using `from transformers import LlamaTokenizer`?
If not, can you make sure that you are working from the development branch in your environment using:
```
pip install git+https://github.com/huggingface/transformers
```
More details here.
I can import the `LlamaTokenizer` class, but I'm getting an error that the `from_pretrained` method is `None`. Anyone else having this issue?
As the error message probably mentions, you need to install sentencepiece: `pip install sentencepiece`.
Working now. I swear I had sentencepiece, but probably forgot to reset the runtime 🤦 My bad!
> For anybody interested: I was able to load an earlier saved model with the same issue using my fork with the capitalization restored. That being said, for the future it's probably better to try to find or save a new model with the new naming.
Thanks, man, your link solved all the problems!
> Hi @candowu, thanks for raising this issue. This is arising because the tokenizer in the config on the hub points to `LLaMATokenizer`. However, the tokenizer in the library is `LlamaTokenizer`. This is likely due to the configuration files being created before the final PR was merged in.
Change the `LLaMATokenizer` in tokenizer_config.json into lowercase `LlamaTokenizer` and it works like a charm.
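For anybody who'd rather script that edit, here is a minimal sketch, assuming you have the checkpoint cloned locally into `./llama-7b-hf` (the directory name is an assumption):

```python
# Rewrite tokenizer_class in the local tokenizer_config.json to the spelling
# the library exports, then load the tokenizer and model from the patched directory.
import json
from pathlib import Path

config_path = Path("llama-7b-hf") / "tokenizer_config.json"
config = json.loads(config_path.read_text())
if config.get("tokenizer_class") == "LLaMATokenizer":
    config["tokenizer_class"] = "LlamaTokenizer"
    config_path.write_text(json.dumps(config, indent=2))

from transformers import LlamaForCausalLM, LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("llama-7b-hf")
model = LlamaForCausalLM.from_pretrained("llama-7b-hf")
```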
> For anybody interested: I was able to load an earlier saved model with the same issue using my fork with the capitalization restored. That being said, for the future it's probably better to try to find or save a new model with the new naming.
Thank you so much for this! Works!
> Hi @candowu, thanks for raising this issue. This is arising because the tokenizer in the config on the hub points to `LLaMATokenizer`. However, the tokenizer in the library is `LlamaTokenizer`. This is likely due to the configuration files being created before the final PR was merged in.
>
> Change the `LLaMATokenizer` in tokenizer_config.json into lowercase `LlamaTokenizer` and it works like a charm.
I assume this is applied to the llama-7b repo cloned from Hugging Face, right? How can I instantiate the model and the tokenizer after doing that, please?
You are a life saver. The docs on the site should be updated to reference this.
> Thank you so much for this! Works!

That's amazing!
You can try this rather crazy way to find out the right casing for the module:
```python
import transformers
from itertools import product

def find_variable_case(s, max_tries=1000):
    var_permutations = list(map("".join, product(*zip(s.upper(), s.lower()))))
    # Intuitively, any camel casing should minimize the no. of upper chars.
    # From https://stackoverflow.com/a/58789587/610569
    var_permutations.sort(key=lambda ss: (sum(map(str.isupper, ss)), len(ss)))
    for i, v in enumerate(var_permutations):
        if i > max_tries:
            return
        try:
            dir(transformers).index(v)  # raises ValueError if v isn't exported
            return v
        except ValueError:
            continue

v = find_variable_case('LLaMatokenizer')
exec(f"from transformers import {v}")
vars()[v]
```
[out]:

```
transformers.utils.dummy_sentencepiece_objects.LlamaTokenizer
```
I encountered the same issue identified in this thread today (4/2/2023). The post https://github.com/huggingface/transformers/issues/22222#issuecomment-1477171703 fixed the problem for me.
Thank you.
Hi! I am facing the same problem. I try to import `LlamaTokenizer`, but:

```
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
Cell In[27], line 1
----> 1 from transformers import LlamaTokenizer

ImportError: cannot import name 'LlamaTokenizer' from 'transformers' (/usr/local/anaconda3/envs/abc/lib/python3.10/site-packages/transformers/__init__.py)
```
and the version of transformers is `transformers 4.28.0.dev0 pypi_0 pypi`.
Please tell me how to fix it.
You need to install the library from source to be able to use the LLaMA model.
> You need to install the library from source to be able to use the LLaMA model.
Thanks! Where can I get it? And how do I install it? Actually, I have already installed transformers 4.28.0.dev0; I'm not sure what you mean.
You can open the documentation at the install page.
Great! I restarted my server and it works! Thank you!!!
Hi,

I installed from source:

```
git clone https://github.com/huggingface/transformers.git
cd transformers
pip install -e .
```

`pip list` shows:

```
transformers 4.29.0.dev0 D:\myfolder\transformers
```

but I still get:

```
ValueError: Tokenizer class LlamaTokenizer does not exist or is not currently imported.
```
+1 on @thibaudart's comment; I have the same issue.
> Hi,
>
> I installed from source:
>
> ```
> git clone https://github.com/huggingface/transformers.git
> cd transformers
> pip install -e .
> ```
>
> `pip list` shows:
>
> ```
> transformers 4.29.0.dev0 D:\myfolder\transformers
> ```
>
> but I still get:
>
> ```
> ValueError: Tokenizer class LlamaTokenizer does not exist or is not currently imported.
> ```
Hey, try this repo: `pip install git+https://github.com/mbehm/transformers`. Maybe it will work.
Will this problem be fixed by updating to the newest version of transformers, or must we modify the config file manually each time?
You should just stop using that checkpoint. The maintainers of that repo have made it clear that they are not interested in being compatible with Transformers by ignoring the 62 PRs trying to fix their checkpoints. The huggyllama checkpoints are confirmed to work if you are looking for an alternative (but you should still request the weights from Meta following their official form).
There are now 903 checkpoints for llama on the Hub and only the 4 from decapoda-research do not work since they created them before the PR for Llama was merged into Transformers. We won't break the code for the other 899 checkpoints.
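As a hedged example, loading the alternative checkpoint mentioned above would look something like this (the exact repo id `huggyllama/llama-7b` is an assumption, and access to the weights still requires Meta's form):

```python
from transformers import LlamaForCausalLM, LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("huggyllama/llama-7b")
model = LlamaForCausalLM.from_pretrained("huggyllama/llama-7b")
```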
if( "LLaMATokenizer" == tokenizer_class_candidate ): ## add these 2 line to solve it.
tokenizer_class_candidate = 'LlamaTokenizer'
tokenizer_class = tokenizer_class_from_name(tokenizer_class_candidate)
@MasterLivens Hi, I am currently using Colab. Which file should I add this code to?
@zhiyixu The code being referred to should go into `.../site-packages/transformers/models/auto/tokenization_auto.py`
However, what worked for me was updating my transformers and tokenizers packages. `tokenization_auto.py` has a mapping of tokenizers at the beginning, and I realized that llama wasn't included in the version I had.
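For anyone taking that route, the upgrade is just the standard pip command (assuming the released versions are recent enough to include LLaMA support, i.e. 4.28.0 or later):

```
pip install --upgrade transformers tokenizers
```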
> Hi @candowu, thanks for raising this issue. This is arising because the tokenizer in the config on the hub points to `LLaMATokenizer`. However, the tokenizer in the library is `LlamaTokenizer`. This is likely due to the configuration files being created before the final PR was merged in.
>
> Change the `LLaMATokenizer` in tokenizer_config.json into lowercase `LlamaTokenizer` and it works like a charm.
Can you please enlighten me on how this could be achieved? I'm new to this.