
ValueError: Tokenizer class LLaMATokenizer does not exist or is not currently imported.

candowu opened this issue on Mar 17, 2023

System Info

transformers version: 4.27.1

Who can help?

No response

Information

  • [ ] The official example scripts
  • [ ] My own modified scripts

Tasks

  • [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [ ] My own task or dataset (give details below)

Reproduction

I tested LLaMA in Colab. Here is my code and the output:

!pip install git+https://github.com/huggingface/transformers
!pip install sentencepiece

import torch
from transformers import pipeline, LlamaTokenizer, LlamaForCausalLM

device = "cuda:0" if torch.cuda.is_available() else "cpu"
print(device)

tokenizer = LlamaTokenizer.from_pretrained("decapoda-research/llama-7b-hf")

model = LlamaForCausalLM.from_pretrained("decapoda-research/llama-7b-hf")

generator = pipeline(model="decapoda-research/llama-7b-hf", device=device)
generator("I can't believe you did such a ")

ValueError                                Traceback (most recent call last)
<ipython-input> in <module>
      7 # tokenizer = LlamaTokenizer.from_pretrained("decapoda-research/llama-7b-hf")
      8 # model = LlamaForCausalLM.from_pretrained("decapoda-research/llama-7b-hf")
----> 9 generator = pipeline(model="decapoda-research/llama-7b-hf", device=device)
     10 generator("I can't believe you did such a ")

1 frames
/usr/local/lib/python3.9/dist-packages/transformers/models/auto/tokenization_auto.py in from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs)
    675
    676         if tokenizer_class is None:
--> 677             raise ValueError(
    678                 f"Tokenizer class {tokenizer_class_candidate} does not exist or is not currently imported."
    679             )

ValueError: Tokenizer class LLaMATokenizer does not exist or is not currently imported.

Expected behavior

Expected the pipeline to load and generate output.

candowu avatar Mar 17 '23 07:03 candowu

I am facing the same issue.

yhifny avatar Mar 17 '23 08:03 yhifny

Hi @candowu, thanks for raising this issue. This is arising because the tokenizer in the config on the Hub points to LLaMATokenizer. However, the tokenizer class in the library is LlamaTokenizer.

This is likely due to the configuration files being created before the final PR was merged in.
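Concretely, the offending entry in the checkpoint's tokenizer_config.json looks roughly like this (a sketch of just the relevant field, not the full file):

{
  "tokenizer_class": "LLaMATokenizer"
}

AutoTokenizer looks that string up in the library's class registry, which only knows LlamaTokenizer, hence the ValueError.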

amyeroberts avatar Mar 17 '23 08:03 amyeroberts

I cloned the repo and changed the tokenizer in the config file to LlamaTokenizer, but I got: ValueError: Tokenizer class LlamaTokenizer does not exist or is not currently imported.

yhifny avatar Mar 17 '23 09:03 yhifny

For anybody interested, I was able to load an earlier saved model with the same issue using my fork with the capitalization restored. That being said, going forward it's probably better to find or save a new model with the new naming.

mbehm avatar Mar 17 '23 10:03 mbehm

@yhifny Are you able to import the tokenizer directly using from transformers import LlamaTokenizer ?

If not, can you make sure that you are working from the development branch in your environment using: pip install git+https://github.com/huggingface/transformers

more details here.
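As a quick sanity check after installing, something along these lines (a minimal sketch) should confirm the class is available in your runtime:

import transformers

# Expect a dev build, e.g. 4.28.0.dev0, when installed from source.
print(transformers.__version__)

# This import only succeeds on versions that include the merged Llama PR.
from transformers import LlamaTokenizer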

amyeroberts avatar Mar 17 '23 13:03 amyeroberts

I can import the LlamaTokenizer class, but I'm getting an error that the from_pretrained method is None. Anyone else having this issue?

nadahlberg avatar Mar 17 '23 18:03 nadahlberg

As the error message probably mentions, you need to install sentencepiece: pip install sentencepiece.
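For context, without sentencepiece the import of LlamaTokenizer can still succeed but resolve to a placeholder in transformers.utils.dummy_sentencepiece_objects, which is why from_pretrained appears broken rather than the import failing outright. A quick check, as a sketch:

import importlib.util

# False means LlamaTokenizer likely resolves to the dummy placeholder class
# rather than the real sentencepiece-backed tokenizer.
print(importlib.util.find_spec("sentencepiece") is not None)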

sgugger avatar Mar 17 '23 19:03 sgugger

Working now. I swear I had sentencepiece, but probably forgot to reset the runtime 🤦 My bad!

nadahlberg avatar Mar 17 '23 19:03 nadahlberg

For anybody interested, I was able to load an earlier saved model with the same issue using my fork with the capitalization restored. That being said, going forward it's probably better to find or save a new model with the new naming.

Thanks man, your link solved the problem!

xhinker avatar Mar 17 '23 22:03 xhinker

Hi @candowu, thanks for raising this issue. This is arising because the tokenizer in the config on the Hub points to LLaMATokenizer. However, the tokenizer class in the library is LlamaTokenizer.

This is likely due to the configuration files being created before the final PR was merged in.

Change the LLaMATokenizer in tokenizer_config.json into lowercase LlamaTokenizer and it works like a charm.
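If you'd rather script that edit on a local clone than do it by hand, something like this should work (the path is illustrative; point it at wherever you cloned the checkpoint):

import json
from pathlib import Path

# Hypothetical local path to the cloned checkpoint directory.
config_path = Path("llama-7b-hf/tokenizer_config.json")

config = json.loads(config_path.read_text())
# Swap the pre-merge casing for the class name the library actually exports.
if config.get("tokenizer_class") == "LLaMATokenizer":
    config["tokenizer_class"] = "LlamaTokenizer"
    config_path.write_text(json.dumps(config, indent=2))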

nameless0704 avatar Mar 21 '23 01:03 nameless0704

For anybody interested, I was able to load an earlier saved model with the same issue using my fork with the capitalization restored. That being said, going forward it's probably better to find or save a new model with the new naming.

Thank you so much for this! Works!

vdattwani2005 avatar Mar 26 '23 05:03 vdattwani2005

Hi @candowu, thanks for raising this issue. This is arising because the tokenizer in the config on the Hub points to LLaMATokenizer. However, the tokenizer class in the library is LlamaTokenizer. This is likely due to the configuration files being created before the final PR was merged in.

Change the LLaMATokenizer in tokenizer_config.json into lowercase LlamaTokenizer and it works like a charm.

I assume this is applied to the llama-7b repo cloned from the Hugging Face Hub, right? How can I instantiate the model and the tokenizer after doing that, please?

sarrahbbh avatar Mar 29 '23 13:03 sarrahbbh

You are a lifesaver. The docs on the site should be updated to reflect this.

thekevshow avatar Mar 30 '23 05:03 thekevshow

Thank you so much for this! Works! That's amazing!

RiseInRose avatar Mar 31 '23 06:03 RiseInRose

You can try this rather crazy way to find out the right casing for the class:

import transformers

from itertools import product

def find_variable_case(s, max_tries=1000):
    # Generate every upper/lower casing permutation of the string.
    var_permutations = list(map("".join, product(*zip(s.upper(), s.lower()))))
    # Intuitively, any camel casing should minimize the no. of upper chars.
    # From https://stackoverflow.com/a/58789587/610569
    var_permutations.sort(key=lambda ss: (sum(map(str.isupper, ss)), len(ss)))
    for i, v in enumerate(var_permutations):
        if i > max_tries:
            return None
        if v in dir(transformers):
            return v

v = find_variable_case('LLaMatokenizer')
exec(f"from transformers import {v}")
vars()[v]

[out]:

transformers.utils.dummy_sentencepiece_objects.LlamaTokenizer

alvations avatar Apr 02 '23 01:04 alvations

I encountered the same issue described in this thread today, 4/2/2023. The post https://github.com/huggingface/transformers/issues/22222#issuecomment-1477171703 fixed the problem for me.

Thank you.

FatCache avatar Apr 02 '23 22:04 FatCache

Hi! I am facing the same problem. I try to import LlamaTokenizer, but:

ImportError                               Traceback (most recent call last)
Cell In[27], line 1
----> 1 from transformers import LlamaTokenizer

ImportError: cannot import name 'LlamaTokenizer' from 'transformers' (/usr/local/anaconda3/envs/abc/lib/python3.10/site-packages/transformers/__init__.py)

The version of transformers is "transformers 4.28.0.dev0 pypi_0 pypi".

Please tell me how to fix it.

qufy6 avatar Apr 12 '23 07:04 qufy6

You need to install the library from source to be able to use the LLaMA model.

sgugger avatar Apr 12 '23 11:04 sgugger

You need to install the library from source to be able to use the LLaMA model.

Thanks! Where can I get it, and how do I install it? I have already installed transformers 4.28.0.dev0, so I'm not sure what you mean.

qufy6 avatar Apr 12 '23 12:04 qufy6

You can open the documentation at the install page.

sgugger avatar Apr 12 '23 12:04 sgugger

Great! I restarted my server and it works! Thank you!!!

qufy6 avatar Apr 12 '23 13:04 qufy6

Hi,

I installed from source:

git clone https://github.com/huggingface/transformers.git
cd transformers
pip install -e .

pip list shows:

transformers 4.29.0.dev0 D:\myfolder\transformers

but I still have:

ValueError: Tokenizer class LlamaTokenizer does not exist or is not currently imported.

thibaudart avatar Apr 18 '23 10:04 thibaudart

+1 on @thibaudart's comment; I have the same issue.

mehrdadh avatar Apr 19 '23 20:04 mehrdadh

Hi,

I installed from source:

git clone https://github.com/huggingface/transformers.git
cd transformers
pip install -e .

pip list shows:

transformers 4.29.0.dev0 D:\myfolder\transformers

but I still have:

ValueError: Tokenizer class LlamaTokenizer does not exist or is not currently imported.

Hey, try this repo: pip install git+https://github.com/mbehm/transformers. Maybe it will work.

micelvrice avatar Apr 21 '23 07:04 micelvrice

Will this problem be fixed by updating to the newest version of transformers, or must we modify the config file manually each time?

CoinCheung avatar Apr 27 '23 11:04 CoinCheung

You should just stop using that checkpoint. The maintainers of that repo have made it clear that they are not interested in being compatible with Transformers, ignoring the 62 PRs trying to fix their checkpoints. The huggyllama checkpoints are confirmed to work if you are looking for an alternative (but you should still request the weights from Meta through their official form).

There are now 903 checkpoints for Llama on the Hub, and only the 4 from decapoda-research do not work, since they were created before the PR for Llama was merged into Transformers. We won't break the code for the other 899 checkpoints.
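For example, a checkpoint with the correct casing should load without any config edits (a sketch, assuming you have requested and are licensed to use the weights):

from transformers import LlamaForCausalLM, LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("huggyllama/llama-7b")
model = LlamaForCausalLM.from_pretrained("huggyllama/llama-7b")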

sgugger avatar Apr 27 '23 12:04 sgugger

            if( "LLaMATokenizer" == tokenizer_class_candidate ):  ## add these 2 line to solve it.
                tokenizer_class_candidate = 'LlamaTokenizer'  
            tokenizer_class = tokenizer_class_from_name(tokenizer_class_candidate)
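A less invasive alternative to editing site-packages, as a sketch: load the concrete tokenizer class yourself and hand it to the pipeline, which should bypass the broken tokenizer_class entry since only the AutoTokenizer lookup consults it:

from transformers import pipeline, LlamaTokenizer, LlamaForCausalLM

# The concrete class ignores the tokenizer_class field in the hub config;
# only AutoTokenizer resolves tokenizers by that name.
tokenizer = LlamaTokenizer.from_pretrained("decapoda-research/llama-7b-hf")
model = LlamaForCausalLM.from_pretrained("decapoda-research/llama-7b-hf")
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)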

MasterLivens avatar Apr 27 '23 14:04 MasterLivens

@MasterLivens Hi, I am currently using Colab. Which file should I add this code to?

zhiyixu avatar May 04 '23 09:05 zhiyixu

@zhiyixu The code being referred to should go into .../site-packages/transformers/models/auto/tokenization_auto.py

However, what worked for me was updating my transformers and tokenizers packages. tokenization_auto.py has a mapping of tokenizers at the beginning, and I realized that llama wasn't included in the version I had.

owanr avatar May 06 '23 19:05 owanr

Hi @candowu, thanks for raising this issue. This is arising because the tokenizer in the config on the Hub points to LLaMATokenizer. However, the tokenizer class in the library is LlamaTokenizer. This is likely due to the configuration files being created before the final PR was merged in.

Change the LLaMATokenizer in tokenizer_config.json into lowercase LlamaTokenizer and it works like a charm.

Can you please enlighten me on how this could be achieved? I'm new to this.

sarrahbbh avatar May 15 '23 14:05 sarrahbbh