simple-llm-finetuner

`LLaMATokenizer` vs `LlamaTokenizer` class names

Open vadi2 opened this issue 1 year ago • 5 comments

Running inference gives the following warning:

Loading tokenizer...
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization. 
The tokenizer class you load from this checkpoint is 'LLaMATokenizer'. 
The class this function is called from is 'LlamaTokenizer'.

Is it a problem?

vadi2 avatar Mar 25 '23 11:03 vadi2

No. This happens because the decapoda-research version of the LLaMA model on Hugging Face used a pre-merge version of transformers, in which the class name was capitalized differently. That's why you can't use AutoModel with llama.

lxe avatar Mar 31 '23 03:03 lxe
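
To see which class name a downloaded checkpoint declares, one option is to read its `tokenizer_config.json` directly. The sketch below is only an illustration: the helper names are hypothetical, and it assumes the checkpoint directory contains a `tokenizer_config.json` with a `tokenizer_class` entry, which is where the name in the warning above would come from.

```python
import json
from pathlib import Path


def declared_tokenizer_class(checkpoint_dir):
    """Return the `tokenizer_class` string from tokenizer_config.json, or None.

    `checkpoint_dir` is a local directory holding the checkpoint files
    (hypothetical helper, not part of transformers).
    """
    config_path = Path(checkpoint_dir) / "tokenizer_config.json"
    if not config_path.is_file():
        return None
    with config_path.open(encoding="utf-8") as f:
        return json.load(f).get("tokenizer_class")


def uses_pre_merge_name(checkpoint_dir):
    """True if the checkpoint still declares the old 'LLaMATokenizer' spelling."""
    return declared_tokenizer_class(checkpoint_dir) == "LLaMATokenizer"
```

If `uses_pre_merge_name` returns True, the checkpoint was saved with the older capitalization and the warning is expected.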

Still happening and seems to lead to crash :(

Gitterman69 avatar Apr 07 '23 10:04 Gitterman69

I wasn't getting a crash due to this; it was just a warning.

vadi2 avatar Apr 07 '23 10:04 vadi2

> I wasn't getting a crash due to this; it was just a warning.

I tried another llama7b_hf model, but the same thing happens. Unfortunately, it's impossible to find out why or how it crashed. Verbose output isn't possible, is it?

Gitterman69 avatar Apr 07 '23 10:04 Gitterman69

The solution is to replace `LLaMATokenizer` with `LlamaTokenizer` in the files under the directory ./cache/huggingface/hub and its subdirectories (only two files, if I remember correctly).

mums77 avatar May 20 '23 18:05 mums77
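
The manual edit described in the last comment could be scripted roughly as follows. This is a minimal sketch, not a tested fix for this repository: the function name is hypothetical, restricting the search to `*.json` files is an assumption (the comment does not say which two files are affected), and the cache location must be supplied by the caller. It rewrites downloaded files in place, so a backup of the cache directory is advisable.

```python
from pathlib import Path

OLD_NAME = "LLaMATokenizer"  # pre-merge spelling reported in the warning
NEW_NAME = "LlamaTokenizer"  # spelling used by the released transformers class


def rename_tokenizer_class(cache_dir):
    """Replace OLD_NAME with NEW_NAME in every JSON file under cache_dir.

    Returns the list of paths that were modified. Only *.json files are
    inspected (an assumption; adjust the pattern if other files matter).
    """
    changed = []
    for path in Path(cache_dir).rglob("*.json"):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8")
        if OLD_NAME in text:
            path.write_text(text.replace(OLD_NAME, NEW_NAME), encoding="utf-8")
            changed.append(path)
    return changed


# Example call; the path is a placeholder for your local Hugging Face hub cache:
# for p in rename_tokenizer_class("path/to/huggingface/hub"):
#     print("updated", p)
```

Running it a second time should report no changes, since the old spelling is no longer present.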