
ValueError: Tokenizer class GPTNeoXTokenizer does not exist or is not currently imported.

Open · zhashen opened this issue 2 years ago · 7 comments

When I tried

!python qlora.py --learning_rate 0.0001 --model_name_or_path EleutherAI/gpt-neox-20b --trust_remote_code

in Colab, I got the following error:

2023-06-03 13:54:17.113623: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
loading base model EleutherAI/gpt-neox-20b...
Loading checkpoint shards: 100% 46/46 [04:20<00:00,  5.66s/it]
adding LoRA modules...
trainable params: 138412032.0 || all params: 10865725440 || trainable: 1.2738406907509712
loaded model
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /content/qlora/qlora.py:790 in <module>                                      │
│                                                                              │
│   787 │   │   │   fout.write(json.dumps(all_metrics))                        │
│   788                                                                        │
│   789 if __name__ == "__main__":                                             │
│ ❱ 790 │   train()                                                            │
│   791                                                                        │
│                                                                              │
│ /content/qlora/qlora.py:635 in train                                         │
│                                                                              │
│   632 │   set_seed(args.seed)                                                │
│   633 │                                                                      │
│   634 │   # Tokenizer                                                        │
│ ❱ 635 │   tokenizer = AutoTokenizer.from_pretrained(                         │
│   636 │   │   args.model_name_or_path,                                       │
│   637 │   │   cache_dir=args.cache_dir,                                      │
│   638 │   │   padding_side="right",                                          │
│                                                                              │
│ /usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenizatio │
│ n_auto.py:691 in from_pretrained                                             │
│                                                                              │
│   688 │   │   │   │   tokenizer_class = tokenizer_class_from_name(tokenizer_ │
│   689 │   │   │                                                              │
│   690 │   │   │   if tokenizer_class is None:                                │
│ ❱ 691 │   │   │   │   raise ValueError(                                      │
│   692 │   │   │   │   │   f"Tokenizer class {tokenizer_class_candidate} does │
│   693 │   │   │   │   )                                                      │
│   694 │   │   │   return tokenizer_class.from_pretrained(pretrained_model_na │
╰──────────────────────────────────────────────────────────────────────────────╯
ValueError: Tokenizer class GPTNeoXTokenizer does not exist or is not currently 
imported.

zhashen avatar Jun 03 '23 14:06 zhashen

Check your config.json (the one that ships with the model weights) and see whether the tokenizer class name is misspelled. This happens often with mixed-case names.
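
For example, a quick way to inspect what the config actually says (a minimal sketch; `model_dir` is a placeholder for wherever the weights were downloaded):

```python
# Hypothetical sketch: print the tokenizer class recorded alongside the
# model weights. transformers reads "tokenizer_class" from
# tokenizer_config.json (and may fall back to config.json).
import json
from pathlib import Path

model_dir = Path("/path/to/downloaded/model")  # hypothetical path
for name in ("tokenizer_config.json", "config.json"):
    cfg = model_dir / name
    if cfg.exists():
        print(name, "->", json.loads(cfg.read_text()).get("tokenizer_class"))
```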

phalexo avatar Jun 03 '23 15:06 phalexo

```python
tokenizer = AutoTokenizer.from_pretrained(
    args.model_name_or_path,
    cache_dir=args.cache_dir,
    padding_side="right",
    use_fast=True,  # Fast tokenizer giving issues.
    tokenizer_type='llama' if 'llama' in args.model_name_or_path else None,  # Needed for HF name change
)
```

T-Atlas avatar Jun 05 '23 06:06 T-Atlas

I had this issue when I ran python3 qlora.py, and I second @T-Atlas's solution.

The reason is that the default model in qlora.py is EleutherAI/pythia-12b

https://github.com/artidoro/qlora/blob/3da535abdfaa29a2d0757eab0971664ed2cd97e8/qlora.py#L53-L55

which depends on GPTNeoXTokenizer.

https://huggingface.co/EleutherAI/pythia-12b/blob/main/tokenizer_config.json#L7

GPTNeoXTokenizer exists only as a fast tokenizer (GPTNeoXTokenizerFast).

https://github.com/huggingface/transformers/issues/17756#issuecomment-1534219526

But qlora.py disables the use of fast tokenizers.
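
To make the mismatch concrete, here is a minimal repro sketch of the two code paths (assuming a recent transformers install):

```python
# Minimal repro sketch: GPT-NeoX models ship only GPTNeoXTokenizerFast,
# so requesting the slow tokenizer fails with the exact ValueError from
# this issue, while the fast one loads fine.
from transformers import AutoTokenizer

model_id = "EleutherAI/pythia-12b"  # qlora.py's default model

try:
    AutoTokenizer.from_pretrained(model_id, use_fast=False)
except ValueError as e:
    print("use_fast=False:", e)  # "Tokenizer class GPTNeoXTokenizer does not exist..."

tok = AutoTokenizer.from_pretrained(model_id, use_fast=True)
print("use_fast=True:", type(tok).__name__)  # GPTNeoXTokenizerFast
```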

wangkuiyi avatar Jun 08 '23 17:06 wangkuiyi

it works

SeekPoint avatar Jun 09 '23 13:06 SeekPoint

> it works

What works? Can you elaborate?

pzdkn avatar Jul 13 '23 09:07 pzdkn

I had to change "tokenizer_class": "GPTNeoXTokenizer" to "tokenizer_class":"GPTNeoXTokenizerFast" in tokenizer_config.json.
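
A sketch of that manual fix (`cfg_path` is a hypothetical local path to the downloaded model files):

```python
# Rewrite tokenizer_config.json so it names the fast class, which is the
# one that actually exists in transformers.
import json
from pathlib import Path

cfg_path = Path("/path/to/model/tokenizer_config.json")  # hypothetical
cfg = json.loads(cfg_path.read_text())
if cfg.get("tokenizer_class") == "GPTNeoXTokenizer":
    cfg["tokenizer_class"] = "GPTNeoXTokenizerFast"
    cfg_path.write_text(json.dumps(cfg, indent=2))
```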

WillsonAmalrajA avatar Jul 14 '23 02:07 WillsonAmalrajA

> (quoting @wangkuiyi's explanation above)

Enabling fast tokenizers in qlora.py fixed this for me. Although the script's comment says the fast tokenizer was giving issues, setting use_fast to False is exactly what produces the error the OP describes.

olympus-terminal avatar May 23 '24 05:05 olympus-terminal