Is there an existing issue for this?
- [X] I have searched the existing issues
Current Behavior
The error is as follows:
Traceback (most recent call last):
File "/home/inspur/scc/gpt/LLaMA-Factory/src/train_bash.py", line 14, in
main()
File "/home/inspur/scc/gpt/LLaMA-Factory/src/train_bash.py", line 5, in main
run_exp()
File "/home/inspur/scc/gpt/LLaMA-Factory/src/llmtuner/train/tuner.py", line 26, in run_exp
run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
File "/home/inspur/scc/gpt/LLaMA-Factory/src/llmtuner/train/sft/workflow.py", line 29, in run_sft
model, tokenizer = load_model_and_tokenizer(model_args, finetuning_args, training_args.do_train)
File "/home/inspur/scc/gpt/LLaMA-Factory/src/llmtuner/model/loader.py", line 49, in load_model_and_tokenizer
tokenizer = AutoTokenizer.from_pretrained(
File "/home/inspur/anaconda3/envs/llama/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 774, in from_pretrained
return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
File "/home/inspur/anaconda3/envs/llama/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2028, in from_pretrained
return cls._from_pretrained(
File "/home/inspur/anaconda3/envs/llama/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2260, in _from_pretrained
tokenizer = cls(*init_inputs, **init_kwargs)
File "/home/inspur/.cache/huggingface/modules/transformers_modules/chatglm2-6b/tokenization_chatglm.py", line 69, in init
super().init(padding_side=padding_side, **kwargs)
File "/home/inspur/anaconda3/envs/llama/lib/python3.10/site-packages/transformers/tokenization_utils.py", line 367, in init
self._add_tokens(
File "/home/inspur/anaconda3/envs/llama/lib/python3.10/site-packages/transformers/tokenization_utils.py", line 467, in _add_tokens
current_vocab = self.get_vocab().copy()
File "/home/inspur/.cache/huggingface/modules/transformers_modules/chatglm2-6b/tokenization_chatglm.py", line 108, in get_vocab
vocab = {self._convert_id_to_token(i): i for i in range(self.vocab_size)}
File "/home/inspur/.cache/huggingface/modules/transformers_modules/chatglm2-6b/tokenization_chatglm.py", line 104, in vocab_size
return self.tokenizer.n_words
AttributeError: 'ChatGLMTokenizer' object has no attribute 'tokenizer'. Did you mean: 'tokenize'?
The model only loads after downgrading the transformers version, but downgrading then causes other problems.
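From the traceback, the apparent root cause is an initialization-order problem: in transformers 4.36 the base PreTrainedTokenizer.__init__ already calls _add_tokens(), which calls get_vocab(), but the bundled tokenization_chatglm.py only assigns self.tokenizer after super().__init__() returns, so the attribute does not exist yet. A minimal, self-contained illustration of that ordering (hypothetical class names, not the real library code):

```python
# Simplified illustration of the initialization order behind the error above.
# These classes are hypothetical stand-ins, not the real transformers/ChatGLM code.

class BasePreTrainedTokenizer:
    def __init__(self, **kwargs):
        # In transformers 4.36 the base __init__ already builds the added-tokens
        # map, which calls get_vocab() on the not-yet-fully-initialized subclass.
        self.get_vocab()


class ChatGLMLikeTokenizer(BasePreTrainedTokenizer):
    def __init__(self, vocab_file, **kwargs):
        super().__init__(**kwargs)   # get_vocab() is triggered in here ...
        self.tokenizer = object()    # ... but self.tokenizer is assigned only afterwards

    @property
    def vocab_size(self):
        return self.tokenizer.n_words

    def get_vocab(self):
        return {i: i for i in range(self.vocab_size)}


if __name__ == "__main__":
    # Raises: AttributeError: 'ChatGLMLikeTokenizer' object has no attribute 'tokenizer'
    ChatGLMLikeTokenizer("tokenizer.model")
```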
Expected Behavior
No response
Steps To Reproduce
My command:
CUDA_VISIBLE_DEVICES=5 python src/train_bash.py \
--stage sft \
--do_train \
--model_name_or_path /path/THUDM/chatglm2-6b \
--dataset alpaca_gpt4_zh \
--template chatglm2 \
--finetuning_type lora \
--lora_target query_key_value \
--output_dir /path/chatglm2 \
--overwrite_cache \
--per_device_train_batch_size 4 \
--gradient_accumulation_steps 4 \
--lr_scheduler_type cosine \
--logging_steps 10 \
--save_steps 1000 \
--learning_rate 5e-5 \
--num_train_epochs 3.0 \
--plot_loss \
--fp16
Environment
Installed following the requirements file. Hardware: A100. Python: 3.10, Transformers: 4.36.2, PyTorch: 2.1.2
Anything else?
No response
Updating tokenization_chatglm.py fixes it.
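For anyone who wants to patch the file locally instead of re-downloading it, the essential change appears to be moving the SentencePiece tokenizer setup above the super().__init__() call, so that self.tokenizer already exists when the transformers 4.36 base constructor invokes get_vocab(). A rough sketch of what the fixed constructor looks like (simplified and not the verbatim upstream file; SPTokenizer is assumed to be the wrapper defined earlier in the same module; check the current tokenization_chatglm.py on the THUDM/chatglm2-6b Hub repository for the exact code):

```python
# Hedged sketch of the updated __init__ order in tokenization_chatglm.py.
from transformers import PreTrainedTokenizer


class ChatGLMTokenizer(PreTrainedTokenizer):
    vocab_files_names = {"vocab_file": "tokenizer.model"}

    def __init__(self, vocab_file, padding_side="left", **kwargs):
        self.name = "GLMTokenizer"
        self.vocab_file = vocab_file
        # Assign self.tokenizer BEFORE calling super().__init__(), because
        # transformers 4.36 calls get_vocab() (and hence self.tokenizer.n_words)
        # inside the base constructor. SPTokenizer is the SentencePiece wrapper
        # defined earlier in the same file.
        self.tokenizer = SPTokenizer(vocab_file)
        super().__init__(padding_side=padding_side, **kwargs)
```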