ChatGLM2-6B

ChatGLM2 fails to load when my transformers version is 4.36.2

Congcong-Song opened this issue 2 years ago • 2 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Current Behavior

The error is as follows:

Traceback (most recent call last):
  File "/home/inspur/scc/gpt/LLaMA-Factory/src/train_bash.py", line 14, in <module>
    main()
  File "/home/inspur/scc/gpt/LLaMA-Factory/src/train_bash.py", line 5, in main
    run_exp()
  File "/home/inspur/scc/gpt/LLaMA-Factory/src/llmtuner/train/tuner.py", line 26, in run_exp
    run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
  File "/home/inspur/scc/gpt/LLaMA-Factory/src/llmtuner/train/sft/workflow.py", line 29, in run_sft
    model, tokenizer = load_model_and_tokenizer(model_args, finetuning_args, training_args.do_train)
  File "/home/inspur/scc/gpt/LLaMA-Factory/src/llmtuner/model/loader.py", line 49, in load_model_and_tokenizer
    tokenizer = AutoTokenizer.from_pretrained(
  File "/home/inspur/anaconda3/envs/llama/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 774, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/home/inspur/anaconda3/envs/llama/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2028, in from_pretrained
    return cls._from_pretrained(
  File "/home/inspur/anaconda3/envs/llama/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2260, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/home/inspur/.cache/huggingface/modules/transformers_modules/chatglm2-6b/tokenization_chatglm.py", line 69, in __init__
    super().__init__(padding_side=padding_side, **kwargs)
  File "/home/inspur/anaconda3/envs/llama/lib/python3.10/site-packages/transformers/tokenization_utils.py", line 367, in __init__
    self._add_tokens(
  File "/home/inspur/anaconda3/envs/llama/lib/python3.10/site-packages/transformers/tokenization_utils.py", line 467, in _add_tokens
    current_vocab = self.get_vocab().copy()
  File "/home/inspur/.cache/huggingface/modules/transformers_modules/chatglm2-6b/tokenization_chatglm.py", line 108, in get_vocab
    vocab = {self._convert_id_to_token(i): i for i in range(self.vocab_size)}
  File "/home/inspur/.cache/huggingface/modules/transformers_modules/chatglm2-6b/tokenization_chatglm.py", line 104, in vocab_size
    return self.tokenizer.n_words
AttributeError: 'ChatGLMTokenizer' object has no attribute 'tokenizer'. Did you mean: 'tokenize'?

The model only loads after downgrading transformers, but the downgrade then causes other problems.
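For context, here is a minimal sketch of why loading fails (class names are illustrative, not the real transformers code): newer transformers releases call get_vocab() from inside the base tokenizer's __init__, while the old tokenization_chatglm.py only assigns self.tokenizer after super().__init__() returns.

```python
class FakeBase:
    """Stands in for transformers.PreTrainedTokenizer."""
    def __init__(self, **kwargs):
        # Newer transformers does vocab bookkeeping during __init__,
        # which ends up calling the subclass's get_vocab().
        self._vocab = self.get_vocab()

class OldStyleTokenizer(FakeBase):
    """Mirrors the init order of the old tokenization_chatglm.py."""
    def __init__(self):
        super().__init__()                                  # get_vocab() runs here...
        self.tokenizer = type("SP", (), {"n_words": 3})()   # ...but this is set too late

    def get_vocab(self):
        # Fails: self.tokenizer does not exist yet during super().__init__()
        return {f"tok{i}": i for i in range(self.tokenizer.n_words)}

try:
    OldStyleTokenizer()
except AttributeError as exc:
    print(exc)  # 'OldStyleTokenizer' object has no attribute 'tokenizer'
```

This reproduces the same shape of error as the traceback above: the attribute lookup happens before the subclass has finished constructing itself.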

Expected Behavior

No response

Steps To Reproduce

My command:

CUDA_VISIBLE_DEVICES=5 python src/train_bash.py \
    --stage sft \
    --do_train \
    --model_name_or_path /path/THUDM/chatglm2-6b \
    --dataset alpaca_gpt4_zh \
    --template chatglm2 \
    --finetuning_type lora \
    --lora_target query_key_value \
    --output_dir /path/chatglm2 \
    --overwrite_cache \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 5e-5 \
    --num_train_epochs 3.0 \
    --plot_loss \
    --fp16

Environment

Environment: installed per requirements. Hardware: A100. Python: 3.10, Transformers: 4.36.2, PyTorch: 2.1.2

Anything else?

No response

Congcong-Song · Jan 05 '24 03:01

Download the latest model from Hugging Face.
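One way to pull the refreshed model files (the local directory name is just an example; this assumes the huggingface_hub CLI is available):

```shell
# Re-download the repo so you pick up the fixed tokenization_chatglm.py
# alongside the weights.
pip install -U huggingface_hub
huggingface-cli download THUDM/chatglm2-6b --local-dir ./chatglm2-6b
```

Then point --model_name_or_path at the freshly downloaded directory.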

Gaojun123123 · Jan 06 '24 02:01

Updating tokenization_chatglm.py fixes it.
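The updated file reorders initialization so that self.tokenizer already exists when the base class constructor runs. A sketch of the pattern (illustrative names, not the exact THUDM code):

```python
class FakeBase:
    """Stands in for transformers.PreTrainedTokenizer."""
    def __init__(self, **kwargs):
        # Newer transformers touches get_vocab() during __init__.
        self._vocab = self.get_vocab()

class FixedStyleTokenizer(FakeBase):
    """Illustrative fix: assign self.tokenizer BEFORE calling
    super().__init__(), matching the reordering in the updated
    tokenization_chatglm.py."""
    def __init__(self):
        self.tokenizer = type("SP", (), {"n_words": 3})()  # set first
        super().__init__()  # get_vocab() can now see self.tokenizer

    def get_vocab(self):
        return {f"tok{i}": i for i in range(self.tokenizer.n_words)}

print(FixedStyleTokenizer().get_vocab())  # {'tok0': 0, 'tok1': 1, 'tok2': 2}
```

With the attribute in place before the base constructor runs, the AttributeError from the original report no longer occurs.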

mawenju203 · Jan 16 '24 09:01