
GLM-4V-9B fine-tuning error

Open tw-repository opened this issue 5 months ago • 5 comments

System Info

Traceback (most recent call last):
  File "/home/sa/swift/swift/cli/sft.py", line 5, in <module>
    sft_main()
  File "/home/sa/swift/swift/utils/run_utils.py", line 32, in x_main
    result = llm_x(args, **kwargs)
  File "/home/sa/swift/swift/llm/sft.py", line 417, in llm_sft
    trainer.train(training_args.resume_from_checkpoint)
  File "/home/sa/swift/swift/trainers/mixin.py", line 552, in train
    res = super().train(resume_from_checkpoint, *args, **kwargs)
  File "/home/sa/anaconda3/envs/glm4v/lib/python3.10/site-packages/transformers/trainer.py", line 1932, in train
    return inner_training_loop(
  File "/home/sa/anaconda3/envs/glm4v/lib/python3.10/site-packages/transformers/trainer.py", line 2268, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/home/sa/anaconda3/envs/glm4v/lib/python3.10/site-packages/transformers/trainer.py", line 3307, in training_step
    loss = self.compute_loss(model, inputs)
  File "/home/sa/swift/swift/trainers/trainers.py", line 165, in compute_loss
    outputs = model(**inputs)
  File "/home/sa/anaconda3/envs/glm4v/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/sa/anaconda3/envs/glm4v/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1582, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/home/sa/anaconda3/envs/glm4v/lib/python3.10/site-packages/accelerate/utils/operations.py", line 819, in forward
    return model_forward(*args, **kwargs)
  File "/home/sa/anaconda3/envs/glm4v/lib/python3.10/site-packages/accelerate/utils/operations.py", line 807, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/home/sa/anaconda3/envs/glm4v/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
    return func(*args, **kwargs)
  File "/home/sa/anaconda3/envs/glm4v/lib/python3.10/site-packages/peft/peft_model.py", line 1577, in forward
    return self.base_model(
  File "/home/sa/anaconda3/envs/glm4v/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/sa/anaconda3/envs/glm4v/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/sa/anaconda3/envs/glm4v/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 188, in forward
    return self.model.forward(*args, **kwargs)
  File "/home/sa/.cache/huggingface/modules/transformers_modules/glm-4v-9b/modeling_chatglm.py", line 1198, in forward
    boi_token_pos, eoi_token_pos = input_id.index(self.config.boi_token_id), input_id.index(
ValueError: 151339 is not in list
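The crash originates in the last frame: `modeling_chatglm.py` locates the image-placeholder span with `list.index(self.config.boi_token_id)`, which raises `ValueError` as soon as a sample's `input_ids` contain no `<|begin_of_image|>` token (id 151339) — for example a text-only record, or one whose image tokens were truncated away by `max_length`. A minimal sketch of that pattern with a defensive variant (the `eoi_token_id` value of 151340 is an assumption for illustration, not taken from the traceback):

```python
# Sketch of the failing pattern in GLM-4V's forward(): list.index() raises
# ValueError when the token is absent, producing "151339 is not in list".
BOI_TOKEN_ID = 151339  # <|begin_of_image|>, per the error message
EOI_TOKEN_ID = 151340  # <|end_of_image|> -- assumed value, for illustration only

def locate_image_span(input_ids, boi_token_id=BOI_TOKEN_ID, eoi_token_id=EOI_TOKEN_ID):
    """Return (boi_pos, eoi_pos), or None when the sample carries no image tokens.

    The guard avoids the crash for text-only samples or samples whose image
    tokens were cut off by truncation.
    """
    if boi_token_id not in input_ids or eoi_token_id not in input_ids:
        return None
    return input_ids.index(boi_token_id), input_ids.index(eoi_token_id)
```

This is only a diagnostic sketch; the real fix is ensuring every training sample reaching the model actually contains its image tokens.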

Who can help?

No response

Information

  • [X] The official example scripts
  • [ ] My own modified scripts

Reproduction

CUDA_VISIBLE_DEVICES=0 swift sft \
  --model_id_or_path /mnt/disk/tw_data/glm-4v-9b \
  --model_type glm4v-9b-chat \
  --dataset /mnt/disk/tw_data/finetune_mllm/SRR_train.json \
  --num_train_epochs 5 \
  --sft_type lora \
  --output_dir /mnt/disk/tw_data/finetune_output \
  --eval_steps 500 \
  --batch_size 1 \
  --max_length 4096 \
  --lora_rank 8 \
  --lora_alpha 32 \
  --lora_dropout_p 0.05 \
  --gradient_checkpointing true \
  --weight_decay 0.1 \
  --learning_rate 1e-4 \
  --gradient_accumulation_steps $(expr 16 / 3) \
  --max_grad_norm 0.5 \
  --warmup_ratio 0.03 \
  --eval_steps 100 \
  --save_steps 300 \
  --save_total_limit 2 \
  --logging_steps 10 \
  --deepspeed default-zero2
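Given the traceback, one quick sanity check before re-running is to scan the dataset for records that carry no image at all, since any such sample produces `input_ids` without the boi token. A hedged sketch, assuming the records are a JSON list and the image field is named `images` (check your dataset's actual schema):

```python
import json

def samples_without_images(records, image_key="images"):
    """Return the indices of records that have no image attached.

    `image_key` is an assumption about the dataset schema; adjust it to match
    the actual field name in SRR_train.json.
    """
    return [i for i, rec in enumerate(records) if not rec.get(image_key)]

# Hypothetical usage against the dataset path from the command above:
# with open("/mnt/disk/tw_data/finetune_mllm/SRR_train.json") as f:
#     records = json.load(f)
# print(samples_without_images(records))
```

If this reports any indices, those samples (or a too-small `--max_length` truncating the image tokens) are the likely trigger of the `ValueError`.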

Expected behavior

Identify the cause of the error and fix it so that fine-tuning can run to completion.

tw-repository · Aug 28 '24 17:08