MedicalGPT

ValueError: 130004 is not in list

Open sexan opened this issue 1 year ago • 7 comments

Describe the Question

When running full-parameter pretraining with the chatglm-6b-v0 model (--use_peft set to False), the launch command is as follows:

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 torchrun --nproc_per_node 8 pretraining.py \
--model_type chatglm \
--model_name_or_path /home/vca/lsg/ChatGPT/open-models/chatglm-6b-v0 \
--train_file_dir ../data/pretrain \
--validation_file_dir ../data/pretrain \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 1 \
--do_train \
--do_eval \
--use_peft False \
--seed 42 \
--fp16 \
--max_train_samples 10000 \
--max_eval_samples 10 \
--num_train_epochs 0.5 \
--learning_rate 2e-4 \
--warmup_ratio 0.05 \
--weight_decay 0.01 \
--logging_strategy steps \
--logging_steps 10 \
--eval_steps 50 \
--evaluation_strategy steps \
--save_steps 500 \
--save_strategy steps \
--save_total_limit 3 \
--gradient_accumulation_steps 1 \
--preprocessing_num_workers 1 \
--block_size 16 \
--output_dir outputs-pt-v1 \
--overwrite_output_dir \
--ddp_timeout 30000 \
--logging_first_step True \
--target_modules all \
--lora_rank 8 \
--lora_alpha 16 \
--lora_dropout 0.05 \
--torch_dtype float16 \
--device_map auto \
--report_to tensorboard \
--ddp_find_unused_parameters False \
--gradient_checkpointing True
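For context on --block_size 16: run_clm-style pretraining scripts (which pretraining.py appears to follow) usually concatenate the tokenized texts and cut them into fixed-size blocks, so a 16-token block is not guaranteed to contain ChatGLM's BOS token (id 130004), which the model later searches for. A rough, self-contained sketch of that grouping; the token ids and the grouping function below are illustrative, not taken from pretraining.py:

```python
from itertools import chain

def group_texts(examples, block_size=16):
    # Concatenate every tokenized example, then split into fixed-size blocks
    # (the usual run_clm-style grouping; the real script may differ in detail).
    concatenated = {k: list(chain(*examples[k])) for k in examples}
    total_length = (len(concatenated["input_ids"]) // block_size) * block_size
    return {
        k: [t[i : i + block_size] for i in range(0, total_length, block_size)]
        for k, t in concatenated.items()
    }

# Illustrative document: 40 content tokens, then gMASK (130001) and BOS (130004)
# appended at the end, roughly how the chatglm-6b tokenizer places special tokens.
examples = {"input_ids": [list(range(100, 140)) + [130001, 130004]]}
blocks = group_texts(examples)["input_ids"]
print([130004 in b for b in blocks])  # [False, False] -> no block contains the BOS id
```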

The following error is reported (the traceback is interleaved across the 8 ranks; the key frames are):

  File "/opt/conda/lib/python3.8/site-packages/transformers/trainer.py", line 1929, in _inner_training_loop
  ...
    outputs = model(**inputs)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1040, in forward
  ...
    attention_mask = self.get_masks(
  File "/root/.cache/huggingface/modules/transformers_modules/chatglm-6b-v0/modeling_chatglm.py", line 682, in get_masks
    context_lengths = [seq.tolist().index(self.config.bos_token_id) for seq in input_ids]
ValueError: 130004 is not in list

Describe your attempts

  • [ ] I walked through the tutorials
  • [ ] I checked the documentation
  • [ ] I checked to make sure that this is not a duplicate question

sexan · Jun 13 '23 22:06