Reminder
- [X] I have read the README and searched the existing issues.
Reproduction
39%|███▉ | 770/1968 [1:53:40<2:54:12, 8.73s/it]
../aten/src/ATen/native/cuda/Indexing.cu:1290: indexSelectLargeIndex: block: [606,0,0], thread: [32,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1290: indexSelectLargeIndex: block: [606,0,0], thread: [33,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1290: indexSelectLargeIndex: block: [606,0,0], thread: [34,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1290: indexSelectLargeIndex: block: [606,0,0], thread: [35,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
Traceback (most recent call last):
File "/ML-A800/home/guoshuyue/madehua/code/LLaMA-Factory/src/train_bash.py", line 15, in
main()
File "/ML-A800/home/guoshuyue/madehua/code/LLaMA-Factory/src/train_bash.py", line 6, in main
run_exp()
File "/ML-A800/home/guoshuyue/madehua/code/LLaMA-Factory/src/llmtuner/train/tuner.py", line 40, in run_exp
run_rm(model_args, data_args, training_args, finetuning_args, callbacks)
File "/ML-A800/home/guoshuyue/madehua/code/LLaMA-Factory/src/llmtuner/train/rm/workflow.py", line 50, in run_rm
train_result = trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
File "/ML-A800/home/guoshuyue/anaconda3/envs/llama_factory/lib/python3.10/site-packages/transformers/trainer.py", line 1780, in train
return inner_training_loop(
File "/ML-A800/home/guoshuyue/anaconda3/envs/llama_factory/lib/python3.10/site-packages/transformers/trainer.py", line 2118, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/ML-A800/home/guoshuyue/anaconda3/envs/llama_factory/lib/python3.10/site-packages/transformers/trainer.py", line 3036, in training_step
loss = self.compute_loss(model, inputs)
File "/ML-A800/home/guoshuyue/madehua/code/LLaMA-Factory/src/llmtuner/train/rm/trainer.py", line 51, in compute_loss
_, _, values = model(**inputs, output_hidden_states=True, return_dict=True)
File "/ML-A800/home/guoshuyue/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/ML-A800/home/guoshuyue/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/ML-A800/home/guoshuyue/anaconda3/envs/llama_factory/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
ret_val = func(*args, **kwargs)
File "/ML-A800/home/guoshuyue/anaconda3/envs/llama_factory/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1852, in forward
loss = self.module(*inputs, **kwargs)
File "/ML-A800/home/guoshuyue/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/ML-A800/home/guoshuyue/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1561, in _call_impl
result = forward_call(*args, **kwargs)
File "/ML-A800/home/guoshuyue/anaconda3/envs/llama_factory/lib/python3.10/site-packages/trl/models/modeling_value_head.py", line 170, in forward
base_model_output = self.pretrained_model(
File "/ML-A800/home/guoshuyue/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/ML-A800/home/guoshuyue/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1561, in _call_impl
result = forward_call(*args, **kwargs)
File "/ML-A800/home/guoshuyue/anaconda3/envs/llama_factory/lib/python3.10/site-packages/peft/peft_model.py", line 1129, in forward
return self.base_model(
File "/ML-A800/home/guoshuyue/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/ML-A800/home/guoshuyue/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1561, in _call_impl
result = forward_call(*args, **kwargs)
File "/ML-A800/home/guoshuyue/anaconda3/envs/llama_factory/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 161, in forward
return self.model.forward(*args, **kwargs)
File "/ML-A800/home/guoshuyue/anaconda3/envs/llama_factory/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1196, in forward
outputs = self.model(
File "/ML-A800/home/guoshuyue/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/ML-A800/home/guoshuyue/anaconda3/envs/llama_factory/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1561, in _call_impl
result = forward_call(*args, **kwargs)
File "/ML-A800/home/guoshuyue/anaconda3/envs/llama_factory/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 990, in forward
causal_mask = self._update_causal_mask(attention_mask, inputs_embeds, cache_position)
File "/ML-A800/home/guoshuyue/anaconda3/envs/llama_factory/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1109, in _update_causal_mask
if not is_tracing and torch.any(attention_mask != 1):
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
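Note that the Python traceback above ends in `_update_causal_mask`, but the log itself warns that CUDA errors are reported asynchronously; the real failure is the earlier `indexSelectLargeIndex` assertion raised during the embedding index lookup. A minimal debugging sketch, following the log's own suggestion (the variable must be set before torch initializes CUDA):

```python
# Minimal sketch: make CUDA kernel launches synchronous so the Python
# traceback points at the actual failing op (the embedding index lookup)
# rather than a later, unrelated call.
import os

# Must be set before the first CUDA call in the process.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch  # imported only after setting the env var
```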
I have read issue #1224, but I am launching with deepspeed, and the error occurs at the same position on every run. Could anyone help me with this? Thanks.
deepspeed --include localhost:$GPUs --master_port=2221 -- train_bash.py \
    --deepspeed $ds_config_path \
    --ddp_timeout 180000000 \
    --stage rm \
    --do_train \
    --do_eval \
    --model_name_or_path $model_path \
    --adapter_name_or_path $path_to_sft_checkpoint \
    --create_new_adapter \
    --dataset $dataset_path \
    --dataset_dir $dataset_dir \
    --template default \
    --finetuning_type lora \
    --lora_target q_proj,v_proj \
    --lora_rank 64 \
    --output_dir $path_to_rm_checkpoint \
    --overwrite_output_dir \
    --overwrite_cache \
    --per_device_train_batch_size 2 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_strategy "epoch" \
    --dataloader_num_workers 4 \
    --learning_rate 5e-5 \
    --num_train_epochs 2.0 \
    --plot_loss \
    --val_size 1000 \
    --seed 0 \
    --bf16
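For context: the `indexSelectLargeIndex` assertion (`srcIndex < srcSelectDimSize`) typically means some input token id is greater than or equal to the number of rows in the model's embedding table, e.g. because the tokenizer can emit ids the checkpoint has no embedding rows for. A quick out-of-band check, where the path and sample text are placeholders standing in for `$model_path` and one example from your dataset:

```python
# Hedged sketch: verify that no tokenized input id exceeds the embedding table.
# The path and sample below are placeholders, not taken from the original report.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "/path/to/llama-2-7b-ms"  # placeholder for $model_path
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)

num_embeddings = model.get_input_embeddings().weight.shape[0]
print(f"embedding rows: {num_embeddings}, tokenizer size: {len(tokenizer)}")

sample = "one example from the RM dataset"  # placeholder
input_ids = tokenizer(sample)["input_ids"]
out_of_range = [i for i in input_ids if i >= num_embeddings]
print("out-of-range token ids:", out_of_range)  # any hit here reproduces the assert
```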
Expected behavior
No response
System Info
No response
Others
No response
Which model are you using? Could you try the --resize_vocab flag?
The model is llama-2-7b-ms. Would --resize_vocab help solve this problem?
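For reference, `--resize_vocab` should help exactly when the check above shows `len(tokenizer)` exceeding the embedding rows: the flag resizes the model's token embeddings to match the tokenizer, which at the transformers level presumably amounts to the following sketch (path again a placeholder):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "/path/to/llama-2-7b-ms"  # placeholder for $model_path
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)

# Grow the input (and tied output) embedding matrices so that every id the
# tokenizer can produce has a corresponding embedding row.
model.resize_token_embeddings(len(tokenizer))
```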