RWKV-LM-LoRA
Indexing.cu:1141: indexSelectLargeIndex: block: [202,0,0], thread: [92,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
While fine-tuning the 3B model, I got the following error:
../aten/src/ATen/native/cuda/Indexing.cu:1141: indexSelectLargeIndex: block: [202,0,0], thread: [89,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1141: indexSelectLargeIndex: block: [202,0,0], thread: [90,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1141: indexSelectLargeIndex: block: [202,0,0], thread: [91,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1141: indexSelectLargeIndex: block: [202,0,0], thread: [92,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1141: indexSelectLargeIndex: block: [202,0,0], thread: [93,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1141: indexSelectLargeIndex: block: [202,0,0], thread: [94,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1141: indexSelectLargeIndex: block: [202,0,0], thread: [95,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:31 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7f56e3a67457 in /opt/conda/envs/rwkv38/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7f56e3a313ec in /opt/conda/envs/rwkv38/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(std::string const&, std::string const&, int, bool) + 0xb4 (0x7f570eadbc64 in /opt/conda/envs/rwkv38/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
frame #3:
My configuration is:
CUDA_VISIBLE_DEVICES=0 python train.py \
  --load_model "/usr/local/RWKV-LM-LoRA/RWKV-4-Raven-3B-v12-Eng49%-Chn49%-Jpn1%-Other1%-20230527-ctx4096.pth" \
  --proj_dir "/usr/local/RWKV-LM-LoRA/modelcheckpoint" \
  --data_file "/usr/local/RWKV-LM-LoRA/bininx_data/dev_1_text_document" \
  --data_type binidx \
  --vocab_size 50277 \
  --ctx_len 4096 \
  --accumulate_grad_batches 4 \
  --epoch_steps 32 \
  --epoch_count 2 \
  --epoch_begin 0 \
  --epoch_save 2 \
  --micro_bsz 2 \
  --n_layer 32 \
  --n_embd 2560 \
  --pre_ffn 0 \
  --head_qk 0 \
  --lr_init 1e-5 \
  --lr_final 1e-5 \
  --warmup_steps 0 \
  --beta1 0.9 \
  --beta2 0.999 \
  --adam_eps 1e-8 \
  --accelerator gpu \
  --devices 1 \
  --precision bf16 \
  --strategy deepspeed_stage_2 \
  --grad_cp 1 \
  --lora \
  --lora_r 8 \
  --lora_alpha 16 \
  --lora_dropout 0.01 \
  --lora_parts=att,ffn,time,ln
I have gone through a lot of material but nothing has fixed it. Has anyone else run into this problem? Is there a way to solve it?
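
One thing that may be worth ruling out: the assertion `srcIndex < srcSelectDimSize` fires when an index passed to an index-select operation is not smaller than the size of the dimension being indexed. Assuming the lookup in question is the token embedding (this is a guess, the truncated stack trace does not confirm it), that would mean some token ID in the training data is greater than or equal to `--vocab_size 50277`. The sketch below shows the check on a plain list of IDs. The function name `find_out_of_range_ids` and the sample IDs are hypothetical, and reading the IDs out of the binidx file is left out.

```python
from typing import Iterable, List, Tuple


def find_out_of_range_ids(token_ids: Iterable[int], vocab_size: int) -> List[Tuple[int, int]]:
    """Return (position, token_id) pairs that a table with vocab_size rows cannot look up."""
    return [
        (pos, tid)
        for pos, tid in enumerate(token_ids)
        if tid < 0 or tid >= vocab_size
    ]


# Hypothetical sample: 50277 matches --vocab_size above; the IDs themselves are made up.
sample_ids = [0, 187, 50276, 50277, 65530]
bad = find_out_of_range_ids(sample_ids, vocab_size=50277)
print(bad)  # [(3, 50277), (4, 65530)]
```

If a check like this reports any entries on the real data, the data was probably tokenized with a vocabulary larger than the one the model expects; if it reports none, the out-of-range index is coming from somewhere else.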