
Index out of bounds when running ImageNetR

Open JACK-Chen-2019 opened this issue 6 months ago • 0 comments

While training on the ImageNetR dataset with two GPUs, I hit the error below during the second phase of training. It appeared suddenly, after a period of normal training. Has anyone encountered this? How can it be resolved?

../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [83,0,0], thread: [127,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
Traceback (most recent call last):
  File "main.py", line 132, in <module>
    main(args)
  File "main.py", line 115, in main
    hideprompt_trainer.train(args)
  File "/defaultShare/archive/liangzichen/Prompt/HiDe-Prompt/trainers/hideprompt_trainer.py", line 127, in train
    train_and_evaluate(model, model_without_ddp, original_model,
  File "/defaultShare/archive/liangzichen/Prompt/HiDe-Prompt/engines/hide_promtp_wtp_and_tap_engine.py", line 306, in train_and_evaluate
    train_stats = train_one_epoch(model=model, original_model=original_model, criterion=criterion,
  File "/defaultShare/archive/liangzichen/Prompt/HiDe-Prompt/engines/hide_promtp_wtp_and_tap_engine.py", line 75, in train_one_epoch
    loss += orth_loss(output['pre_logits'], target, device, args)
  File "/defaultShare/archive/liangzichen/Prompt/HiDe-Prompt/engines/hide_promtp_wtp_and_tap_engine.py", line 539, in orth_loss
    sim = torch.matmul(M, M.t()) / 0.8
RuntimeError: CUDA error: CUBLAS_STATUS_INTERNAL_ERROR when calling cublasDgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)
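A note on reading this trace: CUDA kernels run asynchronously, so the `torch.matmul` line reported at the bottom may not be the operation that actually failed. The `IndexKernel.cu` assertion at the top suggests an earlier indexing operation received an index outside the valid range. Running with the environment variable `CUDA_LAUNCH_BLOCKING=1` usually makes the traceback point at the real culprit. As a rough sketch for narrowing it down, one could also validate the indices on the CPU before they are used. The helper names `find_out_of_range` and `check_indices` below are hypothetical and not part of HiDe-Prompt, and which tensor is being indexed inside `orth_loss` is an assumption that would need checking against the code.

```python
def find_out_of_range(indices, size):
    """Return (position, value) pairs for indices outside [-size, size).

    This mirrors the condition in the CUDA assertion:
    index >= -sizes[i] && index < sizes[i]
    """
    return [
        (pos, idx)
        for pos, idx in enumerate(indices)
        if not (-size <= idx < size)
    ]


def check_indices(indices, size, name="target"):
    """Raise a readable IndexError instead of a device-side assert.

    `indices` is any iterable of Python ints, for example the result of
    calling .tolist() on a label tensor (hypothetical usage).
    """
    bad = find_out_of_range(indices, size)
    if bad:
        raise IndexError(
            f"{name}: {len(bad)} index(es) outside [-{size}, {size}); "
            f"first few (position, value): {bad[:5]}"
        )
```

For example, calling `check_indices(target.tolist(), n_rows)` just before the indexing step, where `n_rows` is the first dimension of whatever tensor is indexed by `target`, would report the offending label values directly. If the labels turn out to exceed the number of rows, a mismatch between the dataset's class count and the size of the indexed tensor would be one thing to look at.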

JACK-Chen-2019 · Aug 29 '24 05:08