
CUDA illegal memory access

Open · bhavnasud821 opened this issue 9 months ago · 0 comments

Hello,

I tried running the video text retrieval demo and I'm running into this error:

  File "/home/saumya/miniconda3/lib/python3.12/site-packages/torch/utils/checkpoint.py", line 481, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/saumya/miniconda3/lib/python3.12/site-packages/torch/autograd/function.py", line 574, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/saumya/miniconda3/lib/python3.12/site-packages/torch/utils/checkpoint.py", line 255, in forward
    outputs = run_function(*args)
              ^^^^^^^^^^^^^^^^^^^
  File "/home/saumya/InternVideoClean/InternVideo2/multi_modality/models/backbones/internvideo2/internvideo2.py", line 305, in _inner_forward
    x = x + self.drop_path2(self.ls2(self.mlp(self.norm2(x))))
                                              ^^^^^^^^^^^^^
  File "/home/saumya/miniconda3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/saumya/miniconda3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/saumya/InternVideoClean/InternVideo2/multi_modality/models/backbones/internvideo2/internvideo2.py", line 138, in forward
    return self.weight * hidden_states.to(input_dtype)
           ~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
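
If it helps, my understanding is that the suggested `CUDA_LAUNCH_BLOCKING` flag has to be set before anything touches the GPU (e.g. at the very top of the demo script) for the reported stack trace to point at the kernel that actually failed; roughly:

```python
# Make CUDA kernel launches synchronous so the Python traceback points at the
# real failing operation. This must run before the CUDA context is created,
# i.e. before any tensor is moved to the GPU.
import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
```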

I'm using the default internvideo2_stage2_config.py with pretrained='InternVideo2-stage2_1b-224p-f4.pt'. When I turn off DeepSpeed, I get this error instead:

  File "/home/saumya/InternVideoClean/InternVideo2/multi_modality/models/backbones/internvideo2/internvideo2.py", line 302, in _inner_forward
    x = x + self.drop_path1(self.ls1(self.attn(self.norm1(x))))
                                     ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/saumya/miniconda3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/saumya/miniconda3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/saumya/InternVideoClean/InternVideo2/multi_modality/models/backbones/internvideo2/internvideo2.py", line 227, in forward
    x = self._naive_attn(x) if not self.use_flash_attn else self._flash_attn(x)
        ^^^^^^^^^^^^^^^^^^^
  File "/home/saumya/InternVideoClean/InternVideo2/multi_modality/models/backbones/internvideo2/internvideo2.py", line 186, in _naive_attn
    qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads).permute(2, 0, 3, 1, 4)
          ^^^^^^^^^^^
  File "/home/saumya/miniconda3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/saumya/miniconda3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/saumya/miniconda3/lib/python3.12/site-packages/torch/nn/modules/linear.py", line 117, in forward
    return F.linear(input, self.weight, self.bias)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`
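
Both tracebacks die inside the vision transformer blocks (the norm weight multiply in one case, the qkv projection in the other). In case it is useful for narrowing this down, here is a minimal hook sketch for inspecting what actually reaches the qkv layer; the `vision_encoder.blocks` attribute path is my guess at the model internals, so adjust as needed:

```python
import torch

# Print shape/dtype/device of the tensor entering the qkv projection of the
# first block, plus whether it already contains NaNs, to rule out an upstream
# dtype or device mix-up. The attribute path below is an assumption.
def inspect_qkv_input(module, inputs):
    x = inputs[0]
    print(x.shape, x.dtype, x.device, bool(torch.isnan(x).any()))

first_block = model.vision_encoder.blocks[0]  # assumed attribute path
first_block.attn.qkv.register_forward_pre_hook(inspect_qkv_input)
```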

The only other notable thing I changed was how the BERT tokenizer is loaded:

```python
tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-large-uncased")
# tokenizer = BertTokenizer.from_pretrained(config.model.text_encoder.pretrained, local_files_only=False)
model = InternVideo2_Stage2(config=config, tokenizer=tokenizer, is_pretrain=True)
```
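
Since the error message says the fault can be reported asynchronously at an unrelated call, I am not sure the vision backbone is really where it originates; one thing I want to rule out is the tokenizer swap above, since a token id past the end of the text encoder's embedding table would index out of bounds on the GPU. A minimal check, where the `text_encoder.config.vocab_size` attribute path is a guess at the model internals:

```python
# Confirm the swapped-in tokenizer and the text encoder agree on vocabulary
# size; an out-of-range token id in the embedding lookup is a classic source
# of illegal memory accesses. The attribute path below is an assumption about
# InternVideo2_Stage2's internals.
print(len(tokenizer), model.text_encoder.config.vocab_size)
ids = tokenizer("a short test caption", return_tensors="pt").input_ids
assert int(ids.max()) < model.text_encoder.config.vocab_size
```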

bhavnasud821 · Apr 05, 2025