LightX2V
[Bug] torch.AcceleratorError: CUDA error: an illegal memory access was encountered
Description
Running python lightx2v_kernel/test/nvfp4_nvfp4/test_bench1.py fails with torch.AcceleratorError: CUDA error: an illegal memory access was encountered.
Steps to Reproduce
- Docker image: lightx2v/lightx2v:25111101-cu128
- LightX2V commit: 63f0486f11913ce1d3cf0d79ed1b70c3ce2d1545
- GPU: RTX 5090
- Run: python lightx2v_kernel/test/nvfp4_nvfp4/test_bench1.py
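Since the error text notes that CUDA kernel errors may be reported asynchronously, rerunning the same command with synchronous launches should make the stack trace point at the kernel launch that actually faults. A minimal sketch, assuming the same container and test path as above:

```bash
# Synchronous kernel launches: the error is raised at the faulting launch
# instead of at a later, unrelated API call.
CUDA_LAUNCH_BLOCKING=1 python lightx2v_kernel/test/nvfp4_nvfp4/test_bench1.py
```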
Expected Result
The test script runs to completion without any error.
Actual Result
Traceback:
python lightx2v_kernel/test/nvfp4_nvfp4/test_bench1.py
a_global_scale : 700.0, torch.Size([])
b_global_scale : 584.0, torch.Size([])
alpha 2.4461839984724065e-06, torch.Size([]), torch.float32
Traceback (most recent call last):
File "/opt/conda/lib/python3.11/site-packages/torch/testing/_comparison.py", line 1289, in not_close_error_metas
pair.compare()
File "/opt/conda/lib/python3.11/site-packages/torch/testing/_comparison.py", line 740, in compare
self._compare_values(actual, expected)
File "/opt/conda/lib/python3.11/site-packages/torch/testing/_comparison.py", line 898, in _compare_values
compare_fn(
File "/opt/conda/lib/python3.11/site-packages/torch/testing/_comparison.py", line 1077, in _compare_regular_values_close
matches = torch.isclose(
^^^^^^^^^^^^^^
torch.AcceleratorError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/wujian/workspace/LightX2V/LightX2V/lightx2v_kernel/test/nvfp4_nvfp4/test_bench1.py", line 142, in <module>
test_nvfp4_gemm(torch.bfloat16, (128, 512, 128))
File "/opt/conda/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/wujian/workspace/LightX2V/LightX2V/lightx2v_kernel/test/nvfp4_nvfp4/test_bench1.py", line 138, in test_nvfp4_gemm
torch.testing.assert_close(out, expected_out.to(dtype=dtype), atol=1e-1, rtol=1e-1)
File "/opt/conda/lib/python3.11/site-packages/torch/testing/_comparison.py", line 1565, in assert_close
error_metas = not_close_error_metas(
^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/testing/_comparison.py", line 1296, in not_close_error_metas
f"Comparing\n\n"
File "/opt/conda/lib/python3.11/site-packages/torch/testing/_comparison.py", line 407, in __repr__
body = [
^
File "/opt/conda/lib/python3.11/site-packages/torch/testing/_comparison.py", line 408, in <listcomp>
f" {name}={value!s},"
File "/opt/conda/lib/python3.11/site-packages/torch/_tensor.py", line 590, in __repr__
return torch._tensor_str._str(self, tensor_contents=tensor_contents)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/_tensor_str.py", line 726, in _str
return _str_intern(self, tensor_contents=tensor_contents)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/_tensor_str.py", line 647, in _str_intern
tensor_str = _tensor_str(self, indent)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/_tensor_str.py", line 379, in _tensor_str
formatter = _Formatter(get_summarized_data(self) if summarize else self)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/_tensor_str.py", line 415, in get_summarized_data
return torch.stack([get_summarized_data(x) for x in (start + end)])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/_tensor_str.py", line 415, in <listcomp>
return torch.stack([get_summarized_data(x) for x in (start + end)])
^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/_tensor_str.py", line 405, in get_summarized_data
return torch.cat(
^^^^^^^^^^
torch.AcceleratorError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
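The same illegal access is re-raised while assert_close tries to print the tensors, which hides the original launch site. Beyond CUDA_LAUNCH_BLOCKING, NVIDIA's compute-sanitizer can report the faulting kernel and address directly; whether it is available inside the lightx2v/lightx2v:25111101-cu128 image is an assumption here.

```bash
# Memcheck reports the offending kernel, thread, and address for the
# illegal memory access.
compute-sanitizer --tool memcheck python lightx2v_kernel/test/nvfp4_nvfp4/test_bench1.py
```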
Environment Information
- Operating System: Ubuntu 22.04.5 LTS
- Commit ID: 63f0486f11913ce1d3cf0d79ed1b70c3ce2d1545
Log Information
Same output and traceback as shown under Actual Result above.