LightX2V
[Bug] torch.AcceleratorError: CUDA error: an illegal memory access was encountered
Description
Running python lightx2v_kernel/test/nvfp4_nvfp4/test_bench1.py fails with torch.AcceleratorError: CUDA error: an illegal memory access was encountered.
Steps to Reproduce
- Docker image: lightx2v/lightx2v:25111101-cu128
- LightX2V commit: 63f0486f11913ce1d3cf0d79ed1b70c3ce2d1545
- GPU: RTX 5090
- Run: python lightx2v_kernel/test/nvfp4_nvfp4/test_bench1.py
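Since the error text notes that CUDA kernel errors may be reported asynchronously, rerunning the same command with synchronous launches should make the stack trace point at the kernel launch that actually faults. A minimal sketch, assuming the same container and test path as above:

```bash
# Synchronous kernel launches: the error is raised at the faulting launch
# instead of at a later, unrelated API call.
CUDA_LAUNCH_BLOCKING=1 python lightx2v_kernel/test/nvfp4_nvfp4/test_bench1.py
```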
Expected Result
The test script runs to completion without any error.
Actual Result
Traceback:
python lightx2v_kernel/test/nvfp4_nvfp4/test_bench1.py
a_global_scale : 700.0, torch.Size([])
b_global_scale : 584.0, torch.Size([])
alpha 2.4461839984724065e-06, torch.Size([]), torch.float32
Traceback (most recent call last):
File "/opt/conda/lib/python3.11/site-packages/torch/testing/_comparison.py", line 1289, in not_close_error_metas
pair.compare()
File "/opt/conda/lib/python3.11/site-packages/torch/testing/_comparison.py", line 740, in compare
self._compare_values(actual, expected)
File "/opt/conda/lib/python3.11/site-packages/torch/testing/_comparison.py", line 898, in _compare_values
compare_fn(
File "/opt/conda/lib/python3.11/site-packages/torch/testing/_comparison.py", line 1077, in _compare_regular_values_close
matches = torch.isclose(
^^^^^^^^^^^^^^
torch.AcceleratorError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/wujian/workspace/LightX2V/LightX2V/lightx2v_kernel/test/nvfp4_nvfp4/test_bench1.py", line 142, in <module>
test_nvfp4_gemm(torch.bfloat16, (128, 512, 128))
File "/opt/conda/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/wujian/workspace/LightX2V/LightX2V/lightx2v_kernel/test/nvfp4_nvfp4/test_bench1.py", line 138, in test_nvfp4_gemm
torch.testing.assert_close(out, expected_out.to(dtype=dtype), atol=1e-1, rtol=1e-1)
File "/opt/conda/lib/python3.11/site-packages/torch/testing/_comparison.py", line 1565, in assert_close
error_metas = not_close_error_metas(
^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/testing/_comparison.py", line 1296, in not_close_error_metas
f"Comparing\n\n"
File "/opt/conda/lib/python3.11/site-packages/torch/testing/_comparison.py", line 407, in __repr__
body = [
^
File "/opt/conda/lib/python3.11/site-packages/torch/testing/_comparison.py", line 408, in <listcomp>
f" {name}={value!s},"
File "/opt/conda/lib/python3.11/site-packages/torch/_tensor.py", line 590, in __repr__
return torch._tensor_str._str(self, tensor_contents=tensor_contents)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/_tensor_str.py", line 726, in _str
return _str_intern(self, tensor_contents=tensor_contents)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/_tensor_str.py", line 647, in _str_intern
tensor_str = _tensor_str(self, indent)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/_tensor_str.py", line 379, in _tensor_str
formatter = _Formatter(get_summarized_data(self) if summarize else self)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/_tensor_str.py", line 415, in get_summarized_data
return torch.stack([get_summarized_data(x) for x in (start + end)])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/_tensor_str.py", line 415, in <listcomp>
return torch.stack([get_summarized_data(x) for x in (start + end)])
^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/_tensor_str.py", line 405, in get_summarized_data
return torch.cat(
^^^^^^^^^^
torch.AcceleratorError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
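The same illegal access is re-raised while assert_close tries to print the tensors, which hides the original launch site. Beyond CUDA_LAUNCH_BLOCKING, NVIDIA's compute-sanitizer can report the faulting kernel and address directly; whether it is available inside the lightx2v/lightx2v:25111101-cu128 image is an assumption here.

```bash
# Memcheck reports the offending kernel, thread, and address for the
# illegal memory access.
compute-sanitizer --tool memcheck python lightx2v_kernel/test/nvfp4_nvfp4/test_bench1.py
```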
Environment Information
- Operating System: Ubuntu 22.04.5 LTS
- Commit ID: 63f0486f11913ce1d3cf0d79ed1b70c3ce2d1545
Log Information
Same output and traceback as shown under Actual Result above.