LightX2V
[Bug] no kernel image is available when running T5
Description
I hit torch.AcceleratorError: CUDA error: no kernel image is available when running the provided script bash wan22/run_wan22_moe_i2v_distill.sh
Steps to Reproduce
- Pull the official docker image
- Setup the code base
- Run bash wan22/run_wan22_moe_i2v_distill.sh after changing the config_json to wan_moe_i2v_distill_4090.json
Expected Result
The model should generate the video
Actual Result
2025-10-31 17:18:53.129 | INFO | lightx2v.utils.utils:load_weights:384 - Loading weights from /workspace/models/wan2.2_i2v/models_t5_umt5-xxl-enc-fp8.pth
2025-10-31 17:18:57.223 | INFO | lightx2v.utils.utils:load_weights:384 - Loading weights from /workspace/models/wan2.2_i2v/Wan2.1_VAE.pth
2025-10-31 17:18:57.536 | INFO | lightx2v.utils.profiler:__exit__:43 - [Profile] Single GPU - Level2_Log Load models cost 24.113897 seconds
2025-10-31 17:19:02.661 | INFO | lightx2v.utils.profiler:__exit__:43 - [Profile] Single GPU - Level1_Log Run VAE Encoder cost 5.112839 seconds
2025-10-31 17:19:02.753 | INFO | lightx2v.utils.profiler:__exit__:43 - [Profile] Single GPU - Level1_Log Run Text Encoder cost 0.091370 seconds
2025-10-31 17:19:02.753 | INFO | lightx2v.utils.profiler:__exit__:43 - [Profile] Single GPU - Level2_Log Run Encoders cost 5.216829 seconds
2025-10-31 17:19:02.753 | INFO | lightx2v.utils.profiler:__exit__:43 - [Profile] Single GPU - Level1_Log RUN pipeline cost 5.216864 seconds
2025-10-31 17:19:02.753 | INFO | lightx2v.utils.profiler:__exit__:43 - [Profile] Single GPU - Level1_Log Total Cost cost 29.331168 seconds
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/workspace/lightx2v/infer.py", line 115, in <module>
main()
File "/workspace/lightx2v/infer.py", line 106, in main
runner.run_pipeline(input_info)
File "/workspace/lightx2v/utils/profiler.py", line 77, in sync_wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/workspace/lightx2v/models/runners/default_runner.py", line 364, in run_pipeline
self.inputs = self.run_input_encoder()
^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/lightx2v/utils/profiler.py", line 77, in sync_wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/workspace/lightx2v/models/runners/default_runner.py", line 205, in _run_input_encoder_local_i2v
text_encoder_output = self.run_text_encoder(self.input_info)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/lightx2v/utils/profiler.py", line 77, in sync_wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/workspace/lightx2v/models/runners/wan/wan_runner.py", line 238, in run_text_encoder
context = self.text_encoders[0].infer([prompt])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/lightx2v/models/input_encoders/hf/wan/t5/model.py", line 609, in infer
context = self.model(ids, mask)
^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/lightx2v/models/input_encoders/hf/wan/t5/model.py", line 351, in forward
x = block(x, mask, pos_bias=e)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/lightx2v/models/input_encoders/hf/wan/t5/model.py", line 220, in forward
x = fp16_clamp(x + self.attn(self.norm1(x), mask=mask, pos_bias=e))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/lightx2v/models/input_encoders/hf/wan/t5/model.py", line 125, in forward
attn_bias = x.new_zeros(b, n, q.size(1), k.size(1))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch.AcceleratorError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Environment Information
- Official docker image
- RTX 5090
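This error usually means the PyTorch build inside the docker image was not compiled with a kernel binary (or PTX) for the GPU's compute capability; the RTX 5090 is compute capability 12.0 (sm_120), which older torch builds do not include. A simplified, illustrative sketch of the matching rule (assumed behavior: an sm_XY cubin runs only on a matching device, while a compute_XY PTX entry can be JIT-compiled for any device with capability >= X.Y; real cubin compatibility has per-major nuances this ignores):

```python
def kernel_image_available(device_cc, arch_list):
    """Return True if any entry in arch_list can serve a device
    with compute capability device_cc = (major, minor).

    arch_list entries look like torch.cuda.get_arch_list() output,
    e.g. 'sm_90' (precompiled cubin) or 'compute_90' (PTX for JIT).
    """
    major, minor = device_cc
    for arch in arch_list:
        kind, _, ver = arch.partition("_")
        a_major, a_minor = int(ver[:-1]), int(ver[-1])
        if kind == "sm" and (a_major, a_minor) == (major, minor):
            # Exact binary for this device.
            return True
        if kind == "compute" and (a_major, a_minor) <= (major, minor):
            # PTX can be JIT-compiled forward to newer devices.
            return True
    return False

# An RTX 5090 (12.0) against a build that tops out at sm_90:
print(kernel_image_available((12, 0), ["sm_80", "sm_86", "sm_90"]))  # False
# A build that ships sm_120 works:
print(kernel_image_available((12, 0), ["sm_80", "sm_120"]))          # True
```

Inside the container, `torch.cuda.get_arch_list()` shows which architectures the installed torch actually ships; if sm_120 (or a compute_ entry at or below 12.0) is absent, a torch build with Blackwell support is needed.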