ComfyUI-LTXVideo
I encountered this problem when running the fp8 model on a modified 4080. Please help.
Same Problem here:
.../LTX-Video-Q8-Kernels\csrc\gemm\mma_sm89_fp16.hpp:80: block: [12,5,0], thread: [255,0,0] Assertion 0 && "Attempting to use SM89_16x8x32_F32E4M3E4M3F32_TN without CUTE_ARCH_MMA_F16_SM89_ENABLED" failed.
Running a 4090. ChatGPT said:
"That error message means that the kernel code you're using tries to use a specific CUDA tensor core instruction (SM89 for FP8/FP16), but the corresponding architecture macro (CUTE_ARCH_MMA_F16_SM89_ENABLED) is not defined, so it fails intentionally.
You're likely running on an Ada Lovelace GPU (e.g. an RTX 40-series card) with compute capability 8.9 (SM89), and the CUTLASS-based kernel you compiled expects this feature to be explicitly enabled."
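For context, SM89 is compute capability 8.9, which is what Ada Lovelace cards like the 4080/4090 report. If torch is installed, you can confirm what your own GPU reports with torch.cuda.get_device_capability(); a small sketch (the sm_arch helper here is my own, not a torch API):

```python
def sm_arch(capability):
    """Format a (major, minor) compute capability tuple as an SM string,
    e.g. (8, 9) -> 'sm_89'."""
    major, minor = capability
    return f"sm_{major}{minor}"

if __name__ == "__main__":
    import torch  # only needed for the live check
    if torch.cuda.is_available():
        cap = torch.cuda.get_device_capability(0)
        print("GPU reports", sm_arch(cap))  # RTX 40-series should print sm_89
    else:
        print("No CUDA device visible to torch")
```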
ChatGPT suggested adding these compile flags inside setup.py (in the LTX-Video-Q8-Kernels folder):
"-DCUTE_ARCH_MMA_SM89_ENABLED",
"-DCUTE_ARCH_MMA_F16_SM89_ENABLED",
"-DCUTE_USE_DEVICE_ATOMICS=1",
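For reference, flags like these would go into the NVCC arguments of the extension build. Below is a hypothetical sketch of where they would sit in a torch CUDAExtension-based setup.py; the real LTX-Video-Q8-Kernels setup.py is structured differently, and the extension name and source list here are placeholders:

```python
# Hypothetical sketch only: shows where -D defines would go in a
# torch CUDAExtension build, not the actual LTX-Video-Q8-Kernels setup.py.
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension

ext = CUDAExtension(
    name="q8_kernels._C",             # assumed extension name
    sources=["csrc/placeholder.cu"],  # placeholder, not the real file list
    extra_compile_args={
        "cxx": ["-O3"],
        "nvcc": [
            "-O3",
            # the defines ChatGPT suggested:
            "-DCUTE_ARCH_MMA_SM89_ENABLED",
            "-DCUTE_ARCH_MMA_F16_SM89_ENABLED",
            "-DCUTE_USE_DEVICE_ATOMICS=1",
            # target Ada (compute capability 8.9) explicitly:
            "-gencode=arch=compute_89,code=sm_89",
        ],
    },
)

setup(
    name="q8_kernels",
    ext_modules=[ext],
    cmdclass={"build_ext": BuildExtension},
)
```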
... that didn't work either, so I got stuck with the following statement from Chatty:
Your issue is that:
- The GEMM-related CUDA code (in mma_sm89_fp16.hpp) uses inline PTX assembly or syntax not supported under Windows NVCC/MSVC.
- That means the Windows toolchain can’t compile this — even though the hardware (RTX 4090) fully supports it.
What You Can Do
Use a Linux or WSL2 Environment
Can anyone make sense of this statement?
Chances are the torch version or the CUDA toolkit version you used to compile the q8 kernels is not actually CUDA 12.8 (or higher, as recommended in the readme).
- Check that the environment variable CUDA_PATH_V12_8 is set, and that CUDA_PATH has the same value. On Windows you can check that by running env | grep CUDA_PATH in the console. If not, download CUDA 12.8 (you can go higher, but to be safe let's get the version matching torch).
- Install torch built against the same CUDA 12.8. I recommend using uv pip, which is a much faster replacement for pip. With your ComfyUI python executable:
python -m pip install uv
python -m uv pip install -U torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
- Install the q8 kernels with CUDA 12.8:
python -m uv pip install -U packaging wheel ninja setuptools
# A bit excessive but sure way to rebuild the package (uv might have a --force-rebuild flag?)
python -m uv pip uninstall q8_kernels
python -m uv cache clean
# NOTE: this line should take 2-3 minutes to run at least! Otherwise, you're using a cached and likely broken build
python -m uv pip install --no-build-isolation git+https://github.com/Lightricks/LTX-Video-Q8-Kernels.git
The fp8 model should then run fine.
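After going through the steps above, a quick sanity check run with the same ComfyUI python can confirm all three points at once: that the CUDA_PATH variables agree, that torch really is a cu128 build (the version string carries the tag, e.g. 2.x.y+cu128), and that q8_kernels is importable. This is just a convenience script I'm sketching, not part of any of the tools above:

```python
import importlib.util
import os

def cuda_paths_consistent(env, version_key="CUDA_PATH_V12_8"):
    """True if CUDA_PATH is set and matches the versioned variable."""
    cuda_path = env.get("CUDA_PATH")
    return bool(cuda_path) and cuda_path == env.get(version_key)

def cuda_tag(torch_version):
    """Extract the CUDA build tag from a torch version string,
    e.g. '2.5.1+cu128' -> 'cu128'; returns '' for CPU-only builds."""
    _, _, local = torch_version.partition("+")
    return local if local.startswith("cu") else ""

def is_installed(module_name):
    """True if the current interpreter can find the module."""
    return importlib.util.find_spec(module_name) is not None

if __name__ == "__main__":
    print("CUDA_PATH matches CUDA_PATH_V12_8:",
          cuda_paths_consistent(os.environ))
    if is_installed("torch"):
        import torch
        print("torch CUDA build tag:", cuda_tag(torch.__version__) or "cpu-only")
    print("q8_kernels importable:", is_installed("q8_kernels"))
```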
Hey @CorentinJ, thanks for taking the time and for going to the trouble of listing everything again step by step - this way I was really forced to verify every step again ... and TADA: there were still old CUDA Toolkit values (12.1 instead of 12.8) stored in my environment variables. And maybe your advice about the "clean" reinstallation helped as well.
Thumbs up!