ComfyUI-LTXVideo CUDA error: an illegal memory access was encountered

ComfyUI 3.34, PyTorch 2.4.0

I followed the installation steps in my environment. I'm using the quantized models, installed the FP8 kernels as described, and am running the example workflow.

When I run it, I get the following error:

Requested to load VideoVAE
loaded completely 2632.487496185303 2378.2250690460205 True
Requested to load LTXV
loaded partially 9930.4475 9930.445434570312 0
  0%|          | 0/8 [00:00<?, ?it/s]
terminate called after throwing an instance of 'c10::Error'
  what():  CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:43 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x78626bd77f86 in /usr/local/lib/python3.11/dist-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x78626bd26d10 in /usr/local/lib/python3.11/dist-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x78626c151f08 in /usr/local/lib/python3.11/dist-packages/torch/lib/libc10_cuda.so)
frame #3: <unknown function> + 0x587d0 (0x78626c1577d0 in /usr/local/lib/python3.11/dist-packages/torch/lib/libc10_cuda.so)
frame #4: <unknown function> + 0x5a4f4 (0x78626c1594f4 in /usr/local/lib/python3.11/dist-packages/torch/lib/libc10_cuda.so)
frame #5: <unknown function> + 0x5de5b0 (0x78626a7d65b0 in /usr/local/lib/python3.11/dist-packages/torch/lib/libtorch_python.so)
frame #6: <unknown function> + 0x6abdf (0x78626bd5bbdf in /usr/local/lib/python3.11/dist-packages/torch/lib/libc10.so)
frame #7: c10::TensorImpl::~TensorImpl() + 0x21b (0x78626bd54c3b in /usr/local/lib/python3.11/dist-packages/torch/lib/libc10.so)
frame #8: c10::TensorImpl::~TensorImpl() + 0x9 (0x78626bd54de9 in /usr/local/lib/python3.11/dist-packages/torch/lib/libc10.so)
frame #9: <unknown function> + 0x2f698 (0x78600ddbb698 in /usr/local/lib/python3.11/dist-packages/q8_kernels_cuda/ops/_C.cpython-311-x86_64-linux-gnu.so)
frame #10: <unknown function> + 0x84841 (0x78600de10841 in /usr/local/lib/python3.11/dist-packages/q8_kernels_cuda/ops/_C.cpython-311-x86_64-linux-gnu.so)
frame #11: <unknown function> + 0x4ef84 (0x78600dddaf84 in /usr/local/lib/python3.11/dist-packages/q8_kernels_cuda/ops/_C.cpython-311-x86_64-linux-gnu.so)
frame #12: /usr/bin/python() [0x55563b]
frame #13: _PyObject_MakeTpCall + 0x27c (0x52f68c in /usr/bin/python)
frame #14: _PyEval_EvalFrameDefault + 0x6bd (0x53d81d in /usr/bin/python)
frame #15: _PyFunction_Vectorcall + 0x173 (0x566263 in /usr/bin/python)
frame #16: _PyEval_EvalFrameDefault + 0x4929 (0x541a89 in /usr/bin/python)
frame #17: _PyFunction_Vectorcall + 0x173 (0x566263 in /usr/bin/python)
frame #18: <unknown function> + 0x997a50 (0x78626ab8fa50 in /usr/local/lib/python3.11/dist-packages/torch/lib/libtorch_python.so)
frame #19: <unknown function> + 0xcc80dd (0x78626aec00dd in /usr/local/lib/python3.11/dist-packages/torch/lib/libtorch_python.so)
frame #20: c10::Dispatcher::callBoxed(c10::OperatorHandle const&, std::vector<c10::IValue, std::allocator<c10::IValue> >*) const + 0x238 (0x78626aecac58 in /usr/local/lib/python3.11/dist-packages/torch/lib/libtorch_python.so)
frame #21: torch::jit::invokeOperatorFromPython(std::vector<std::shared_ptr<torch::jit::Operator>, std::allocator<std::shared_ptr<torch::jit::Operator> > > const&, pybind11::args, pybind11::kwargs const&, std::optional<c10::DispatchKey>) + 0x1c1 (0x78626ac56991 in /usr/local/lib/python3.11/dist-packages/torch/lib/libtorch_python.so)
frame #22: torch::jit::_get_operation_for_overload_or_packet(std::vector<std::shared_ptr<torch::jit::Operator>, std::allocator<std::shared_ptr<torch::jit::Operator> > > const&, c10::Symbol, pybind11::args, pybind11::kwargs const&, bool, std::optional<c10::DispatchKey>) + 0x1a9 (0x78626ac56ce9 in /usr/local/lib/python3.11/dist-packages/torch/lib/libtorch_python.so)
frame #23: <unknown function> + 0x93fad3 (0x78626ab37ad3 in /usr/local/lib/python3.11/dist-packages/torch/lib/libtorch_python.so)
frame #24: <unknown function> + 0x4b2534 (0x78626a6aa534 in /usr/local/lib/python3.11/dist-packages/torch/lib/libtorch_python.so)
frame #25: /usr/bin/python() [0x55563b]
frame #26: PyObject_Call + 0x9d (0x57070d in /usr/bin/python)
frame #27: _PyEval_EvalFrameDefault + 0x8a48 (0x545ba8 in /usr/bin/python)
frame #28: _PyFunction_Vectorcall + 0x173 (0x566263 in /usr/bin/python)
frame #29: _PyObject_FastCallDictTstate + 0x59 (0x5342b9 in /usr/bin/python)
frame #30: _PyObject_Call_Prepend + 0xbe (0x56e3ae in /usr/bin/python)
frame #31: /usr/bin/python() [0x659fdb]
frame #32: _PyObject_MakeTpCall + 0x27c (0x52f68c in /usr/bin/python)
frame #33: _PyEval_EvalFrameDefault + 0x6bd (0x53d81d in /usr/bin/python)
frame #34: _PyFunction_Vectorcall + 0x173 (0x566263 in /usr/bin/python)
frame #35: THPFunction_apply(_object*, _object*) + 0xee9 (0x78626aa755d9 in /usr/local/lib/python3.11/dist-packages/torch/lib/libtorch_python.so)
frame #36: /usr/bin/python() [0x555660]
frame #37: PyObject_Call + 0x9d (0x57070d in /usr/bin/python)
frame #38: _PyEval_EvalFrameDefault + 0x8a48 (0x545ba8 in /usr/bin/python)
frame #39: /usr/bin/python() [0x585af7]
frame #40: /usr/bin/python() [0x5852de]
frame #41: PyObject_Call + 0xf4 (0x570764 in /usr/bin/python)
frame #42: _PyEval_EvalFrameDefault + 0x4929 (0x541a89 in /usr/bin/python)
frame #43: /usr/bin/python() [0x585af7]
frame #44: /usr/bin/python() [0x5852de]
frame #45: PyObject_Call + 0xf4 (0x570764 in /usr/bin/python)
frame #46: _PyEval_EvalFrameDefault + 0x4929 (0x541a89 in /usr/bin/python)
frame #47: _PyFunction_Vectorcall + 0x173 (0x566263 in /usr/bin/python)
frame #48: _PyObject_FastCallDictTstate + 0xb8 (0x534318 in /usr/bin/python)
frame #49: _PyObject_Call_Prepend + 0x59 (0x56e349 in /usr/bin/python)
frame #50: /usr/bin/python() [0x659fdb]
frame #51: PyObject_Call + 0x9d (0x57070d in /usr/bin/python)
frame #52: _PyEval_EvalFrameDefault + 0x4929 (0x541a89 in /usr/bin/python)
frame #53: /usr/bin/python() [0x585af7]
frame #54: /usr/bin/python() [0x5852de]
frame #55: PyObject_Call + 0xf4 (0x570764 in /usr/bin/python)
frame #56: _PyEval_EvalFrameDefault + 0x4929 (0x541a89 in /usr/bin/python)
frame #57: /usr/bin/python() [0x585af7]
frame #58: /usr/bin/python() [0x5852de]
frame #59: PyObject_Call + 0xf4 (0x570764 in /usr/bin/python)
frame #60: _PyEval_EvalFrameDefault + 0x4929 (0x541a89 in /usr/bin/python)
frame #61: _PyFunction_Vectorcall + 0x173 (0x566263 in /usr/bin/python)
frame #62: _PyObject_FastCallDictTstate + 0xb8 (0x534318 in /usr/bin/python)

Aborted (core dumped)
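The message above notes that CUDA kernel errors can be reported asynchronously, and in this trace the failure only surfaces in a tensor destructor (`c10::TensorImpl::~TensorImpl`) called from the `q8_kernels_cuda` `_C` extension, so those frames may not point at the kernel that actually faulted. Rerunning with `CUDA_LAUNCH_BLOCKING=1`, as the message suggests, is the usual next step. A minimal sketch of a launcher that does this, assuming ComfyUI is started as `python main.py` from its checkout directory; the helper names `comfy_debug_env` and `comfy_command` are hypothetical. The shell equivalent is simply `CUDA_LAUNCH_BLOCKING=1 python main.py`.

```python
import os
import subprocess
import sys


def comfy_debug_env(base=None):
    """Copy of the environment with synchronous CUDA kernel launches enabled.

    With CUDA_LAUNCH_BLOCKING=1 every launch waits for its kernel to finish,
    so an illegal access is reported at the launch that faulted (runs slower).
    """
    env = dict(os.environ if base is None else base)
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def comfy_command(main_py="main.py"):
    # Assumption: ComfyUI is started as `python main.py` from its checkout.
    return [sys.executable, main_py]


if __name__ == "__main__":
    if not os.path.exists("main.py"):
        print("run this from the ComfyUI directory (main.py not found)")
    else:
        result = subprocess.run(comfy_command(), env=comfy_debug_env())
        print("ComfyUI exited with return code", result.returncode)
```

The environment is copied rather than modified in place, so the variable only affects the child ComfyUI process.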

May 16 '25 14:05 peterhoang

I'm getting the same error with the same workflow and model, and ComfyUI then shuts down in the console partway through the run.
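The shutdown follows from the log itself: "terminate called after throwing an instance of 'c10::Error'" means the C++ runtime called `std::terminate`, whose default handler calls `abort()`, so the whole Python process receives SIGABRT ("Aborted (core dumped)"). No Python `try`/`except` inside ComfyUI can catch that. A minimal sketch, assuming a POSIX system, of how a wrapper process can distinguish such a crash from a normal exit: `subprocess` reports death by signal N as return code -N. The child here calls `os.abort()` as a stand-in for the real crash, and `describe_exit` is a hypothetical helper.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Explain a subprocess return code; negative means killed by a signal (POSIX)."""
    if returncode >= 0:
        return f"exited with status {returncode}"
    return f"killed by {signal.Signals(-returncode).name}"


if __name__ == "__main__":
    # Stand-in for the crashing run: os.abort() sends SIGABRT to the child,
    # which is what the C++ runtime does after an uncaught c10::Error.
    child = subprocess.run([sys.executable, "-c", "import os; os.abort()"])
    print(describe_exit(child.returncode))
```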

May 16 '25 15:05 rtaskf

Reproducible. I followed the q8_kernels installation instructions without any trouble, and my CUDA and torch versions all match. I wonder if this would work as a workaround for now.

May 16 '25 22:05 BR14Nx

I have the same issue. I want the sped-up version; Kijai's is slower for me.

May 17 '25 20:05 lijackcoder

So, will this be fixed?

May 20 '25 18:05 rtaskf

Any news on this?

Jun 16 '25 10:06 rtaskf

I'm encountering this issue as well. Is a fix coming soon?

I'm running on Linux and have tried both Python 3.12 and 3.10, as well as Kijai's version. All of them hit the error, which seems to come from loading the q8 node.
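Compiled extensions such as `q8_kernels_cuda/ops/_C.cpython-311-x86_64-linux-gnu.so` (the path in the first post's traceback) are tagged for one specific Python version, so switching between Python 3.12 and 3.10 means the kernels have to be rebuilt in each environment. A minimal sketch that lists the compiled modules in an installed package and flags any file the running interpreter would not import, using `importlib.util.find_spec` and `importlib.machinery.EXTENSION_SUFFIXES`. The module name `q8_kernels_cuda` is taken from the trace; the helper names are hypothetical. This only catches interpreter mismatches and says nothing about the CUDA code inside the extension.

```python
import importlib.machinery
import importlib.util
import sys
from pathlib import Path


def compiled_modules(root):
    """Yield (path, importable) for each .so/.pyd file under root.

    Python imports module `_C` only from a file named `_C` + one of the
    suffixes in EXTENSION_SUFFIXES (on Linux e.g. `.cpython-311-x86_64-linux-gnu.so`,
    `.abi3.so`, `.so`), so a file tagged for another interpreter is ignored.
    """
    suffixes = importlib.machinery.EXTENSION_SUFFIXES
    for path in sorted(Path(root).rglob("*")):
        if path.suffix not in (".so", ".pyd"):
            continue
        module = path.name.split(".", 1)[0]
        yield path, any(path.name == module + s for s in suffixes)


def package_dir(module_name):
    """Return the directory of an installed top-level package, or None."""
    spec = importlib.util.find_spec(module_name)
    if spec is None or not spec.submodule_search_locations:
        return None
    return Path(list(spec.submodule_search_locations)[0])


if __name__ == "__main__":
    # Module name taken from the traceback in the first post.
    root = package_dir("q8_kernels_cuda")
    if root is None:
        print("q8_kernels_cuda is not installed for", sys.executable)
    else:
        for path, ok in compiled_modules(root):
            print("ok  " if ok else "SKIP", path)
```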

nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Tue_May_27_02:21:03_PDT_2025
Cuda compilation tools, release 12.9, V12.9.86
Build cuda_12.9.r12.9/compiler.36037853_0

python -c "import torch; print(torch.__version__, torch.version.cuda)"
2.7.1+cu128 12.8
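The nvcc shown here (12.9) is newer than the CUDA version this PyTorch build was compiled against (12.8). As far as I know, extensions built with `torch.utils.cpp_extension` are refused when the major versions differ and only produce a warning when the minor versions differ, so this pairing should build, but it is worth ruling out explicitly. A minimal sketch that parses both version strings and classifies the mismatch; the helper names are hypothetical.

```python
import re
import shutil
import subprocess


def nvcc_release(text):
    """Extract (major, minor) from `nvcc --version` output."""
    match = re.search(r"release (\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("no 'release X.Y' found in nvcc output")
    return int(match.group(1)), int(match.group(2))


def torch_cuda_release(version):
    """Turn torch.version.cuda, e.g. '12.8', into (12, 8)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def classify(nvcc, torch_cuda):
    """Compare two (major, minor) tuples."""
    if nvcc[0] != torch_cuda[0]:
        return "major mismatch"
    if nvcc[1] != torch_cuda[1]:
        return "minor mismatch"
    return "match"


def main():
    try:
        import torch
    except ImportError:
        print("torch is not installed in this interpreter")
        return
    if shutil.which("nvcc") is None or torch.version.cuda is None:
        print("nvcc not on PATH, or torch is a CPU-only build")
        return
    out = subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout
    print(classify(nvcc_release(out), torch_cuda_release(torch.version.cuda)))


if __name__ == "__main__":
    main()
```

With the strings from this post, `classify(nvcc_release("Cuda compilation tools, release 12.9, V12.9.86"), torch_cuda_release("12.8"))` returns `"minor mismatch"`.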

Jul 04 '25 21:07 btakita

Sup @peterhoang, @rtaskf, @BR14Nx & @btakita,

I'm curious whether this error is machine-specific.

My hunch is that a guide for deploying the ComfyUI-LTXVideo flows on Linux could become a standard way to make onboarding smoother.

That way, the machine can be scaled up and down, and steps can be added to the deployment scripts as new features come out.

I made a little proof of concept here: Deploy on Linux Docs

PR here
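One way to keep a deployment script easy to extend is to express it as an ordered list of named steps that stops at the first failure, so supporting a new feature means appending one entry. A minimal sketch in Python; the step names and commands are placeholders, not the contents of the linked docs or PR, and `run_steps` is a hypothetical helper.

```python
import subprocess
import sys

# Placeholder steps, not the contents of the linked docs or PR: each entry is
# (name, argv). A real guide would add cloning, venv, torch, q8 kernels, etc.
STEPS = [
    ("python version", [sys.executable, "--version"]),
    ("GPU visible", ["nvidia-smi"]),
]


def run_steps(steps, dry_run=False):
    """Run steps in order; return the name of the first failing step, or None."""
    for name, argv in steps:
        print(f"==> {name}: {' '.join(argv)}")
        if dry_run:
            continue
        try:
            code = subprocess.run(argv).returncode
        except FileNotFoundError:
            code = 127  # same code a shell uses for "command not found"
        if code != 0:
            return name
    return None


if __name__ == "__main__":
    failed = run_steps(STEPS)
    print("all steps passed" if failed is None else f"stopped at: {failed}")
```

Treating a missing binary as a failed step (rather than letting `FileNotFoundError` escape) keeps the report uniform on machines where, for example, the NVIDIA driver tools are not installed.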

Jul 10 '25 09:07 ElishaKay

I moved on to Kijai's FP8 model and then to the distilled FP16 model. From what I remember, I had a few custom nodes pinned to different PyTorch versions; 2.7.1 was likely the one in use, and it may not have been a cu128 build. I'm using an RTX 5090 (Blackwell), so the error may have been due to missing support for that architecture at the time.
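A GeForce RTX 5090 reports CUDA compute capability 12.0, so a PyTorch build can only run native kernels on it if `torch.cuda.get_arch_list()` includes `sm_120`; as far as I know, the official wheels gained that with the CUDA 12.8 (cu128) builds. A minimal sketch using `torch.cuda.get_device_capability()` and `torch.cuda.get_arch_list()`. Caveats: it checks only PyTorch itself, not separately compiled extensions such as the q8 kernels, and it ignores PTX (`compute_XX`) entries that the driver may JIT-compile. The helper name `arch_supported` is hypothetical.

```python
def arch_supported(capability, arch_list):
    """True if arch_list (e.g. torch.cuda.get_arch_list()) has native code for capability."""
    major, minor = capability
    return f"sm_{major}{minor}" in arch_list


def main():
    try:
        import torch
    except ImportError:
        print("torch is not installed in this interpreter")
        return
    if not torch.cuda.is_available():
        print("no CUDA device visible to torch")
        return
    cap = torch.cuda.get_device_capability(0)
    archs = torch.cuda.get_arch_list()
    print(torch.cuda.get_device_name(0), cap, archs)
    print("native kernels available:", arch_supported(cap, archs))


if __name__ == "__main__":
    main()
```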

I tried the distilled fp8 model again & it worked!

Here is my requirements.txt

requirements-freeze.txt
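Since this working setup comes with a frozen requirements file, others can compare it against their own `pip freeze` output to spot differing torch or kernel versions. A minimal sketch that only understands plain `name==version` lines (editable installs, `name @ url` entries and comments are skipped) and normalizes names per PEP 503; the helper names and the file names in the usage note are hypothetical.

```python
import re
import sys


def parse_freeze(text):
    """Map PEP 503-normalized package name -> version for `name==version` lines."""
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if "==" not in line:
            continue  # blank lines, comments, `-e ...` and `name @ url` entries
        name, version = line.split("==", 1)
        pins[re.sub(r"[-_.]+", "-", name.strip()).lower()] = version.strip()
    return pins


def diff_freeze(mine, theirs):
    """Return {name: (my_version, their_version)} for every package that differs."""
    a, b = parse_freeze(mine), parse_freeze(theirs)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(a.keys() | b.keys())
        if a.get(name) != b.get(name)
    }


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("usage: diff_freeze.py MINE.txt THEIRS.txt")
    else:
        with open(sys.argv[1]) as f1, open(sys.argv[2]) as f2:
            for name, (x, y) in diff_freeze(f1.read(), f2.read()).items():
                print(f"{name}: {x} -> {y}")
```

Usage: run `pip freeze > mine.txt` in the failing environment, then `python diff_freeze.py mine.txt requirements-freeze.txt`. A `None` on either side means the package is missing from that file.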

Jul 15 '25 17:07 btakita

0% 0/30 [00:00<?, ?it/s]
terminate called after throwing an instance of 'c10::AcceleratorError'
  what(): CUDA error: an illegal memory access was encountered

It happens every time; I can't get it to work!
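Reports in this thread mix different PyTorch, CUDA and Python versions, and this trace throws `c10::AcceleratorError` rather than the `c10::Error` in the first post, which suggests a different (apparently newer) PyTorch release. Posting the exact environment next to the error makes the reports comparable. PyTorch ships `python -m torch.utils.collect_env` for a full report; below is a smaller sketch that gathers only the fields discussed in this thread and still works when torch is missing. The `environment_report` name is hypothetical.

```python
import json
import platform
import sys


def environment_report():
    """Collect the versions this thread keeps comparing into a dict."""
    report = {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "executable": sys.executable,
    }
    try:
        import torch
    except ImportError:
        report["torch"] = None
        return report
    report["torch"] = str(torch.__version__)
    report["torch_cuda"] = torch.version.cuda
    if torch.cuda.is_available():
        report["gpu"] = torch.cuda.get_device_name(0)
        report["capability"] = list(torch.cuda.get_device_capability(0))
    return report


if __name__ == "__main__":
    print(json.dumps(environment_report(), indent=2))
```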

Nov 02 '25 14:11 roger2014jr