UnboundLocalError: local variable 'self_attn_func' referenced before assignment
File "/lib/python3.10/site-packages/q8_kernels/integration/utils.py", line 104, in get_attention_func
(self_attn_func, self_attn_memory_layout),
UnboundLocalError: local variable 'self_attn_func' referenced before assignment
The kernel file neither raises a descriptive error nor falls back to a default when the GPU is not one of the architectures listed in the code: no branch ever assigns self_attn_func, so the function crashes with an UnboundLocalError instead of a clear message.
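For illustration, the failure mode is consistent with a dispatch of roughly this shape (a minimal sketch, not the actual q8_kernels source; only get_attention_func and the two variable names come from the traceback, everything else is assumed):

import torch

def fp8_attention(*args, **kwargs): ...   # placeholder for the real kernel
def int8_attention(*args, **kwargs): ...  # placeholder for the real kernel

def get_attention_func(use_fp8_attention):
    cap = torch.cuda.get_device_capability()  # e.g. (8, 6) on an RTX 3090/3050
    if cap >= (8, 9):  # assumed check: kernels only built for Ada/Hopper
        self_attn_func = fp8_attention if use_fp8_attention else int8_attention
        self_attn_memory_layout = "NHD"
        cross_attn_func = fp8_attention
        cross_attn_memory_layout = "NHD"
    # No else branch: on Ampere and older, nothing is ever assigned, so the
    # return below raises UnboundLocalError.
    return (
        (self_attn_func, self_attn_memory_layout),
        (cross_attn_func, cross_attn_memory_layout),
    )

An explicit else that raises (or falls back to the default attention) would turn this into an actionable error:

    else:
        raise NotImplementedError(
            f"Q8/FP8 attention requires compute capability >= (8, 9), got {cap}"
        )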
I also hit the same problem on an RTX 3090. Have you solved it?
Unfortunately, no. I have been using Wan2GP (https://github.com/deepbeepmeep/Wan2GP) for inference instead, and it works well on 24 GB cards.
Reproducing on an RTX 3050 laptop GPU (4 GB VRAM, Windows 11), PyTorch 2.8.0+cu128, ComfyUI v0.3.61-1-g6e079abc (2025-09-29), LTXVideo custom node (nightly), trying to use Q8-Kernels to accelerate FP8 inference.

Steps to reproduce:
1. Install LTXVideo via ComfyUI Manager.
2. Download ltxv-2b-0.9.8-distilled-fp8.safetensors to models/checkpoints.
3. Load the ltxvideo-i2v-distilled.json workflow.
4. Select ltxv-2b-0.9.8-distilled-fp8.safetensors in the LTXV Loader.
5. Add the LTXVideo Q8 Patcher node after the loader.
6. Provide a 512x512 input image and a simple prompt (e.g., "animate a waving hand").
7. Queue the prompt (4-8 steps, 14 frames).
Full Error Log
got prompt
model weight dtype torch.float16, manual cast: None
model_type FLUX
VAE load device: cuda:0, offload device: cpu, dtype: torch.bfloat16
no CLIP/text encoder weights in checkpoint, the text encoder model will not be loaded.
Requested to load MochiTEModel_
loaded completely 9.5367431640625e+25 9083.38671875 True
CLIP/text encoder model load device: cpu, offload device: cpu, current: cpu, dtype: torch.float16
Requested to load VideoVAE
loaded partially 1882.2000005722045 1882.1682147979736 0
!!! Exception during processing !!! local variable 'self_attn_func' referenced before assignment
Traceback (most recent call last):
  File "ComfyUI/execution.py", line 496, in execute
    output_data, output_ui, has_subgraph, has_pending_tasks = await get_output_data(prompt_id, unique_id, obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, hidden_inputs=hidden_inputs)
  File "ComfyUI/execution.py", line 315, in get_output_data
    return_values = await _async_map_node_over_list(prompt_id, unique_id, obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, hidden_inputs=hidden_inputs)
  File "ComfyUI/execution.py", line 289, in _async_map_node_over_list
    await process_inputs(input_dict, i)
  File "ComfyUI/execution.py", line 277, in process_inputs
    result = f(**inputs)
  File "ComfyUI/custom_nodes/ComfyUI-LTXVideo/q8_nodes.py", line 62, in patch
    patcher(transformer, use_fp8_attention, True)
  File "venv/lib/site-packages/q8_kernels/integration/patch_transformer.py", line 26, in patch_comfyui_native_transformer
    attn_forward = cross_attn_forward(use_fp8_attention)
  File "venv/lib/site-packages/q8_kernels/integration/comfyui_native.py", line 66, in cross_attn_forward
    self_attn, cross_attn, is_out_tuple = get_attention_func(use_fp8_attention)
  File "venv/lib/site-packages/q8_kernels/integration/utils.py", line 104, in get_attention_func
    (self_attn_func, self_attn_memory_layout),
UnboundLocalError: local variable 'self_attn_func' referenced before assignment

Workaround: Removing or bypassing the Q8 Patcher node allows the base FP8 model to run (slower, but functional). Happy to test fixes or provide more details!
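Until this is fixed upstream, a capability check before the patch call would let workflows degrade gracefully on unsupported GPUs. A sketch (supports_q8_kernels and the sm_89 threshold are assumptions on my part; the patcher(transformer, use_fp8_attention, True) call is taken from the traceback above):

import torch

def supports_q8_kernels():
    # Assumption: the Q8/FP8 attention kernels need Ada (sm_89) or newer.
    return torch.cuda.is_available() and torch.cuda.get_device_capability() >= (8, 9)

# e.g. inside the Q8 Patcher node's patch() before calling into q8_kernels:
if supports_q8_kernels():
    patcher(transformer, use_fp8_attention, True)
else:
    print("Q8 kernels unsupported on this GPU; running the unpatched FP8 model.")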