Black video with sage attention
Sage works with other models for me (such as Hunyuan) but not with CogVideoX. Debian testing, 3090 Ti, sageattention 2.0.0, torch 2.5.1+cu124, diffusers 0.31.0, transformers 4.47.0. Running 5b-1.5-I2V. The error is:
comfyui-1 | 2024-12-20T09:12:33.054432252Z /app/custom_nodes/ComfyUI-VideoHelperSuite/videohelpersuite/nodes.py:104: RuntimeWarning: invalid value encountered in cast
comfyui-1 | 2024-12-20T09:12:33.054463814Z return tensor_to_int(tensor, 8).astype(np.uint8)
I updated your extension and VHS from master, but it still doesn't work. The comfy and sdpa attention modes work fine (but they're slower); the rest seem to be buggy (black output).
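For anyone else hitting the same warning: NumPy emits "invalid value encountered in cast" when NaN or infinity is cast to an integer type, so the black video most likely means the frames were already non-finite before they reached VHS. Below is a minimal sketch of a check you could run on decoded frames. The helper names `frames_to_uint8` and `has_non_finite` are made up for illustration and are not part of VideoHelperSuite.

```python
import numpy as np


def frames_to_uint8(frames):
    """Scale float frames in [0, 1] to 8-bit values (illustrative helper)."""
    scaled = np.clip(frames * 255.0, 0, 255)
    return scaled.astype(np.uint8)


def has_non_finite(frames):
    """Return True if any pixel is NaN or +/-inf."""
    return not bool(np.isfinite(frames).all())


# A healthy mid-gray clip and a clip full of NaNs, both shaped
# (frames, height, width, channels).
good = np.full((2, 4, 4, 3), 0.5, dtype=np.float32)
bad = np.full((2, 4, 4, 3), np.nan, dtype=np.float32)

for name, clip in (("good", good), ("bad", bad)):
    if has_non_finite(clip):
        print(f"{name}: non-finite values, the uint8 cast would be meaningless")
    else:
        print(f"{name}: ok, first pixel = {frames_to_uint8(clip)[0, 0, 0]}")
```

If the check fires right after sampling, the problem is in the attention kernel rather than in the video encoder.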
Cog 1.5 requires sageattention 2.0.0, and on a 3090 you have to use one of the specific modes. I don't remember which one, but I have exposed them in the attention mode selection.
Yes, I use 2.0.0. sageattn_qk_int8_pv_fp8_cuda results in a GPU crash, with NVRM: Xid (PCI:0000:01:00): 43, pid=1425251, name=python, Ch 00000019 in dmesg:
0%| | 0/20 [00:00<?, ?it/s]terminate called after throwing an instance of 'c10::Error'
comfyui-1 | 2024-12-20T09:41:30.926068905Z what(): CUDA error: unspecified launch failure
comfyui-1 | 2024-12-20T09:41:30.926086065Z CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
comfyui-1 | 2024-12-20T09:41:30.926088406Z For debugging consider passing CUDA_LAUNCH_BLOCKING=1
comfyui-1 | 2024-12-20T09:41:30.926089895Z Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
comfyui-1 | 2024-12-20T09:41:30.926091671Z
comfyui-1 | 2024-12-20T09:41:30.926093197Z Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:43 (most recent call first):
comfyui-1 | 2024-12-20T09:41:30.926094786Z frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7f3b55e6d446 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)
comfyui-1 | 2024-12-20T09:41:30.926096444Z frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7f3b55e176e4 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)
comfyui-1 | 2024-12-20T09:41:30.926098500Z frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7f3b55f59a18 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10_cuda.so)
comfyui-1 | 2024-12-20T09:41:30.926108031Z frame #3: <unknown function> + 0x600eb (0x7f3b55f610eb in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10_cuda.so)
comfyui-1 | 2024-12-20T09:41:30.926109009Z frame #4: <unknown function> + 0x5faf70 (0x7f3b5459af70 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_python.so)
comfyui-1 | 2024-12-20T09:41:30.926109867Z frame #5: <unknown function> + 0x6f69f (0x7f3b55e4e69f in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)
comfyui-1 | 2024-12-20T09:41:30.926110692Z frame #6: c10::TensorImpl::~TensorImpl() + 0x21b (0x7f3b55e4737b in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)
comfyui-1 | 2024-12-20T09:41:30.926111502Z frame #7: c10::TensorImpl::~TensorImpl() + 0x9 (0x7f3b55e47529 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)
comfyui-1 | 2024-12-20T09:41:30.926112356Z frame #8: <unknown function> + 0x8c1a98 (0x7f3b54861a98 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_python.so)
comfyui-1 | 2024-12-20T09:41:30.926113250Z frame #9: THPVariable_subclass_dealloc(_object*) + 0x2c6 (0x7f3b54861de6 in /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_python.so)
comfyui-1 | 2024-12-20T09:41:30.926114230Z <omitting python frames>
comfyui-1 | 2024-12-20T09:41:30.926115039Z
sageattn_qk_int8_pv_fp16_cuda: works 🎉
sageattn_qk_int8_pv_fp16_triton: black video, same error
All fused_sageattn modes give this:
comfyui-1 | 2024-12-20T09:48:31.640438551Z !!! Exception during processing !!! 'NoneType' object is not callable
comfyui-1 | 2024-12-20T09:48:31.640539983Z Traceback (most recent call last):
comfyui-1 | 2024-12-20T09:48:31.640541877Z File "/app/execution.py", line 324, in execute
comfyui-1 | 2024-12-20T09:48:31.640543195Z output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
comfyui-1 | 2024-12-20T09:48:31.640544447Z File "/app/execution.py", line 199, in get_output_data
comfyui-1 | 2024-12-20T09:48:31.640545510Z return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
comfyui-1 | 2024-12-20T09:48:31.640546623Z File "/app/execution.py", line 170, in _map_node_over_list
comfyui-1 | 2024-12-20T09:48:31.640547630Z process_inputs(input_dict, i)
comfyui-1 | 2024-12-20T09:48:31.640548600Z File "/app/execution.py", line 159, in process_inputs
comfyui-1 | 2024-12-20T09:48:31.640549621Z results.append(getattr(obj, func)(**inputs))
comfyui-1 | 2024-12-20T09:48:31.640550620Z File "/app/custom_nodes/ComfyUI-CogVideoXWrapper/nodes.py", line 702, in process
comfyui-1 | 2024-12-20T09:48:31.640551725Z latents = model["pipe"](
comfyui-1 | 2024-12-20T09:48:31.640552823Z File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
comfyui-1 | 2024-12-20T09:48:31.640553921Z return func(*args, **kwargs)
comfyui-1 | 2024-12-20T09:48:31.640554911Z File "/app/custom_nodes/ComfyUI-CogVideoXWrapper/pipeline_cogvideox.py", line 763, in __call__
comfyui-1 | 2024-12-20T09:48:31.640555948Z noise_pred = self.transformer(
comfyui-1 | 2024-12-20T09:48:31.640556916Z File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
comfyui-1 | 2024-12-20T09:48:31.640565577Z return self._call_impl(*args, **kwargs)
comfyui-1 | 2024-12-20T09:48:31.640566393Z File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
comfyui-1 | 2024-12-20T09:48:31.640567197Z return forward_call(*args, **kwargs)
comfyui-1 | 2024-12-20T09:48:31.640567913Z File "/app/custom_nodes/ComfyUI-CogVideoXWrapper/custom_cogvideox_transformer_3d.py", line 653, in forward
comfyui-1 | 2024-12-20T09:48:31.640568716Z hidden_states, encoder_hidden_states = block(
comfyui-1 | 2024-12-20T09:48:31.640569440Z File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
comfyui-1 | 2024-12-20T09:48:31.640570259Z return self._call_impl(*args, **kwargs)
comfyui-1 | 2024-12-20T09:48:31.640570992Z File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
comfyui-1 | 2024-12-20T09:48:31.640571783Z return forward_call(*args, **kwargs)
comfyui-1 | 2024-12-20T09:48:31.640572498Z File "/app/custom_nodes/ComfyUI-CogVideoXWrapper/custom_cogvideox_transformer_3d.py", line 312, in forward
comfyui-1 | 2024-12-20T09:48:31.640573330Z attn_hidden_states, attn_encoder_hidden_states = self.attn1(
comfyui-1 | 2024-12-20T09:48:31.640574311Z File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
comfyui-1 | 2024-12-20T09:48:31.640575135Z return self._call_impl(*args, **kwargs)
comfyui-1 | 2024-12-20T09:48:31.640575866Z File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
comfyui-1 | 2024-12-20T09:48:31.640576649Z return forward_call(*args, **kwargs)
comfyui-1 | 2024-12-20T09:48:31.640577371Z File "/usr/local/lib/python3.10/dist-packages/diffusers/models/attention_processor.py", line 495, in forward
comfyui-1 | 2024-12-20T09:48:31.640578185Z return self.processor(
comfyui-1 | 2024-12-20T09:48:31.640578901Z File "/app/custom_nodes/ComfyUI-CogVideoXWrapper/custom_cogvideox_transformer_3d.py", line 163, in __call__
comfyui-1 | 2024-12-20T09:48:31.640579702Z hidden_states = self.attn_func(query, key, value, attn_mask=attention_mask, is_causal=False)
comfyui-1 | 2024-12-20T09:48:31.640580512Z TypeError: 'NoneType' object is not callable
I'm not sure what these modes need, but I'd like to test them and see if they're faster. So far only one sageattn mode works, and it seems to be faster than comfy/sdpa.
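The last frame of the traceback shows `self.attn_func` being called while it is `None`, which suggests the selected mode never resolved to an actual function. I don't know how the wrapper picks the function internally, so the following is only a sketch of one way to look a mode up by name and fail early with a readable message. `resolve_attention` is a hypothetical helper, not the wrapper's API.

```python
import importlib


def resolve_attention(mode, module_name="sageattention"):
    """Return the callable named `mode` from `module_name`.

    Raises RuntimeError with a readable message instead of handing back None,
    so the failure shows up at load time rather than deep inside the sampler.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        raise RuntimeError(
            f"attention mode {mode!r} needs the {module_name!r} package"
        ) from exc

    func = getattr(module, mode, None)
    if not callable(func):
        raise RuntimeError(f"{module_name!r} does not provide {mode!r}")
    return func


# Demonstrated with the standard library so the sketch runs anywhere;
# the failing lookup mirrors a mode name the installed package lacks.
print(resolve_attention("sqrt", "math")(9.0))
try:
    resolve_attention("no_such_mode", "math")
except RuntimeError as err:
    print("lookup failed:", err)
```

With a check like this, a missing fused kernel would be reported by name as soon as the mode is selected.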
Thank you kijai. I'm currently on version 1.0.6 (per pip show sageattention) on Windows 11. Can I just uninstall 1.0.6 and install 2.0.0? Do you know if version 2.0.0 will be compatible with Hunyuan and LTX? Appreciate your work.
They just released 2.0.1, which I have not tested yet. Installing 2.0.x is currently harder because it's in beta and you have to compile it yourself.
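Before swapping versions, it can help to confirm what is actually installed in the Python environment ComfyUI runs in. This is a small sketch using only the standard library. The helper names are made up, and the version parsing is deliberately simple (numeric dotted parts only), so treat it as a rough check rather than a full version comparison.

```python
from importlib import metadata


def version_tuple(version):
    """Parse the leading numeric parts of a version, e.g. '2.0.1' -> (2, 0, 1).

    A local suffix such as '+cu124' is dropped, and parsing stops at the first
    non-numeric part.
    """
    parts = []
    for piece in version.split("+")[0].split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def installed_at_least(package, minimum):
    """Return True if `package` is installed with a version >= `minimum`."""
    try:
        found = metadata.version(package)
    except metadata.PackageNotFoundError:
        return False
    return version_tuple(found) >= version_tuple(minimum)


print("sageattention >= 2.0.0:", installed_at_least("sageattention", "2.0.0"))
```

Run it with the same interpreter that launches ComfyUI, otherwise it may report a different environment's packages.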
Hi, can you show me the results you get with sage on and off (with details on the chosen resolution and number of frames)? More importantly, could you please run python.exe -m pip install and show me the libraries you have? I also have a 3090. @rkfg
I complained here: https://www.reddit.com/r/StableDiffusion/comments/1hzkcxa/i_fuing_hate_torchpythoncuda_problems_and/