torch.compile support was added to SageAttention
After the recently merged pull request below, SageAttention should, in theory, be compatible with torch.compile, giving it even more acceleration:
https://github.com/thu-ml/SageAttention/pull/218
Now, with the new version, the compile-disable statements can, in principle, be removed:
https://github.com/kijai/ComfyUI-WanVideoWrapper/blob/4c4e7defc20e89d1e0e3f95ce2b9ec9cd743db74/wanvideo/modules/attention.py#L21
https://github.com/kijai/ComfyUI-WanVideoWrapper/blob/4c4e7defc20e89d1e0e3f95ce2b9ec9cd743db74/wanvideo/modules/attention.py#L47
This may be related: https://github.com/thu-ml/SageAttention/issues/274
After compiling and installing the latest sageattention, I ran into the same problem: launching Comfy on the official workflow with the --use-sage-attention flag produces black images. Everything went back to normal once I added the "Patch Sage Attention KJ" node from the KJNodes pack. However, I heard that sageattention is now compatible with torch.compile, so I modified the "Patch Sage Attention KJ" node by commenting out every `@torch.compiler.disable()`, and the black images returned.
I also compiled and installed sageattn3; when I set “Patch Sage Attention KJ” to sageattn3, Comfy crashes.
torch: 2.9.0.dev20250905+cu129
torchaudio: 2.8.0.dev20250901+cu129
triton-windows: 3.4.0.post20
sageattention: 2.2.0 (latest)
sageattn3: 1.0.0 (latest)
```
Windows fatal exception: code 0x80000003

Stack (most recent call first):
  File "D:\sd-comfyui-aki\comfy\ldm\wan\model.py", line 235 in forward
  File "D:\sd-comfyui-aki.ext\Lib\site-packages\torch\nn\modules\module.py", line 1786 in _call_impl
  File "D:\sd-comfyui-aki.ext\Lib\site-packages\torch\nn\modules\module.py", line 1775 in _wrapped_call_impl
  File "D:\sd-comfyui-aki.ext\Lib\site-packages\torch_dynamo\eval_frame.py", line 832 in compile_wrapper
  File "D:\sd-comfyui-aki.ext\Lib\site-packages\torch\nn\modules\module.py", line 1786 in _call_impl
  File "D:\sd-comfyui-aki.ext\Lib\site-packages\torch\nn\modules\module.py", line 1775 in _wrapped_call_impl
  File "D:\sd-comfyui-aki.ext\Lib\site-packages\torch_dynamo\eval_frame.py", line 414 in call
  File "D:\sd-comfyui-aki\comfy\ldm\wan\model.py", line 579 in forward_orig
  File "D:\sd-comfyui-aki\comfy\ldm\wan\model.py", line 634 in _forward
  File "D:\sd-comfyui-aki\comfy\patcher_extension.py", line 112 in execute
  File "D:\sd-comfyui-aki\comfy\ldm\wan\model.py", line 614 in forward
  File "D:\sd-comfyui-aki.ext\Lib\site-packages\torch\nn\modules\module.py", line 1786 in _call_impl
  File "D:\sd-comfyui-aki.ext\Lib\site-packages\torch\nn\modules\module.py", line 1775 in _wrapped_call_impl
  File "D:\sd-comfyui-aki\comfy\model_base.py", line 199 in _apply_model
  File "D:\sd-comfyui-aki\comfy\patcher_extension.py", line 112 in execute
  File "D:\sd-comfyui-aki\comfy\patcher_extension.py", line 105 in call
  File "D:\sd-comfyui-aki\comfy_api\torch_helpers\torch_compile.py", line 26 in apply_torch_compile_wrapper
  File "D:\sd-comfyui-aki\comfy\patcher_extension.py", line 113 in execute
  File "D:\sd-comfyui-aki\comfy\model_base.py", line 160 in apply_model
  File "D:\sd-comfyui-aki\comfy\samplers.py", line 333 in _calc_cond_batch
  File "D:\sd-comfyui-aki\comfy\patcher_extension.py", line 112 in execute
  File "D:\sd-comfyui-aki\comfy\samplers.py", line 214 in _calc_cond_batch_outer
  File "D:\sd-comfyui-aki\comfy\samplers.py", line 206 in calc_cond_batch
  File "D:\sd-comfyui-aki\comfy\samplers.py", line 388 in sampling_function
  File "D:\sd-comfyui-aki\comfy\samplers.py", line 970 in predict_noise
  File "D:\sd-comfyui-aki\comfy\patcher_extension.py", line 112 in execute
  File "D:\sd-comfyui-aki\comfy\samplers.py", line 967 in outer_predict_noise
  File "D:\sd-comfyui-aki\comfy\samplers.py", line 960 in call
  File "D:\sd-comfyui-aki\comfy\samplers.py", line 408 in call
  File "D:\sd-comfyui-aki\comfy\k_diffusion\sampling.py", line 199 in sample_euler
  File "D:\sd-comfyui-aki.ext\Lib\site-packages\torch\utils_contextlib.py", line 120 in decorate_context
  File "D:\sd-comfyui-aki\comfy\samplers.py", line 759 in sample
  File "D:\sd-comfyui-aki\comfy\patcher_extension.py", line 112 in execute
  File "D:\sd-comfyui-aki\comfy\samplers.py", line 987 in inner_sample
  File "D:\sd-comfyui-aki\comfy\samplers.py", line 1004 in outer_sample
  File "D:\sd-comfyui-aki\comfy\patcher_extension.py", line 112 in execute
  File "D:\sd-comfyui-aki\comfy\samplers.py", line 1036 in sample
  File "D:\sd-comfyui-aki\comfy\samplers.py", line 1051 in sample
  File "D:\sd-comfyui-aki\comfy\samplers.py", line 1161 in sample
  File "D:\sd-comfyui-aki\comfy\sample.py", line 45 in sample
  File "D:\sd-comfyui-aki\nodes.py", line 1492 in common_ksampler
  File "D:\sd-comfyui-aki\nodes.py", line 1559 in sample
  File "D:\sd-comfyui-aki\execution.py", line 277 in process_inputs
  File "D:\sd-comfyui-aki\execution.py", line 289 in _async_map_node_over_list
  File "D:\sd-comfyui-aki\custom_nodes\ComfyUI-Lora-Manager\py\metadata_collector\metadata_hook.py", line 165 in async_map_node_over_list_with_metadata
  File "D:\sd-comfyui-aki\execution.py", line 315 in get_output_data
  File "D:\sd-comfyui-aki\execution.py", line 496 in execute
  File "D:\sd-comfyui-aki\custom_nodes\ComfyUI-Lora-Manager\py\metadata_collector\metadata_hook.py", line 200 in async_execute_with_prompt_tracking
  File "D:\sd-comfyui-aki\execution.py", line 695 in execute_async
  File "D:\sd-comfyui-aki.ext\Lib\asyncio\tasks.py", line 304 in __step_run_and_handle_result
  File "D:\sd-comfyui-aki.ext\Lib\asyncio\tasks.py", line 293 in __step
  File "D:\sd-comfyui-aki.ext\Lib\asyncio\events.py", line 89 in _run
  File "D:\sd-comfyui-aki.ext\Lib\site-packages\nest_asyncio.py", line 133 in _run_once
  File "D:\sd-comfyui-aki.ext\Lib\site-packages\nest_asyncio.py", line 92 in run_until_complete
  File "D:\sd-comfyui-aki.ext\Lib\site-packages\nest_asyncio.py", line 30 in run
  File "D:\sd-comfyui-aki\execution.py", line 649 in execute
  File "D:\sd-comfyui-aki\main.py", line 195 in prompt_worker
  File "D:\sd-comfyui-aki.ext\Lib\threading.py", line 994 in run
  File "D:\sd-comfyui-aki.ext\Lib\threading.py", line 1043 in _bootstrap_inner
  File "D:\sd-comfyui-aki.ext\Lib\threading.py", line 1014 in _bootstrap
```
Hi @tuolaku, what's your GPU model? I've seen cases where the sageattn2 Triton kernel gives black images on RTX 30xx (sm86).
You can modify this line:
https://github.com/thu-ml/SageAttention/blob/15c0e22197f0cc9e96757d8fb11b75c284e7ef97/sageattention/core.py#L144
Change it to `if arch in {"sm80", "sm86"}:`; sageattn2 will then use the CUDA kernel rather than the Triton kernel by default.
Alternatively, you can choose the CUDA kernel in the PatchSageAttentionKJ node.
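The suggested edit amounts to widening the set of architectures that take the CUDA path. A pure-Python sketch of that dispatch (the function name and return values are assumptions for illustration, not SageAttention's real API):

```python
def pick_sage2_kernel(arch: str) -> str:
    """Hypothetical sketch of the kernel dispatch in sageattention/core.py.

    With the suggested change, sm86 (RTX 30xx) joins sm80 on the CUDA path,
    sidestepping the Triton kernel that reportedly produces black images there.
    """
    if arch in {"sm80", "sm86"}:
        return "cuda"
    return "triton"

print(pick_sage2_kernel("sm86"))  # → cuda
print(pick_sage2_kernel("sm89"))  # → triton
```

This mirrors what choosing the CUDA kernel in the node does, just hard-coded at the dispatch site instead of per-workflow.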
As for sageattn3, AFAIK I can't make it work on Windows, see https://github.com/woct0rdho/SageAttention/issues/42
Hi woct0rdho, my GPU is a 5090D. I encountered the issue after updating to the latest version of SageAttention. The previous versions worked fine, so I suspect it might be related to their recent update.
I tried using ComfyUI-SageAttention3 (https://github.com/wallen0322/ComfyUI-SageAttention3) to enable Sage3 in Comfy. Comfy didn’t crash, but the output was a black image. When I disabled the torch.compile node, images generated normally. Does this indicate that Sage3 can run on Windows, but is still incompatible with torch.compile?
On GPUs other than the 30xx series I don't know of any quick fix. It may be related to https://github.com/comfyanonymous/ComfyUI/issues/8689, which people have been debugging for a pretty long time.
OK, thank you for your hard work and dedication.
Is there more information about this? Has anyone managed to fix the black output? I'm using sage with the --use-sage-attention startup argument. Previously, in Wan2.1, I would get a black image only with the fp8 model; the fp16 model always worked. Now it's always black, and the only way around it is to use Kijai's patch.
@kabachuha @kijai I'm getting a 2x speedup when sage2 is combined with torch.compile, but the output is black. I'm on PyTorch 2.9.0, CUDA 13, and a Blackwell GPU. Also, I couldn't enable fullgraph.
So it's either use the patch at the slower speed, or get double the speed but a black screen :)) Oh, and also, torch.compile is broken in the latest Comfy, so I'm testing this with the previous commit, version 0.3.64.
@boyan-orion could you try this? https://github.com/pytorch/pytorch/issues/161861#issuecomment-3250307094
Oh and also, torch compile is broken in the latest Comfy
Is there a GitHub issue for this?
@StrongerXi Here is what I tried, per your recommendation:
```
$ export TORCHINDUCTOR_EMULATE_PRECISION_CASTS=1
$ echo $TORCHINDUCTOR_EMULATE_PRECISION_CASTS
1
```
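One thing worth noting: Inductor may read TORCHINDUCTOR_* environment variables when torch is first imported, so the flag has to be in the environment before the process starts (as with export above). An equivalent sketch of setting it from Python instead, for anyone wrapping the launcher (the variable name comes from the linked PyTorch issue; whether it fixes this particular bug is exactly what's being tested):

```python
import os

# Set the flag before torch (and hence torch._inductor.config) is imported;
# otherwise Inductor may have already snapshotted its configuration.
os.environ["TORCHINDUCTOR_EMULATE_PRECISION_CASTS"] = "1"

# import torch  # in a real launcher, the import happens after this point

print(os.environ["TORCHINDUCTOR_EMULATE_PRECISION_CASTS"])  # → 1
```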
Then I started Comfy with:
```
$ python3 main.py --use-sage-attention
```
I enabled the torch compile node (TorchCompileModelWanVideo V2) and again got a black screen, with the following error:
If I don't use the --use-sage-attention argument and instead load sageattention via the KJ model patcher or model loader node, it works fine and I still get the acceleration. But when I use the Comfy startup flag, I get the black screen, although the speed is nearly 2x faster than with the typical compile + sage setup.
So, as I understand it, sageattention 2++ and torch.compile each provide a speedup on their own, but they don't cooperate for even greater acceleration, even though Sage was recently patched to support torch.compile. Is this right?
I suppose it works fine via the KJ node because, in the code, torch.compile is disabled around the sage call, correct?
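For intuition on why the patch node sidesteps the black output: a disable-style decorator just marks a function so the compiler graph-breaks around it and runs it eagerly. A pure-Python stand-in of the pattern (this is not torch's real mechanism; all names here are made up for illustration):

```python
import functools

def compiler_disable(fn):
    """Stand-in for torch.compiler.disable: tag fn so a tracer would skip it."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        return fn(*args, **kwargs)  # runs eagerly, outside any traced graph
    wrapper._skip_compile = True  # marker a hypothetical tracer checks
    return wrapper

@compiler_disable
def sage_attention(q, k, v):
    # Placeholder for the real quantized attention kernel.
    return q

# The call still works eagerly; the marker tells the compiler to break here.
print(getattr(sage_attention, "_skip_compile", False))  # → True
print(sage_attention(1, 2, 3))  # → 1
```

Commenting out the decorator removes the marker, so the compiler traces through the kernel; that is the step that coincides with the black images returning when the kernel and the compiled graph disagree.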
Anyway here are my specs:
OS: Linux
Python: 3.13
PyTorch: 2.9.0
CUDA: 13
ComfyUI: 0.3.66
GPU: RTX 5080
Model: Wan2.2, FP16
UPDATE: After commenting out the torch.compiler.disable() calls inside kj-nodes, it essentially became the same as running with the --use-sage-attention startup argument, but the black screen was gone and everything was normal, though only when I used Triton instead of CUDA.
Works well now.
Compiled from the stable branch stable_abi3 from @woct0rdho repo and silenced the annoying torch.compiler.disable() function. The good old torch.compile functions are back, memory utilization is gold, and I'm using fullgraph now. I also noticed a further speed boost of about 5 s/it in Wan2.2. VRAM consumption is at only 8 GB with the FP16 model at 1280x720 :))
I can push the GPU a lot higher and with more frames now. Everything is working as it did months ago.
Compiled from the stable branch stable_abi3 from @woct0rdho repo and silenced the annoying torch.compiler.disable() function.
Do you mind sharing a link to the branch?
Do you mind sharing a link to the branch?
@StrongerXi Sure. It's this one: https://github.com/woct0rdho/SageAttention.git ( abi3_stable branch for pytorch >= 2.9 )
Also, the torch.compiler.disable() call seems to have been removed from the latest Comfy 0.3.68, so I only had to comment it out in the Diffusion Loader KJ node. The GGUF Loader KJ node works fine.