Bug | Warning | UserWarning: 1Torch was not compiled with flash attention
I installed ComfyUI, opened it, loaded the default workflow with an SDXL model, and hit Start; then this warning appeared. It reduces my generation speed tenfold.
got prompt
model_type EPS
adm 2816
Using pytorch attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using pytorch attention in VAE
clip missing: ['clip_l.logit_scale', 'clip_l.transformer.text_projection.weight']
clip unexpected: ['clip_l.transformer.text_model.embeddings.position_ids']
left over keys: dict_keys(['denoiser.sigmas'])
Requested to load SDXLClipModel
Loading 1 new model
C:\Users\nicol\Desktop\ComfyUI_windows_portable_nvidia_cu121_or_cpu\ComfyUI_windows_portable\ComfyUI\comfy\ldm\modules\attention.py:344: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:263.)
  out = torch.nn.functional.scaled_dot_product_attention(q, k, v, attn_mask=mask, dropout_p=0.0, is_causal=False)
Requested to load SDXL
Loading 1 new model
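For anyone who wants to see what the attention dispatcher is doing, here is a minimal check of my own (not part of ComfyUI) that prints the SDPA backend flags and reproduces the warning with a bare scaled_dot_product_attention call, assuming torch >= 2.0 with a CUDA GPU:

# Minimal sketch (my own, not from ComfyUI) to inspect the SDPA backends of the installed torch build.
import warnings
import torch
import torch.nn.functional as F

print("torch:", torch.__version__, "| cuda available:", torch.cuda.is_available())
print("flash SDPA backend enabled:", torch.backends.cuda.flash_sdp_enabled())
print("mem-efficient SDPA backend enabled:", torch.backends.cuda.mem_efficient_sdp_enabled())
print("math SDPA backend enabled:", torch.backends.cuda.math_sdp_enabled())

# A bare attention call; on affected Windows builds this emits the same UserWarning.
q = k = v = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.float16)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    F.scaled_dot_product_attention(q, k, v)
for w in caught:
    print("caught:", w.message)  # "1Torch was not compiled with flash attention" shows up here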
D:\AI\ComfyUI>call conda activate D:\AI\ComfyUI\venv-comfyui
Total VRAM 8188 MB, total RAM 65268 MB
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 4060 Laptop GPU : cudaMallocAsync
VAE dtype: torch.bfloat16
Using pytorch cross attention
****** User settings have been changed to be stored on the server instead of browser storage. ******
****** For multi-user setups add the --multi-user CLI argument to enable multiple user profiles. ******
Import times for custom nodes:
  0.0 seconds: D:\AI\ComfyUI\custom_nodes\websocket_image_save.py
Starting server
To see the GUI go to: http://127.0.0.1:8188
got prompt
model_type EPS
Using pytorch attention in VAE
Using pytorch attention in VAE
clip missing: ['clip_l.logit_scale', 'clip_l.transformer.text_projection.weight']
Requested to load SDXLClipModel
Loading 1 new model
D:\AI\ComfyUI\comfy\ldm\modules\attention.py:345: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:263.)
  out = torch.nn.functional.scaled_dot_product_attention(q, k, v, attn_mask=mask, dropout_p=0.0, is_causal=False)
Requested to load SDXL
Loading 1 new model
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:11<00:00, 1.71it/s]
Check this: https://discuss.pytorch.org/t/flash-attention-compilation-warning/196692/12 and this: https://github.com/pytorch/pytorch/issues/108175
Flash attention is not supported on Windows in those PyTorch builds.
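If it helps anyone: since the Windows wheel simply lacks the flash kernel, one possible workaround (my own sketch, not an official ComfyUI option) is to disable that backend so the dispatcher goes straight to the memory-efficient/math kernels and stops emitting the warning. It doesn't make anything faster, it only stops torch from probing a kernel that isn't there.

# Hedged workaround sketch: turn off the flash SDPA backend the Windows wheel wasn't built with.
# These setters exist in torch >= 2.0.
import torch
import torch.nn.functional as F

torch.backends.cuda.enable_flash_sdp(False)         # don't try the missing flash kernel
torch.backends.cuda.enable_mem_efficient_sdp(True)  # keep the memory-efficient kernel
torch.backends.cuda.enable_math_sdp(True)           # keep the plain math fallback

q = k = v = torch.randn(1, 8, 256, 64, device="cuda", dtype=torch.float16)
out = F.scaled_dot_product_attention(q, k, v)        # no flash-attention UserWarning now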
Did you solve the problem? I have the same issue, and the first images always take a very long time to generate.
Nope, but it doesn't seem to do anything bad...
Well, somehow after manually updating to torch 2.4.0+cu121 everything is okay for me. I don't remember exactly what else I updated.
Edit: Yesterday I installed a fresh copy just for Flux, via git clone (not the portable build), and well...
UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:555.)
I tried changing torch versions, from cu124 to cu121 and down to the older 2.3; it didn't help.
2.4.1+cu124 has this problem too.
The warning disappeared after installing the PyTorch nightly (currently at version 2.6.0.dev20240915+cu124).
However, I don't see any difference in performance.
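For anyone juggling wheels like the posts above, a quick way to confirm which build the ComfyUI venv actually ended up with (a plain check of my own, nothing specific to this thread):

# Print the exact torch wheel, the CUDA runtime it was built against, and the GPU in use.
import torch

print(torch.__version__)               # e.g. 2.6.0.dev20240915+cu124
print(torch.version.cuda)              # CUDA version the wheel was compiled for
print(torch.cuda.get_device_name(0))   # the GPU torch is dispatching to
print(torch.backends.cuda.flash_sdp_enabled())  # whether the flash SDPA backend is enabled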