Mister K
https://github.com/comfyanonymous/ComfyUI/issues/3265#issuecomment-2054219986

this is the only other person who has suggested how flash_attn might be getting used.
> [#3265 (comment)](https://github.com/comfyanonymous/ComfyUI/issues/3265#issuecomment-2054219986)
>
> this is the only other person who has suggested how flash_attn might be getting used.

2 things i gather:

1. Where did my...
let's take a look:

```
ddd insightface ven
Sun Oct 27 09:00:42 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03              Driver Version: 565.90          CUDA Version: 12.7     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | ...
```

```
on bash  ven 3.12.3  master ≡ ?1 -1  202ms ╭─ 09:23:56 | ...
```

```
on bash  ven 3.12.3  master ≡ ?1 -1  279ms ╭─ 09:26:16 | ...
```
from here... my assumption is that torch, accelerate, and xformers will somehow figure out where flash-attn is and use it automatically, contributing to the speed gain that i am...
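before trusting that assumption, here is a quick sanity check i can run — a minimal sketch of my own, not something from that issue — to see whether flash-attn is even importable and whether torch reports its flash SDP backend as enabled:

```python
import importlib.util

import torch

# is the flash-attn package installed in this environment at all?
spec = importlib.util.find_spec("flash_attn")
print("flash_attn installed:", spec is not None)
if spec is not None:
    import flash_attn
    print("flash_attn version:", flash_attn.__version__)

# does torch's built-in SDPA report the flash / mem-efficient backends as enabled?
print("flash SDP enabled:", torch.backends.cuda.flash_sdp_enabled())
print("mem-efficient SDP enabled:", torch.backends.cuda.mem_efficient_sdp_enabled())
```

this only tells you what is available and enabled, not which kernel a given model actually dispatches to at runtime.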
https://huggingface.co/docs/transformers/en/perf_infer_gpu_one

i feel dumb. huggingface has documentation on flash_attn support:

```
FlashAttention-2 can only be used when the model’s dtype is fp16 or bf16. Make sure to cast your...
```
> FlashAttention-2 can only be used when the model’s dtype is fp16 or bf16. Make sure to cast your model to the appropriate dtype and load them on a supported...
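so per those docs, explicitly asking transformers for FlashAttention-2 would look roughly like this — a minimal sketch, the model id is just a placeholder:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder, any FA2-supported model

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,               # FA2 requires fp16 or bf16
    attn_implementation="flash_attention_2",  # explicitly request flash-attn
    device_map="auto",                        # needs accelerate installed
)
```

the point being: nothing "figures it out" silently here — you opt in via attn_implementation, and the dtype has to match.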
additionally... https://discuss.pytorch.org/t/flash-attention/174955/17
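and on the pytorch side, you can restrict scaled_dot_product_attention to its flash kernel only and see whether it runs or errors out — a sketch along the lines of that thread, assuming pytorch 2.3+ for torch.nn.attention.sdpa_kernel (older builds used torch.backends.cuda.sdp_kernel instead):

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

# fp16 tensors on cuda with head_dim 64 -- a shape the flash kernel supports
q = torch.randn(1, 8, 1024, 64, dtype=torch.float16, device="cuda")
k = torch.randn(1, 8, 1024, 64, dtype=torch.float16, device="cuda")
v = torch.randn(1, 8, 1024, 64, dtype=torch.float16, device="cuda")

# allow only the flash backend; this raises if flash cannot be used
# (wrong dtype, unsupported head dim, kernel not built, etc.)
with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
    out = F.scaled_dot_product_attention(q, k, v)

print(out.shape)  # torch.Size([1, 8, 1024, 64])
```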
> https://huggingface.co/docs/transformers/en/perf_infer_gpu_one
>
> i feel dumb. huggingface has documentation on flash_attn support:
>
> ```
> FlashAttention-2 can only be used when the model’s dtype is...
> ```