Mister K

Results: 31 comments by Mister K

https://github.com/comfyanonymous/ComfyUI/issues/3265#issuecomment-2054219986 this is the only other person who has indicated how flash_attn might actually be getting used.

> [#3265 (comment)](https://github.com/comfyanonymous/ComfyUI/issues/3265#issuecomment-2054219986)
>
> this is the only other person who has indicated how flash_attn might actually be getting used.

2 things i gather:

1. Where did my...

let's take a look:

```
ddd insightface ven
Sun Oct 27 09:00:42 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03              Driver Version: 565.90         CUDA Version: 12.7      |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | ...
```
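As a cross-check on the same machine, it can help to compare what nvidia-smi reports for the driver with the CUDA build that torch itself ships. A small sketch, nothing specific to this install:

```python
# Sketch: report torch's bundled CUDA build and the visible GPU, to compare
# against the driver/CUDA versions shown by nvidia-smi above.
import torch

print("torch:", torch.__version__)
print("torch CUDA build:", torch.version.cuda)
print("cuda available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
```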

(terminal capture: bash prompt in the `ven` Python 3.12.3 venv on `master`, command output truncated)

(second terminal capture from the same shell, command output truncated)

from here... my assumption is: torch, accelerate, and xformers will somehow figure out where flash-attn is and use it automatically, contributing to the speed gain that i am...
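One way to sanity-check that assumption is a minimal probe of which attention kernels are importable/enabled. A sketch, assuming a recent PyTorch 2.x build; it does not prove that a given model actually dispatches to flash-attn, only what is available:

```python
# Probe which SDPA backends PyTorch has enabled, and whether the flash_attn
# and xformers packages are importable at all.
import torch

print("flash SDP enabled:        ", torch.backends.cuda.flash_sdp_enabled())
print("mem-efficient SDP enabled:", torch.backends.cuda.mem_efficient_sdp_enabled())
print("math SDP enabled:         ", torch.backends.cuda.math_sdp_enabled())

try:
    import flash_attn
    print("flash_attn package:", flash_attn.__version__)
except ImportError:
    print("flash_attn package: not installed")

try:
    import xformers
    print("xformers package:", xformers.__version__)
except ImportError:
    print("xformers package: not installed")
```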

https://huggingface.co/docs/transformers/en/perf_infer_gpu_one

i feel dumb. huggingface has a description of its flash_attn support:

```
FlashAttention-2 can only be used when the model’s dtype is fp16 or bf16. Make sure to cast your...
```
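In transformers that constraint translates to loading the model in half precision and asking for the flash backend explicitly. A minimal sketch; the model id is a placeholder and flash-attn must already be installed for a supported GPU:

```python
# Sketch: load a model in fp16 and request FlashAttention-2 explicitly.
# "some-org/some-model" is a placeholder, not a model from this thread.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "some-org/some-model"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,               # fp16 (or bfloat16), per the docs quoted above
    attn_implementation="flash_attention_2",  # opt in to flash-attn instead of relying on magic
    device_map="auto",
)
```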

> FlashAttention-2 can only be used when the model’s dtype is fp16 or bf16. Make sure to cast your model to the appropriate dtype and load them on a supported...

additionally... https://discuss.pytorch.org/t/flash-attention/174955/17
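That thread is about PyTorch's built-in scaled_dot_product_attention. A quick way to test whether the flash backend can actually run on a given setup is to force it and see if it raises instead of silently falling back. A sketch, assuming a PyTorch new enough to ship `torch.nn.attention.sdpa_kernel` (roughly 2.3+):

```python
# Sketch: restrict SDPA to the flash backend; if flash kernels can't run on this
# GPU/dtype, the call raises rather than quietly using another backend.
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

# fp16 CUDA tensors, matching the dtype constraint quoted from the HF docs
q = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
    out = F.scaled_dot_product_attention(q, k, v)

print("flash backend ran, output shape:", tuple(out.shape))
```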
