Running with Flash Attention 1

Open · Bikram9035 opened this issue 1 year ago · 1 comment

Hello, please let me know how to run Moondream2 with Flash Attention 1. I am trying to run it on Kaggle or Colab with T4 GPUs, so Flash Attention 2 won't work. You mention using Flash Attention 1, but the exact syntax is nowhere to be found, and guesswork is giving me errors.

As a beginner, I find this overwhelming, and there is a lot of outdated misinformation online. I hope you will understand my situation.

Thank you

Bikram9035 · Jun 21 '24 17:06

I don’t think HF transformers supports Flash Attention 1.0, so you would have to edit the attention classes in the model definition.

vikhyat · Jul 10 '24 03:07
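
For readers hitting the same wall on a T4, the following is a minimal sketch, not something posted in the thread. It assumes that FlashAttention-2 needs an Ampere-class GPU (compute capability 8.0 or newer; the T4 is 7.5) and that the usual alternative is to leave `attn_implementation` unset so the model falls back to its default attention. The helper names `pick_attn_implementation` and `load_kwargs_for_current_gpu` are hypothetical, and whether a given model's remote code honours the argument is also an assumption.

```python
from typing import Optional, Tuple


def pick_attn_implementation(capability: Tuple[int, int]) -> Optional[str]:
    """Hypothetical helper: choose an attention backend from a CUDA
    compute capability given as (major, minor).

    Assumption: FlashAttention-2 targets Ampere (8.x) and newer, so a
    T4 (7.5) gets None, meaning "do not request flash attention".
    """
    major, _minor = capability
    return "flash_attention_2" if major >= 8 else None


def load_kwargs_for_current_gpu() -> dict:
    """Hypothetical helper: build extra keyword arguments for a
    from_pretrained call based on the first visible GPU."""
    import torch  # imported lazily so the helper above works without PyTorch

    if not torch.cuda.is_available():
        return {}
    choice = pick_attn_implementation(torch.cuda.get_device_capability(0))
    # Requesting flash_attention_2 also requires the flash-attn package
    # to be installed; this sketch does not check for it.
    return {"attn_implementation": choice} if choice else {}
```

The resulting dict could be passed as extra keyword arguments to a `from_pretrained` call. On a T4 it comes back empty, so the model loads with its default attention, and nothing in the model definition has to be edited.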