moondream
Running with Flash Attention 1
Hello, please let me know how to run Moondream2 using Flash Attention 1. I'm trying to run it on Kaggle or Colab with T4 GPUs, so Flash Attention 2 won't work. You've mentioned using Flash Attention 1, but the exact syntax is nowhere to be found, and guesswork is giving me errors.
As a beginner this is overwhelming, and there's a lot of outdated misinformation online. I hope you'll understand my situation.
Thank you
I don’t think HF transformers supports Flash Attention 1.0, so you would have to edit the attention classes in the model definition.
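For reference, here's a minimal sketch of what that edit could look like, using the varlen ("unpadded") entry point from the flash-attn 1.x package (e.g. `pip install flash-attn==1.0.9`). The function and its signature are from flash-attn 1.x; the wrapper name, tensor shapes, and where you'd call it from are assumptions about the model definition, not the actual moondream2 code:

```python
# Sketch: swapping a standard attention forward for FlashAttention 1.x.
# The function below is illustrative; you'd call it from inside the
# attention class in the model definition in place of the usual
# softmax(QK^T)V computation.
import torch
from flash_attn.flash_attn_interface import flash_attn_unpadded_qkvpacked_func

def flash_attn1_forward(qkv: torch.Tensor, causal: bool = True) -> torch.Tensor:
    """qkv: (batch, seqlen, 3, num_heads, head_dim), fp16/bf16, on CUDA."""
    batch, seqlen, _, num_heads, head_dim = qkv.shape
    # FlashAttention 1.x takes "unpadded" (varlen) inputs: flatten the
    # batch into one long token dimension and describe the sequence
    # boundaries with cumulative lengths.
    qkv = qkv.reshape(batch * seqlen, 3, num_heads, head_dim)
    cu_seqlens = torch.arange(
        0, (batch + 1) * seqlen, step=seqlen,
        dtype=torch.int32, device=qkv.device,
    )
    out = flash_attn_unpadded_qkvpacked_func(
        qkv, cu_seqlens, seqlen,
        dropout_p=0.0, softmax_scale=None, causal=causal,
    )
    # Output comes back as (total_tokens, num_heads, head_dim).
    return out.reshape(batch, seqlen, num_heads, head_dim)
```

Note that FlashAttention 1.x requires fp16 or bf16 inputs on CUDA, and it supports Turing GPUs like the T4, which FlashAttention 2 does not (FA2 needs Ampere or newer). That's exactly why FA2 fails on Kaggle/Colab T4s.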