flash-attention
Apple Silicon Support
More and more models are using flash attention, which is awesome. However, it's not available for Apple Silicon. Could we have flash attention for Apple Silicon on pip? Thanks so much!
I personally have no bandwidth for that, so we'd need folks to contribute.
How / where would one start with that, @tridao?
Someone with access to Manus AI, the Pro version of ChatGPT, or Devin could try doing the port with an agent. Definitely a good test for an agent.
There is already a Metal flash attention implementation, but it's in Swift: https://github.com/philipturner/metal-flash-attention
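For anyone looking for a starting point on a port: the core of flash attention is a short tiled online-softmax recurrence, independent of any backend. Here is a minimal NumPy sketch of that recurrence (function name, `block` size, and single-head layout are illustrative choices, not taken from the CUDA or Metal implementations):

```python
import numpy as np

def flash_attention(Q, K, V, block=64):
    """Tiled online-softmax attention (the flash attention recurrence),
    shown in NumPy for clarity. A Metal/MPS port would process each
    K/V block inside a kernel instead of this Python loop."""
    n, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    O = np.zeros((n, d))             # running (unnormalized) output
    m = np.full(n, -np.inf)          # running row max of the scores
    l = np.zeros(n)                  # running softmax denominator
    for s in range(0, n, block):
        Kb, Vb = K[s:s + block], V[s:s + block]
        S = (Q @ Kb.T) * scale                 # scores for this K/V block
        m_new = np.maximum(m, S.max(axis=1))   # updated row max
        p = np.exp(S - m_new[:, None])         # block probabilities
        alpha = np.exp(m - m_new)              # rescale factor for old state
        l = alpha * l + p.sum(axis=1)
        O = alpha[:, None] * O + p @ Vb
        m = m_new
    return O / l[:, None]

# Sanity check against standard (materialized) softmax attention.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((128, 32)) for _ in range(3))
S = Q @ K.T / np.sqrt(32)
P = np.exp(S - S.max(axis=1, keepdims=True))
ref = (P / P.sum(axis=1, keepdims=True)) @ V
assert np.allclose(flash_attention(Q, K, V), ref)
```

The point of the recurrence is that the full n×n score matrix is never materialized; only one block of scores plus the running (m, l, O) state live in fast memory at a time, which is exactly what an Apple Silicon kernel would keep in threadgroup memory.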