Gemma: enabled HPU Graphs and Flash Attention

Open dsmertin opened this issue 6 months ago • 8 comments

What does this PR do?

This PR fixes HPU Graphs usage and Flash Attention for the Gemma model. The changes are based on the Starcoder 2 and Qwen 2 implementations.

Before submitting

  • [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • [ ] Did you make sure to update the documentation with your changes?
  • [ ] Did you write any new necessary tests?

dsmertin, Jul 30 '24 14:07