optimum-habana
Gemma: enabled HPU Graphs and Flash Attention
What does this PR do?
This PR fixes HPU Graphs usage and enables Flash Attention for the Gemma model. The changes are based on the Starcoder 2 and Qwen 2 implementations.
Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [ ] Did you make sure to update the documentation with your changes?
- [ ] Did you write any new necessary tests?