TransformerEngine
Flash attention: support softcap
Description
Flash attention has supported softcap since commit 8f873cc6; the feature is used in Gemma 2.
Fixes # (issue)
Type of change
- [ ] New feature (non-breaking change which adds functionality)
Changes
- Add a `softcap` argument to `FlashAttention`.
- Update `_flash_attn_max_version` to 2.6.1.
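For context, logit soft-capping squashes attention scores to the range (-softcap, softcap) with a scaled tanh before the softmax, which is how Gemma 2 bounds its attention logits. Below is a minimal NumPy sketch of the idea; `softcap_attention` is a hypothetical helper for illustration only, not the TransformerEngine or flash-attn API.

```python
import numpy as np

def softcap_attention(q, k, v, softcap=50.0):
    """Naive single-head attention with logit soft-capping.

    Logits are mapped through softcap * tanh(logits / softcap),
    bounding them to (-softcap, softcap) before the softmax.
    (Illustrative sketch; not the flash-attn kernel itself.)
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                  # raw attention logits
    scores = softcap * np.tanh(scores / softcap)   # soft-capping step
    # numerically stable softmax over the key dimension
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

For large logits the tanh saturates, so the capped score approaches ±softcap instead of growing without bound; for small logits, tanh(x/softcap) ≈ x/softcap and the scores pass through nearly unchanged.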
Checklist:
- [ ] I have read and followed the contributing guidelines
- [ ] The functionality is complete
- [ ] I have commented my code, particularly in hard-to-understand areas
- [ ] I have made corresponding changes to the documentation
- [ ] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works