InternEvo
[Feature] Should we remove the other dependencies of flash-attention?
Describe the feature
Should we remove the other dependencies of flash-attention and keep only the core attention-related ops?
If possible, we could install flash-attention with pip alone, avoiding a lot of compilation work.
To find out whether this is feasible, we need to check whether it would significantly reduce training performance.
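As a rough illustration of what "core ops only" could look like, here is a minimal sketch. It assumes flash-attention 2.x installed from a prebuilt pip wheel and uses only the core `flash_attn_func` kernel, without the optional fused extensions (fused dense, rms_norm, rotary) that require extra CUDA compilation; the shapes and parameters below are illustrative, not taken from InternEvo's code.

```python
# Sketch: core flash-attention op only, assuming `pip install flash-attn`
# provides the main kernel without building the optional fused extensions.
import torch
from flash_attn import flash_attn_func

batch, seqlen, nheads, headdim = 2, 1024, 16, 64
q = torch.randn(batch, seqlen, nheads, headdim, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

# Core attention kernel; causal masking for decoder-style training.
out = flash_attn_func(q, k, v, dropout_p=0.0, causal=True)
print(out.shape)  # (batch, seqlen, nheads, headdim)
```

To answer the performance question, we would benchmark a training run that replaces the fused non-attention ops with standard PyTorch equivalents against the current setup, and compare throughput.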
Will you implement it?
- [ ] I would like to implement this feature and create a PR!