Apply Unsloth Optimizations
⚠️ Please check that this feature request hasn't been suggested before.
- [X] I searched previous Ideas in Discussions and didn't find any similar feature requests.
- [X] I searched previous Issues and didn't find any similar feature requests.
🔖 Feature description
The project https://github.com/unslothai/unsloth looks very interesting. The author claims great speedups for finetuning and details the improvements here:
So on GPUs the goal is to saturate the GPU with matrix multiplies instead of data movement. I'll write a more detailed blog post, but approximately:
1. Flash Attention v2 reduces the time taken by 17% or so
2. RoPE Triton kernels: -7.1%
3. RMS Layernorm in Triton: -3.1%
4. Cross Entropy in Triton: -1%
5. Manual autograd for MLP: -4%
6. Manual QKV autograd: -2%
7. Manual O autograd: -2%
8. Smart cache evictions and reduced data duplications etc: -30%
9. And other tricks in the Max and Pro versions make it 30x faster
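To give a rough idea of what the Triton-kernel items (2-4) mean in practice, here is a minimal sketch of a fused RMSNorm forward written in Triton. This is only an illustration of the general technique, not Unsloth's actual kernel, and the `rmsnorm` wrapper is a made-up helper for this example:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def rmsnorm_fwd_kernel(X, W, Y, stride, N, eps, BLOCK_SIZE: tl.constexpr):
    # Each program normalizes one row of the (n_rows, N) input.
    row = tl.program_id(0)
    cols = tl.arange(0, BLOCK_SIZE)
    mask = cols < N
    x = tl.load(X + row * stride + cols, mask=mask, other=0.0).to(tl.float32)
    # RMSNorm: y = x / sqrt(mean(x^2) + eps) * w, fused into one kernel launch.
    rms = tl.sqrt(tl.sum(x * x, axis=0) / N + eps)
    w = tl.load(W + cols, mask=mask, other=0.0).to(tl.float32)
    tl.store(Y + row * stride + cols, x / rms * w, mask=mask)

def rmsnorm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # Flatten leading dims into rows so the kernel sees a 2D problem.
    x2d = x.reshape(-1, x.shape[-1]).contiguous()
    y = torch.empty_like(x2d)
    n_rows, n_cols = x2d.shape
    BLOCK_SIZE = triton.next_power_of_2(n_cols)
    rmsnorm_fwd_kernel[(n_rows,)](x2d, weight, y, x2d.stride(0), n_cols, eps, BLOCK_SIZE=BLOCK_SIZE)
    return y.reshape(x.shape)
```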
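Items 5-7 refer to writing the backward pass by hand instead of relying on PyTorch's recorded graph. A simplified sketch for the Llama-style SwiGLU MLP is below, assuming frozen base weights (as in LoRA finetuning) so only the input gradient is returned; again, this is an illustration, not Unsloth's implementation:

```python
import torch
import torch.nn.functional as F

class ManualSwiGLU(torch.autograd.Function):
    """Hand-written backward for down(silu(gate(x)) * up(x)).
    Intermediates are recomputed in backward instead of being stored."""

    @staticmethod
    def forward(ctx, x, w_gate, w_up, w_down):
        g = x @ w_gate.t()              # gate projection
        u = x @ w_up.t()                # up projection
        h = F.silu(g) * u
        y = h @ w_down.t()
        ctx.save_for_backward(x, w_gate, w_up, w_down)
        return y

    @staticmethod
    def backward(ctx, dy):
        x, w_gate, w_up, w_down = ctx.saved_tensors
        # Recompute the intermediates rather than keeping them alive after forward.
        g = x @ w_gate.t()
        u = x @ w_up.t()
        sg = torch.sigmoid(g)
        dh = dy @ w_down
        du = dh * (g * sg)
        # d/dg silu(g) = sigmoid(g) * (1 + g * (1 - sigmoid(g)))
        dg = dh * u * (sg * (1 + g * (1 - sg)))
        dx = dg @ w_gate + du @ w_up
        # Weights are assumed frozen in this sketch, so their grads are None.
        return dx, None, None, None
```

`ManualSwiGLU.apply(x, w_gate, w_up, w_down)` would stand in for the usual three `nn.Linear` calls; the saving comes from recomputing `g` and `u` in backward instead of keeping them in memory.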
✔️ Solution
Would be nice to use their kernels to speedup axolotl
❓ Alternatives
No response
📝 Additional Context
No response
Acknowledgements
- [X] My issue title is concise, descriptive, and in title casing.
- [X] I have searched the existing issues to make sure this feature has not been requested yet.
- [X] I have provided enough information for the maintainers to understand and evaluate this request.
If this is a major request by the OSS community - I'm more than happy to include some of the changes from Unsloth!
I would like to second this request. As I understand it, this is simply free increased training efficiency with no degradation in accuracy, right? I think this would be a major boost to the Axolotl project.
Yes 0% loss in accuracy - we do actual FLOP reductions via our manual autograd engine. I'm actually working with @casper-hansen and some other Axolotl people to put some methods inside Axolotl!
Legend. Superman has posters of you on his wall. Thanks so much for all of your work!
:)
I tried a few of the optimizations for full fine-tuning (FFT) on Mistral, but I cannot seem to get the improvements described in the posts. @danielhanchen would be great if you can pitch in with a PR if you have time.
https://github.com/OpenAccess-AI-Collective/axolotl/tree/unsloth_modules
@casper-hansen Oh cool - I'll have a look! Ye I'll try to make a PR to axolotl!!
Hi, Is there any status on these updates? If I use Axolotl right now, will I benefit from the Unsloth improvements? Thank you!
@fakerybakery Sorry not yet - I'll take a look at the PR Casper made, but it might take some time
Ok, thank you!
Unsloth is particularly interesting if your GPU is not supported by flash attention (e.g., V100). Unfortunately, as of now, Unsloth does not seem to have multi-GPU support in the OSS version yet: https://github.com/unslothai/unsloth/issues/107
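For context, FlashAttention-2 requires an Ampere-or-newer GPU (compute capability 8.0+), while a V100 is compute capability 7.0. A quick helper to check what the current card supports (just a sketch, not part of axolotl):

```python
import torch

def flash_attention_supported() -> bool:
    # FlashAttention-2 kernels need compute capability >= 8.0 (Ampere/Ada/Hopper).
    # A V100 reports (7, 0), so it has to fall back to non-flash attention paths.
    if not torch.cuda.is_available():
        return False
    major, _ = torch.cuda.get_device_capability()
    return major >= 8

print(flash_attention_supported())
```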
FYI: gradient checkpointing has been merged: https://github.com/OpenAccess-AI-Collective/axolotl/pull/1528 🎉
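For anyone curious what offloaded gradient checkpointing does conceptually: instead of keeping each checkpointed layer input on the GPU, it moves it off-device and recomputes the layer during backward. A rough sketch of that pattern (not the merged implementation; it assumes `forward_fn` returns a single tensor and the extra args don't need gradients):

```python
import torch

class OffloadedCheckpoint(torch.autograd.Function):
    # Offloaded gradient checkpointing: stash the layer input on the CPU during
    # forward, bring it back and recompute the layer during backward.

    @staticmethod
    def forward(ctx, forward_fn, hidden_states, *args):
        saved = hidden_states.to("cpu", non_blocking=True)
        with torch.no_grad():
            output = forward_fn(hidden_states, *args)
        ctx.forward_fn = forward_fn
        ctx.args = args
        ctx.save_for_backward(saved)
        return output

    @staticmethod
    def backward(ctx, grad_output):
        (saved,) = ctx.saved_tensors
        hidden_states = saved.to(grad_output.device, non_blocking=True).detach()
        hidden_states.requires_grad_(True)
        with torch.enable_grad():
            output = ctx.forward_fn(hidden_states, *ctx.args)
        torch.autograd.backward(output, grad_output)
        # One gradient slot per forward input: None for forward_fn and the extra args.
        return (None, hidden_states.grad) + (None,) * len(ctx.args)
```

In practice this trades PCIe traffic and recomputation for a large cut in activation memory, which is why it helps most with long sequences or limited VRAM.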