
Apply unsloth optimizations

Open bratao opened this issue 1 year ago • 12 comments

⚠️ Please check that this feature request hasn't been suggested before.

  • [X] I searched previous Ideas in Discussions and didn't find any similar feature requests.
  • [X] I searched previous Issues and didn't find any similar feature requests.

🔖 Feature description

The project https://github.com/unslothai/unsloth looks very interesting. Its author claims great speedups for fine-tuning and details the improvements here:

So on GPUs the goal is to saturate the GPU with matrix multiplies instead of data movement. I'll write a more detailed blog post, but approximately:

1. Flash Attention v2 reduces the time taken by 17% or so

2. RoPE Triton kernels: -7.1%

3. RMS Layernorm in Triton: -3.1%

4. Cross Entropy in Triton: -1%

5. Manual autograd for MLP: -4%

6. Manual QKV autograd: -2%

7. Manual O autograd: -2%

8. Smart cache evictions and reduced data duplications etc: -30%

9. And other tricks in the Max and Pro versions make it 30x faster
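For readers unfamiliar with item 3: RMS Layernorm (as used in LLaMA-style models) normalizes each row by its root-mean-square and scales by a learned weight. The sketch below is a plain NumPy reference of what the fused kernel computes (function name and values are mine, not Unsloth code); the actual speedup comes from fusing this into a single Triton GPU pass instead of several separate tensor ops.

```python
import numpy as np

def rmsnorm(x, weight, eps=1e-6):
    # RMS LayerNorm: divide each row by its root-mean-square, then
    # apply a learned per-feature scale. A fused Triton kernel does
    # the same math in one pass, avoiding extra reads/writes of
    # intermediate tensors between GPU kernels.
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return (x / rms) * weight

# Illustrative call: one row, unit weights.
y = rmsnorm(np.array([[1.0, 2.0, 3.0, 4.0]]), np.ones(4))
```

With unit weights the normalized row has mean-square close to 1, which is the invariant the kernel maintains.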

✔️ Solution

It would be nice to use their kernels to speed up Axolotl.

❓ Alternatives

No response

📝 Additional Context

No response

Acknowledgements

  • [X] My issue title is concise, descriptive, and in title casing.
  • [X] I have searched the existing issues to make sure this feature has not been requested yet.
  • [X] I have provided enough information for the maintainers to understand and evaluate this request.

bratao avatar Dec 02 '23 20:12 bratao

If this is a major request by the OSS community - I'm more than happy to include some of the changes from Unsloth!

danielhanchen avatar Dec 05 '23 09:12 danielhanchen

I would like to second this request. As I understand it, this is simply free increased training efficiency with no degradation in accuracy, right? I think this would be a major boost to the Axolotl project.

Peter-Devine avatar Dec 14 '23 03:12 Peter-Devine

Yes, 0% loss in accuracy - we do actual FLOP reductions via our manual autograd engine. I'm actually working with @casper-hansen and some other Axolotl people to put some methods inside Axolotl!

danielhanchen avatar Dec 14 '23 03:12 danielhanchen
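For readers wondering what a "manual autograd engine" means here: instead of letting the framework trace a graph of many small ops and differentiate each one generically, the backward formulas for a whole block are written out by hand, so redundant intermediate tensors and kernel launches can be skipped. A toy NumPy sketch of hand-derived gradients for a 2-layer ReLU MLP (shapes and names are my own illustration, not Unsloth's code):

```python
import numpy as np

def mlp_forward(x, w1, w2):
    # Simple 2-layer MLP: h = relu(x @ w1); y = h @ w2.
    h = np.maximum(x @ w1, 0.0)
    return h @ w2, h

def mlp_backward(grad_y, x, w1, w2, h):
    # Hand-derived gradients: this is what a "manual autograd" path
    # computes directly, rather than replaying a generic op graph.
    grad_w2 = h.T @ grad_y
    grad_h = grad_y @ w2.T
    grad_pre = grad_h * (h > 0)        # ReLU derivative as a mask
    grad_w1 = x.T @ grad_pre
    grad_x = grad_pre @ w1.T
    return grad_x, grad_w1, grad_w2
```

The hand-written backward can be checked against finite differences of the forward pass, which is the standard sanity test for manual gradients.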

Legend. Superman has posters of you on his wall. Thanks so much for all of your work!

Peter-Devine avatar Dec 14 '23 05:12 Peter-Devine

:)

danielhanchen avatar Dec 14 '23 07:12 danielhanchen

I tried a few of the optimizations for FFT on Mistral, but I cannot seem to reproduce the improvements reported in the posts. @danielhanchen, it would be great if you could pitch in with a PR if you have time.

https://github.com/OpenAccess-AI-Collective/axolotl/tree/unsloth_modules

casper-hansen avatar Dec 14 '23 08:12 casper-hansen

@casper-hansen Oh cool - I'll have a look! Yes, I'll try to make a PR to Axolotl!

danielhanchen avatar Dec 14 '23 11:12 danielhanchen

Hi, is there any update on these changes? If I use Axolotl right now, will I benefit from the Unsloth improvements? Thank you!

fakerybakery avatar Dec 16 '23 00:12 fakerybakery

@fakerybakery Sorry not yet - I'll take a look at the PR Casper made, but it might take some time

danielhanchen avatar Dec 16 '23 01:12 danielhanchen

Ok, thank you!

fakerybakery avatar Dec 16 '23 02:12 fakerybakery

Unsloth is particularly interesting if your GPU is not supported by Flash Attention (e.g., V100). Unfortunately, as of now, Unsloth does not seem to have multi-GPU support in the OSS version yet: https://github.com/unslothai/unsloth/issues/107

kno10 avatar Apr 18 '24 15:04 kno10

FYI: gradient checkpointing has been merged: https://github.com/OpenAccess-AI-Collective/axolotl/pull/1528 🎉
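For context on what gradient checkpointing does: a checkpointed segment stores only its input during the forward pass and recomputes its intermediate activations during backward, trading a little extra compute for a large activation-memory saving. A minimal NumPy illustration of the idea (function names and values are mine; the merged PR uses the framework's real checkpointing machinery, not this sketch):

```python
import numpy as np

def forward_checkpointed(x, w1, w2):
    # Forward pass of a checkpointed segment: the intermediate
    # activation h is deliberately NOT saved, only the input x.
    h = np.tanh(x @ w1)
    y = h @ w2
    return y, (x,)                      # saved tensors: just x

def backward_checkpointed(grad_y, saved, w1, w2):
    (x,) = saved
    h = np.tanh(x @ w1)                 # recompute h in the backward pass
    grad_w2 = h.T @ grad_y
    grad_h = grad_y @ w2.T
    grad_pre = grad_h * (1.0 - h * h)   # derivative of tanh
    grad_w1 = x.T @ grad_pre
    grad_x = grad_pre @ w1.T
    return grad_x, grad_w1, grad_w2
```

Recomputing `h` costs one extra forward through the segment, but for long transformer stacks the saved activation memory is what makes larger batch sizes or sequence lengths fit.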

gardner avatar May 06 '24 10:05 gardner