New Optimizer: Implement Adam-Mini optimizer
⚠️ Please check that this feature request hasn't been suggested before.
- [X] I searched previous Ideas in Discussions and didn't find any similar feature requests.
- [X] I searched previous Issues and didn't find any similar feature requests.
🔖 Feature description
Paper: https://arxiv.org/abs/2406.16793
TL;DR: Adam-mini should make it easier and faster to train models on home hardware, mainly by reducing the optimizer-state memory footprint. In theory, it shouldn't be overly complicated to implement, as it is very similar to AdamW. A rough sketch of the core idea follows below.
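For context, a minimal sketch of the core idea, under a simplifying assumption: one block per parameter tensor (the paper partitions more finely, e.g. per attention head). Where AdamW keeps a second-moment estimate for every individual parameter, Adam-mini keeps a single scalar per block, computed from the block mean of the squared gradients; everything else mirrors AdamW. The class and names below are illustrative only, not the authors' implementation.

```python
import torch

class AdamMiniSketch(torch.optim.Optimizer):
    """Illustrative sketch of the Adam-mini idea: one second-moment scalar per block."""

    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), eps=1e-8, weight_decay=0.0):
        defaults = dict(lr=lr, betas=betas, eps=eps, weight_decay=weight_decay)
        super().__init__(params, defaults)

    @torch.no_grad()
    def step(self):
        for group in self.param_groups:
            beta1, beta2 = group["betas"]
            for p in group["params"]:
                if p.grad is None:
                    continue
                state = self.state[p]
                if not state:
                    state["step"] = 0
                    state["m"] = torch.zeros_like(p)                        # first moment, per element (as in AdamW)
                    state["v"] = torch.zeros((), device=p.device)           # second moment: ONE scalar per block
                state["step"] += 1
                g = p.grad
                state["m"].mul_(beta1).add_(g, alpha=1 - beta1)
                # Adam-mini's key change: update a single v per block from the block mean of g^2
                state["v"].mul_(beta2).add_(g.pow(2).mean(), alpha=1 - beta2)
                m_hat = state["m"] / (1 - beta1 ** state["step"])
                v_hat = state["v"] / (1 - beta2 ** state["step"])
                if group["weight_decay"] != 0:
                    p.mul_(1 - group["lr"] * group["weight_decay"])         # decoupled weight decay, as in AdamW
                p.add_(m_hat / (v_hat.sqrt() + group["eps"]), alpha=-group["lr"])
```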
✔️ Solution
Implement Adam-Mini in Axolotl.
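One possible way this could be surfaced, assuming a hypothetical `optimizer: adam_mini` config value and reusing the illustrative `AdamMiniSketch` class from the sketch above (in practice the official `adam-mini` package from the paper authors would be the better choice). The `optimizers` argument used here is an existing `transformers.Trainer` parameter, so the optimizer can be swapped in without touching the rest of the training loop:

```python
import torch
from transformers import Trainer, TrainingArguments

def build_trainer(model, train_dataset, args: TrainingArguments):
    # Hypothetical hook: construct the Adam-mini-style optimizer instead of the default AdamW
    optimizer = AdamMiniSketch(
        model.parameters(),
        lr=args.learning_rate,
        weight_decay=args.weight_decay,
    )
    scheduler = torch.optim.lr_scheduler.LinearLR(optimizer)
    return Trainer(
        model=model,
        args=args,
        train_dataset=train_dataset,
        optimizers=(optimizer, scheduler),  # drop-in replacement for the Trainer's default optimizer
    )
```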
❓ Alternatives
Keep using AdamW
📝 Additional Context
Adam-mini should probably be 'sort-of' compatible with DeepSpeed right out of the box, which would greatly increase training speed and reduce the memory footprint.
Acknowledgements
- [X] My issue title is concise, descriptive, and in title casing.
- [X] I have searched the existing issues to make sure this feature has not been requested yet.
- [X] I have provided enough information for the maintainers to understand and evaluate this request.