sd-scripts
Support GaLore Optimizer
This optimizer is memory-efficient. With it, we could pretrain Mistral-7B on a 24 GB GPU.
https://github.com/jiaweizzhao/GaLore
Sorry, that is not feasible in practice. It would take many days.
Well, this is a bit tricky to implement: you need to collect the weights of each layer individually, rather than just swapping in a different optimizer.
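To make the point about per-layer weights concrete, here is a rough sketch of the kind of split that would be needed. It is not sd-scripts code. The parameter names and shapes, the `split_for_galore` helper, and the keyword filter are all hypothetical. The group keys (`rank`, `update_proj_gap`, `scale`, `proj_type`) follow my reading of the GaLore README and should be checked against the repository before use.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical (name, shape) pairs standing in for a model's named parameters.
Param = Tuple[str, Tuple[int, ...]]


def split_for_galore(
    params: Sequence[Param],
    target_keywords: Sequence[str] = ("attn", "mlp"),
    rank: int = 128,
) -> List[Dict]:
    """Split parameters into a regular group and a low-rank (GaLore) group.

    Only 2-D weight matrices whose name contains one of the target keywords
    go into the GaLore group; everything else (biases, norms, embeddings)
    stays in the regular group.
    """
    galore_names: List[str] = []
    regular_names: List[str] = []
    for name, shape in params:
        is_matrix = len(shape) == 2
        is_target = any(keyword in name for keyword in target_keywords)
        if is_matrix and is_target:
            galore_names.append(name)
        else:
            regular_names.append(name)
    return [
        {"params": regular_names},
        {
            "params": galore_names,
            # Assumed hyperparameter keys, taken from the GaLore README.
            "rank": rank,
            "update_proj_gap": 200,
            "scale": 0.25,
            "proj_type": "std",
        },
    ]


# Made-up parameter list for illustration only.
example_params: List[Param] = [
    ("blocks.0.attn.to_q.weight", (320, 320)),
    ("blocks.0.attn.to_q.bias", (320,)),
    ("blocks.0.mlp.fc1.weight", (1280, 320)),
    ("blocks.0.norm.weight", (320,)),
    ("embed.weight", (49408, 768)),
]

groups = split_for_galore(example_params)
```

In a real integration the lists would hold the parameter tensors themselves rather than names, and the two groups would be passed to the optimizer as parameter groups. That is why the optimizer cannot simply be dropped in: the training script has to walk the model and decide, layer by layer, which weights get the low-rank treatment.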