bitsandbytes
Request to add 4-bit AdamW and 4-bit SGD
Paper and Code

Paper: *Memory Efficient Optimizers with 4-bit States*

Code:
- https://github.com/thu-ml/low-bit-optimizers/blob/main/lpmm/optim/optimizer.py
- https://github.com/thu-ml/low-bit-optimizers/blob/main/lpmm/optim/adamw.py
- https://github.com/thu-ml/low-bit-optimizers/blob/main/lpmm/optim/sgd.py
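The core idea in the linked code is to store optimizer moments in 4 bits with a per-block scale, dequantizing on the fly each step. Here is a minimal NumPy sketch of that blockwise scheme; it is illustrative only (a simple linear quantization map, whereas the actual repo uses non-linear quantization maps), and all function names here are made up for the example:

```python
import numpy as np

def quantize_4bit_blockwise(x, block_size=128):
    """Sketch: quantize a float tensor to signed 4-bit levels [-7, 7],
    keeping one float32 absmax scale per block."""
    flat = x.ravel().astype(np.float32)
    pad = (-len(flat)) % block_size          # pad so it divides into blocks
    flat = np.pad(flat, (0, pad))
    blocks = flat.reshape(-1, block_size)
    scales = np.abs(blocks).max(axis=1, keepdims=True)
    scales[scales == 0] = 1.0                # avoid divide-by-zero on all-zero blocks
    q = np.clip(np.round(blocks / scales * 7), -7, 7).astype(np.int8)
    return q, scales, x.shape, pad

def dequantize_4bit_blockwise(q, scales, shape, pad):
    """Sketch: reverse of the above -- rescale each block and restore the shape."""
    flat = (q.astype(np.float32) / 7 * scales).ravel()
    if pad:
        flat = flat[:-pad]
    return flat.reshape(shape)
```

The per-block absmax keeps the quantization error proportional to each block's local magnitude, which is why 4 bits is enough for the relatively smooth Adam moment tensors.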
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
raise
Raise,
edit: I am currently nowhere near good enough at programming to do this, but it would be pretty cool to combine some of the bnb paged 8-bit AdamW code with the 4-bit AdamW code to make a paged 4-bit AdamW. It would lower training requirements to even lower-VRAM cards than the current implementation.
It could be possible to fully fine-tune a 7B model with a 4-bit optimizer on a 24 GB card, with gradient accumulation, based off this chart.
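A rough back-of-the-envelope for the optimizer-state part of that claim (my own arithmetic, not from the chart; it ignores quantization scales, weights, gradients, and activations):

```python
# AdamW keeps two moment tensors (m and v) per parameter.
params = 7e9  # 7B model

gb = 1e9
state_fp32_gb = params * 4 * 2 / gb    # two fp32 states: ~56 GB
state_8bit_gb = params * 1 * 2 / gb    # two 8-bit states: ~14 GB
state_4bit_gb = params * 0.5 * 2 / gb  # two 4-bit states: ~7 GB

print(state_fp32_gb, state_8bit_gb, state_4bit_gb)
```

So going from 8-bit to 4-bit states would free roughly another 7 GB on a 7B model, which is what makes the 24 GB budget look plausible.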
If there happens to be a branch, or PR for this, I’d love to see it! Could you share a link?