bitsandbytes
Initial kernel changes to support GaLore
This is a draft containing some of the initial changes to support GaLore. So far this covers 2-state optimizers.
Optimizer2State.update_step() now accepts an additional argument, return_updates. When a tensor is passed to hold the updates, the updates are written into it and p is left unchanged. Weight decay is also not applied in this mode.
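To illustrate the contract described above, here is a minimal pure-Python sketch of an Adam-style (2-state) step on plain lists. It is not the bitsandbytes implementation, which runs CUDA kernels on tensors. The function name adam_step and its parameters are hypothetical, and the sketch assumes the returned update already includes the learning rate. It mirrors only the described behavior: when return_updates is given, the update is written there, p is not modified, and weight decay is skipped.

```python
import math


def adam_step(p, g, m, v, step, lr=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-8, weight_decay=0.0, return_updates=None):
    """Hypothetical 2-state (Adam-style) step on plain Python lists.

    m and v are the two optimizer states and are always updated in place.
    If return_updates is a list the same length as p, the update is written
    into it, p is left unchanged, and weight decay is skipped. Otherwise
    the update (plus decoupled weight decay) is applied to p in place.
    """
    bias1 = 1 - beta1 ** step
    bias2 = 1 - beta2 ** step
    for i in range(len(p)):
        m[i] = beta1 * m[i] + (1 - beta1) * g[i]
        v[i] = beta2 * v[i] + (1 - beta2) * g[i] * g[i]
        # Assumption: the returned update already includes the learning rate.
        update = lr * (m[i] / bias1) / (math.sqrt(v[i] / bias2) + eps)
        if return_updates is not None:
            return_updates[i] = update  # p untouched, no weight decay
        else:
            if weight_decay:
                p[i] -= lr * weight_decay * p[i]  # decoupled, AdamW-style
            p[i] -= update


# Usage: collect the update without touching the parameters.
p = [1.0, -2.0]
g = [0.5, 0.5]
m, v = [0.0, 0.0], [0.0, 0.0]
updates = [0.0, 0.0]
adam_step(p, g, m, v, step=1, lr=0.1, weight_decay=0.01,
          return_updates=updates)
# p is still [1.0, -2.0]; updates holds the step that would have been subtracted.
```

Returning the update instead of applying it lets the caller transform it (for example, project it back from a low-rank space) before applying it to the parameters.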
Still needs tests; feedback welcome.
cc: @TimDettmers @jiaweizzhao @Titus-von-Koeller
@matthewdouglas Tim said he could review your work this weekend.
Updated with the corresponding changes for the 1-state optimizers (Momentum, RMSProp, Adagrad, Lion).
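For the 1-state case, the same return_updates contract can be sketched with SGD with momentum, which keeps a single state buffer. Again, this is a hypothetical pure-Python illustration, not the bitsandbytes kernels. The name sgd_momentum_step is invented, and the plain momentum rule (buf = momentum * buf + grad) and L2-style weight decay are assumptions chosen for simplicity.

```python
def sgd_momentum_step(p, g, buf, lr=0.01, momentum=0.9,
                      weight_decay=0.0, return_updates=None):
    """Hypothetical 1-state (SGD with momentum) step on plain Python lists.

    buf is the single optimizer state and is always updated in place.
    With return_updates, the update is written there, p is unchanged,
    and weight decay is not folded into the gradient.
    """
    for i in range(len(p)):
        grad = g[i]
        if return_updates is None and weight_decay:
            grad += weight_decay * p[i]  # L2-style decay, only when p is updated
        buf[i] = momentum * buf[i] + grad
        update = lr * buf[i]
        if return_updates is not None:
            return_updates[i] = update
        else:
            p[i] -= update


# Usage: two steps that only collect updates.
p = [1.0]
buf = [0.0]
updates = [0.0]
sgd_momentum_step(p, [2.0], buf, lr=0.1, return_updates=updates)  # buf -> 2.0
sgd_momentum_step(p, [2.0], buf, lr=0.1, return_updates=updates)  # buf -> 3.8
# p is still [1.0]; updates[0] is about 0.38.
```

The state buffer still advances on every call, so the optimizer's history stays consistent whether or not the caller applies the update directly.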