Sebastian Raschka
I changed the `GaloreArgs` to `OptimizerArgs` and here are some results for phi-2. What's puzzling is the pretraining performance. I couldn't find the issue and may need to investigate more....
I tried many things and even ended up replacing all instances of torch's AdamW with GaLore's to make sure it's actually used, but for some reason, I cannot see...
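For context, roughly what such a swap looks like with the `galore_torch` package; this is a minimal sketch, and the model and the parameter-group values (`rank`, `update_proj_gap`, `scale`) are only illustrative, not the settings used here:

```python
import torch
from galore_torch import GaLoreAdamW  # GaLore's AdamW variant

# Hypothetical model; any torch.nn.Module works for this sketch.
model = torch.nn.Sequential(torch.nn.Linear(128, 128), torch.nn.Linear(128, 10))

# GaLore projects the gradients of selected (typically 2D) weight matrices;
# all remaining parameters are optimized as regular AdamW parameters.
galore_params = [p for p in model.parameters() if p.ndim == 2]
regular_params = [p for p in model.parameters() if p.ndim != 2]

optimizer = GaLoreAdamW(
    [
        {"params": regular_params},
        # Illustrative GaLore settings; the real values are tuning choices.
        {"params": galore_params, "rank": 128, "update_proj_gap": 200, "scale": 0.25},
    ],
    lr=1e-4,
)
```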
I changed the hardcoded GaLore arguments to a general `extra_kwargs` so they can be used for other optimizer options as well. This way, it adds less clutter to the CLI. So,...
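Roughly, the idea is to forward an open-ended dict of keyword arguments to whatever optimizer class was selected instead of hardcoding GaLore-specific fields. A minimal sketch, where the helper name `get_optimizer` and the lookup via `torch.optim` are assumptions for illustration:

```python
from typing import Any, Dict, Iterable, Optional

import torch


def get_optimizer(
    name: str,
    params: Iterable[torch.nn.Parameter],
    lr: float,
    extra_kwargs: Optional[Dict[str, Any]] = None,
) -> torch.optim.Optimizer:
    """Instantiate an optimizer by name and forward any extra options to it."""
    optimizer_cls = getattr(torch.optim, name)
    return optimizer_cls(params, lr=lr, **(extra_kwargs or {}))


# Optimizer-specific options ride along in `extra_kwargs`
# without needing a dedicated CLI flag for each of them.
model = torch.nn.Linear(8, 8)
optimizer = get_optimizer(
    "AdamW", model.parameters(), lr=1e-4, extra_kwargs={"weight_decay": 0.1}
)
```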
OMG I made it way more complicated than it needed to be 🤦‍♂️. Thanks for the hint. Now I know.
After trying this, I realize that this may not be cleanly possible because optimizers require `params` as a positional argument. So we would have to wrap the optimizer in our own...
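One common workaround for the positional `params` problem, shown here only as a sketch and not necessarily what was adopted, is to bind everything except `params` up front, e.g. with `functools.partial`, and let the training code supply the parameters later:

```python
from functools import partial

import torch

# Bind everything except the positional `params` argument up front,
# e.g. from CLI/config values.
optimizer_factory = partial(torch.optim.AdamW, lr=1e-4, weight_decay=0.1)

# Later, once the model exists, the training loop fills in `params`.
model = torch.nn.Linear(16, 16)
optimizer = optimizer_factory(model.parameters())
```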
Argh, I am still struggling with this. I.e.,

```
litgpt finetune full --optimizer.help torch.optim.AdamW
```

works without a problem, but then even if I don't do anything else, jsonargparse tries to...
This is awesome, Carlos, and it works great! I updated the README and added a tutorial. A little note about the structure: As far as I understand, this was requested...
Can be closed in favor of #1299
Thanks for the ping @Dev-Khant & @Andrei-Aksionov, and thanks so much for this valuable contribution. I'll take a look!
Just played around with it for a bit and it works great. Thanks again for this great contrib!