Sebastian Raschka

Results: 821 comments by Sebastian Raschka

I changed the `GaloreArgs` to `OptimizerArgs` and here are some results for phi-2. What's puzzling is the pretraining performance. I couldn't find the issue and may need to investigate more....

I tried many things and even ended up replacing all instances of torch's AdamW with GaLore's to make sure it's actually used, but for some reason, I cannot see...
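For reference, a minimal sketch of what such a swap can look like, assuming the `galore_torch` package and the parameter-group layout from the GaLore README (the model and hyperparameter values are just illustrative):

```
import torch
from galore_torch import GaLoreAdamW  # used in place of torch.optim.AdamW

# Illustrative model; in practice this would be the model being trained.
model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU(), torch.nn.Linear(64, 8))

# GaLore's low-rank projection applies to the 2D weight matrices;
# everything else is optimized as usual.
galore_params = [p for p in model.parameters() if p.requires_grad and p.ndim == 2]
regular_params = [p for p in model.parameters() if p.requires_grad and p.ndim != 2]

param_groups = [
    {"params": regular_params},
    {"params": galore_params, "rank": 128, "update_proj_gap": 200, "scale": 0.25, "proj_type": "std"},
]

# Previously: optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
optimizer = GaLoreAdamW(param_groups, lr=1e-4)
```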

I changed the hardcoded GaLore arguments to a general `extra_kwargs` so they could be used for other optimizer options as well. This way, it adds less clutter to the CLI. So,...
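A rough sketch of that idea, with hypothetical names, assuming the optimizer class and any extra keyword arguments are collected from the CLI and only forwarded at instantiation time:

```
from dataclasses import dataclass, field
from typing import Any

import torch


@dataclass
class OptimizerArgs:
    # Generic optimizer settings instead of GaLore-only fields (names hypothetical).
    name: str = "AdamW"
    lr: float = 1e-4
    weight_decay: float = 0.01
    extra_kwargs: dict[str, Any] = field(default_factory=dict)  # e.g. {"rank": 128} for GaLore


def instantiate_optimizer(model: torch.nn.Module, args: OptimizerArgs) -> torch.optim.Optimizer:
    # Only torch.optim is resolved here to keep the sketch self-contained;
    # third-party optimizers (e.g., GaLore's) would need an extra lookup.
    optimizer_cls = getattr(torch.optim, args.name)
    return optimizer_cls(model.parameters(), lr=args.lr, weight_decay=args.weight_decay, **args.extra_kwargs)


model = torch.nn.Linear(8, 2)
optimizer = instantiate_optimizer(model, OptimizerArgs(extra_kwargs={"betas": (0.9, 0.95)}))
```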

OMG I made it way more complicated than it needed to be 🤦‍♂️. Thanks for the hint. Now I know.

After trying this, I realize that this may not be cleanly possible because optimizers require `params` as a positional argument. So we would have to wrap the optimizer in our own...
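One way to sidestep the positional `params` requirement (sketched here with `functools.partial`, not necessarily how it should be wired into the CLI) is to collect everything except `params` up front and bind the parameters only once the model exists:

```
from functools import partial

import torch

# Collect the optimizer class and its keyword arguments first ...
optimizer_factory = partial(torch.optim.AdamW, lr=1e-4, weight_decay=0.01)

# ... and supply the positional `params` argument only when the model is available.
model = torch.nn.Linear(8, 2)
optimizer = optimizer_factory(model.parameters())
```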

Argh, I am still struggling with this. I.e.,

```
litgpt finetune full --optimizer.help torch.optim.AdamW
```

works without a problem, but then even if I don't do anything else, jsonargparse tries to...
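If it helps, one workaround I can think of is to not let the parser instantiate the optimizer at all and instead construct it manually from the parsed config once `model.parameters()` is available. A sketch, assuming the optimizer arrives in jsonargparse's usual subclass layout with `class_path` and `init_args` keys (values here are illustrative):

```
import importlib

import torch

# Shape jsonargparse typically produces for subclass-type arguments (illustrative values).
optimizer_cfg = {"class_path": "torch.optim.AdamW", "init_args": {"lr": 1e-4, "weight_decay": 0.01}}


def instantiate_optimizer(cfg: dict, model: torch.nn.Module) -> torch.optim.Optimizer:
    # Instantiate manually so `params` can be passed positionally.
    module_name, class_name = cfg["class_path"].rsplit(".", 1)
    optimizer_cls = getattr(importlib.import_module(module_name), class_name)
    return optimizer_cls(model.parameters(), **cfg.get("init_args", {}))


model = torch.nn.Linear(8, 2)
optimizer = instantiate_optimizer(optimizer_cfg, model)
```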

This is awesome, Carlos, and it works great! I updated the README and added a tutorial. A little note about the structure: As far as I understand, this was requested...

Can be closed in favor of #1299

Thanks for the ping @Dev-Khant & @Andrei-Aksionov , and thanks so much for this valuable contribution. I'll take a look!

Just played around with it for a bit and it works great. Thanks again for this great contrib!