Andrew Lapp
> Looks good, nice catch!
>
> Can you add a link to this PR as a comment in the function? And can similar stuff happen for `str.lower`? (if not,...
Thanks so much for ensuring there is a proper fix @MegaIng!!
Thanks for directing people to that branch! It's still a work in progress; only a subset of the failure cases is handled right now. Happy to hear more JSON failure reports...
@cpfiffer Outlines' llama.cpp integration doesn't support multiple samples at once. Could you try again with `samples=1` (or just don't explicitly set the sampler)?

```python
generator = outlines.generate.choice(
    model,
    ["H", "T"],
    ...
```
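For context, a minimal sketch with the default sampler; the model repo/filename here are placeholders, and the `llamacpp()` constructor signature has varied across Outlines versions:

```python
import outlines

# Placeholder model; swap in however you already load your GGUF model.
model = outlines.models.llamacpp(
    "TheBloke/Mistral-7B-Instruct-v0.2-GGUF",
    "mistral-7b-instruct-v0.2.Q4_K_M.gguf",
)

# With no explicit sampler, choice() defaults to a single-sample
# multinomial sampler, which the llama.cpp integration supports.
generator = outlines.generate.choice(model, ["H", "T"])
print(generator("Flip a coin. Heads (H) or tails (T)? Answer:"))
```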
I tried CautiousMuon (and Cautious Adam) a few days ago. I'm not sure I implemented it correctly, but I saw a decrease in sample efficiency. Use this code with caution,...
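For reference, a minimal sketch of the cautious masking idea as I understand it from the Cautious Optimizers paper; the `cautious_update` helper is mine, and where exactly it slots into Muon's step (before or after orthogonalization) is an assumption, not a verified implementation:

```python
import torch

def cautious_update(update: torch.Tensor, grad: torch.Tensor,
                    eps: float = 1e-8) -> torch.Tensor:
    # Zero the components of the update whose sign disagrees with the
    # gradient, then rescale so the surviving components keep roughly
    # the same overall magnitude as the unmasked update.
    mask = (update * grad > 0).to(update.dtype)
    return update * mask * (mask.numel() / (mask.sum() + eps))

# Illustrative use inside an optimizer step (v = momentum/orthogonalized
# update, p = parameter, lr = learning rate):
#   p.data.add_(cautious_update(v, p.grad), alpha=-lr)
```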
I'm also interested in this variant. Considering the long runtime, perhaps it makes sense to compete on minimizing validation loss within a 1-hour run?
I achieved < 3.28 in a little under two hours with a few tweaks. @KellerJordan, are you interested in hosting a 1x4090 variant of the competition in this repo? If...
Ah, without a smaller batch size I get 2 hours 10 minutes. https://gist.github.com/lapp0/2740a03a637ec926cf0eea90e541a0a6

The only changes necessary for the 130-minute run, which is effectively identical to the 8xH100 run, are:
- `batch_size:...`
> Hey, can you please point me to whether 2 different runs, one with bsz=8, another with bsz=16 is still comparable in terms of numbers of tokens seen during training, everything...
Perhaps a reasonable alternative is:
1) Expose hyperparameters as command-line arguments via argparse (see the sketch below)
2) Document the command which enables 4090 training runs
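A minimal sketch of (1); the flag names and defaults here are illustrative, since the real values were truncated above:

```python
import argparse

def parse_args():
    # Expose the hyperparameters that differ between the 8xH100 and
    # 1x4090 configurations as flags; defaults are illustrative.
    parser = argparse.ArgumentParser(description="NanoGPT speedrun training")
    parser.add_argument("--batch_size", type=int, default=8,
                        help="global batch size in sequences")
    parser.add_argument("--grad_accum_steps", type=int, default=1,
                        help="gradient accumulation steps for low-VRAM GPUs")
    parser.add_argument("--num_iterations", type=int, default=1000)
    parser.add_argument("--learning_rate", type=float, default=3e-4)
    return parser.parse_args()

if __name__ == "__main__":
    args = parse_args()
    print(vars(args))
```

For (2), the 4090 command could then be documented as something like `python train_gpt2.py --batch_size 4 --grad_accum_steps 8` (again, values illustrative).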