Andrew Lapp
> Looks good, nice catch!
>
> Can you add a link to this PR as a comment in the function? And can similar stuff happen for `str.lower`? (if not,...
Thanks so much for ensuring there is a proper fix @MegaIng!!
Thanks for directing people to that branch! It's still a work in progress; only a subset of the failure cases is handled right now. Happy to hear more JSON failure reports...
@cpfiffer Outlines' llama.cpp integration doesn't support multiple samples at once. Could you try again with `samples=1` (or just don't explicitly set the sampler)?

```python
generator = outlines.generate.choice(
    model,
    ["H", "T"],
    ...
```
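For context, a minimal sketch with the default sampler; the model repo/filename here are placeholders, and the `llamacpp()` constructor signature has varied across Outlines versions:

```python
import outlines

# Placeholder model; swap in however you already load your GGUF model.
model = outlines.models.llamacpp(
    "TheBloke/Mistral-7B-Instruct-v0.2-GGUF",
    "mistral-7b-instruct-v0.2.Q4_K_M.gguf",
)

# With no explicit sampler, choice() defaults to a single-sample
# multinomial sampler, which the llama.cpp integration supports.
generator = outlines.generate.choice(model, ["H", "T"])
print(generator("Flip a coin. Heads (H) or tails (T)? Answer:"))
```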
I tried CautiousMuon (and Cautious Adam) a few days ago. I'm not sure I implemented it correctly, but I saw a decrease in sample efficiency. Use this code with caution,...
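For reference, a minimal sketch of the cautious masking idea as I understand it from the Cautious Optimizers paper; the `cautious_update` helper is mine, and where exactly it slots into Muon's step (before or after orthogonalization) is an assumption, not a verified implementation:

```python
import torch

def cautious_update(update: torch.Tensor, grad: torch.Tensor,
                    eps: float = 1e-8) -> torch.Tensor:
    # Zero the components of the update whose sign disagrees with the
    # gradient, then rescale so the surviving components keep roughly
    # the same overall magnitude as the unmasked update.
    mask = (update * grad > 0).to(update.dtype)
    return update * mask * (mask.numel() / (mask.sum() + eps))

# Illustrative use inside an optimizer step (v = momentum/orthogonalized
# update, p = parameter, lr = learning rate):
#   p.data.add_(cautious_update(v, p.grad), alpha=-lr)
```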
I'm also interested in this variant. Considering the long runtime, perhaps it makes sense to compete on minimizing validation loss within a 1-hour run?
I achieved < 3.28 in a little under two hours with a few tweaks. @KellerJordan, are you interested in hosting a 1x4090 variant of the competition in this repo? If...
Ah, without a smaller batch size I get 2 hours 10 minutes. https://gist.github.com/lapp0/2740a03a637ec926cf0eea90e541a0a6

The only changes necessary for the 130-minute run, which is effectively identical to the 8xH100 run, are:
- `batch_size:...`
> Hey, can you please point me to whether 2 different runs, one with bsz=8, another with bsz=16 is still comparable in terms of numbers of tokens seen during training, everything...
Perhaps a reasonable alternative is:
1) Expose hyperparameters as command-line arguments via argparse (see the sketch below)
2) Document the command which enables 4090 training runs
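A minimal sketch of (1); the flag names and defaults here are illustrative, since the real values were truncated above:

```python
import argparse

def parse_args():
    # Expose the hyperparameters that differ between the 8xH100 and
    # 1x4090 configurations as flags; defaults are illustrative.
    parser = argparse.ArgumentParser(description="NanoGPT speedrun training")
    parser.add_argument("--batch_size", type=int, default=8,
                        help="global batch size in sequences")
    parser.add_argument("--grad_accum_steps", type=int, default=1,
                        help="gradient accumulation steps for low-VRAM GPUs")
    parser.add_argument("--num_iterations", type=int, default=1000)
    parser.add_argument("--learning_rate", type=float, default=3e-4)
    return parser.parse_args()

if __name__ == "__main__":
    args = parse_args()
    print(vars(args))
```

For (2), the 4090 command could then be documented as something like `python train_gpt2.py --batch_size 4 --grad_accum_steps 8` (again, values illustrative).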