Andrej

Results: 373 comments by Andrej

actually sorry let me rephrase - new baseline to approach*. The cuDNN dependency is HEAVY (custom install and the compile time becomes 1.5 minutes) and I would delete it if...

not on my mind right now, just trying to get GPT-2 to a good place. won't think about it for a few weeks at the very least.

I think this was already merged via previous PRs, closing

my own failed attempt at https://github.com/karpathy/llm.c/issues/246

@joeshmoe0112358 ohhh that makes sense RE: column names 🤦‍♂️. But ok, I am getting ppl ~21 here, and Alec is citing 37.5 for this model

We are abandoning WikiText-103 because it's a total mess. We'll instead look at one or a few of ARC Easy / Challenge, SQuAD, HellaSwag, TriviaQA, LAMBADA. Closing.

(the CI issue is fake, i pushed a fix to master, will try out the speed of this PR tomorrow, assuming it is a bit faster)