Andrej
actually sorry let me rephrase - new baseline to approach*. The cuDNN dependency is HEAVY (custom install and the compile time becomes 1.5 minutes) and I would delete it if...
Will this code fail for older PyTorch versions?
not on my mind right now, just trying to get GPT-2 to a good place. won't think about for a few weeks at the very least.
I think this was already merged via previous PRs, closing
Got it, ok ty!
my own failed attempt at https://github.com/karpathy/llm.c/issues/246
@joeshmoe0112358 ohhh that makes sense RE: column names 🤦‍♂️. But ok, I am getting ppl ~21 here, and Alec is citing 37.5 for this model
We are abandoning WikiText103 because it's a total mess. We'll instead look at one or a few of ARC Easy / Challenge, SQuAD, HellaSwag, TriviaQA, LAMBADA. Closing.
(the CI issue is fake, I pushed a fix to master. Will try out the speed of this PR tomorrow, assuming it is a bit faster)
Confirmed, I also saw a ~5% lift on my end, very cool!!