nanoGPT
What is the main speed-up trick for nanoGPT?
question
Out of curiosity, does anyone know the main trick that lets nanoGPT train GPT-2 so quickly? (It seems to have gone from the ~1 week I was used to down to ~1 day.) https://github.com/karpathy/nanoGPT After a discussion, it seems it's mainly a smaller batch size, which then reaches the same val loss sooner. Is that really the main trick? Also, doesn't a smaller batch size give us more uncertainty in the loss estimate? How do we know the two models truly perform the same? E.g., confidence intervals on a mean shrink with sqrt(N).
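To illustrate the sqrt(N) point in the question, here is a minimal simulation sketch (not from nanoGPT; the loss distribution, mean, and std below are made-up numbers): the standard error of a mean over N samples scales as 1/sqrt(N), so a 16x smaller evaluation batch gives roughly a 4x noisier loss estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-example losses with true mean 3.0 and std 0.5
# (purely illustrative values, not measurements from nanoGPT).
true_mean, true_std = 3.0, 0.5

def loss_estimate_std(batch_size, trials=2000):
    """Empirical std of the batch-mean loss over many simulated batches."""
    losses = rng.normal(true_mean, true_std, size=(trials, batch_size))
    return losses.mean(axis=1).std()

small = loss_estimate_std(32)    # noisier estimate from a small batch
large = loss_estimate_std(512)   # tighter estimate from a large batch

# The ratio should be near sqrt(512 / 32) = 4, since the standard
# error of a mean scales as 1 / sqrt(N).
print(small / large)
```

So a smaller training batch is cheaper per step, but comparing two models "at the same val loss" only makes sense if the validation loss is itself averaged over enough samples that its standard error is small relative to the difference being claimed.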
cross: https://www.quora.com/unanswered/What-is-the-main-speed-up-trick-s-for-NanoGPT-from-Andrej-Karpathy cross2: https://www.reddit.com/r/learnmachinelearning/comments/10w84m4/what_is_the_main_speedup_tricks_for_nanogpt_from/
related3: https://ai.stackexchange.com/questions/39186/why-do-llms-need-massive-distributed-training-across-nodes-if-the-models-fit