nanoGPT
What is the main speed-up trick for nanoGPT?
question
Out of curiosity, does anyone know the main trick that lets nanoGPT train GPT-2 so quickly? (It seems to have gone from the ~1 week I was used to down to ~1 day.) https://github.com/karpathy/nanoGPT After a discussion, it seems it's mainly a smaller batch size, which then reaches the same val loss sooner. Is that really the main trick? Also, doesn't a smaller batch size give us more uncertainty in the loss estimate? How do we know the two models truly perform the same? E.g., confidence intervals on a mean shrink with sqrt(N).
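To illustrate the sqrt(N) point in the question, here is a minimal simulation sketch (not from nanoGPT; the loss distribution, mean, and std below are made-up numbers): the standard error of a mean over N samples scales as 1/sqrt(N), so a 16x smaller evaluation batch gives roughly a 4x noisier loss estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-example losses with true mean 3.0 and std 0.5
# (purely illustrative values, not measurements from nanoGPT).
true_mean, true_std = 3.0, 0.5

def loss_estimate_std(batch_size, trials=2000):
    """Empirical std of the batch-mean loss over many simulated batches."""
    losses = rng.normal(true_mean, true_std, size=(trials, batch_size))
    return losses.mean(axis=1).std()

small = loss_estimate_std(32)    # noisier estimate from a small batch
large = loss_estimate_std(512)   # tighter estimate from a large batch

# The ratio should be near sqrt(512 / 32) = 4, since the standard
# error of a mean scales as 1 / sqrt(N).
print(small / large)
```

So a smaller training batch is cheaper per step, but comparing two models "at the same val loss" only makes sense if the validation loss is itself averaged over enough samples that its standard error is small relative to the difference being claimed.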
cross: https://www.quora.com/unanswered/What-is-the-main-speed-up-trick-s-for-NanoGPT-from-Andrej-Karpathy cross2: https://www.reddit.com/r/learnmachinelearning/comments/10w84m4/what_is_the_main_speedup_tricks_for_nanogpt_from/
related3: https://ai.stackexchange.com/questions/39186/why-do-llms-need-massive-distributed-training-across-nodes-if-the-models-fit