nanoGPT
Just a question
If I understand correctly, you train for at most 600,000 iterations with a batch size of 12, which is roughly 7M training examples fed to the transformer, far fewer than the ~9B tokens of the training set. I must be missing something. Thanks!
Each batch has 12 * 1024 tokens, because 1024 is the block size, and all of those tokens are trained on in parallel.
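A quick back-of-the-envelope check of the numbers in this thread (batch size 12, block size 1024, 600,000 iterations; plain arithmetic, not nanoGPT code):

```python
batch_size = 12       # sequences per batch, from the thread
block_size = 1024     # tokens per sequence (context length)
iterations = 600_000  # max training iterations

tokens_per_iter = batch_size * block_size
total_tokens = tokens_per_iter * iterations

print(f"tokens per iteration: {tokens_per_iter:,}")  # 12,288
print(f"total tokens seen:    {total_tokens:,}")     # 7,372,800,000
```

So counting tokens rather than sequences, training covers roughly 7.4B tokens, the same order as the 9B-token dataset.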
Good, thank you!