Shehper
3 issues by Shehper
Hi! The batch size of nanoGPT is batch_size*gradient_accumulation_steps = 12*40 = 480. The batch size mentioned in the GPT-2 paper is 512. May I ask why nanoGPT was trained with...
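The arithmetic in the issue can be checked directly. This is a minimal sketch that counts the effective batch size in sequences. For the token counts it assumes a context length (`block_size`) of 1024 tokens. That value does not appear in the issue text and is an assumption here.

```python
# Values quoted in the issue
batch_size = 12                    # sequences per micro-batch (forward/backward pass)
gradient_accumulation_steps = 40   # micro-batches accumulated per optimizer step

# Effective batch size per optimizer step, in sequences
effective_batch = batch_size * gradient_accumulation_steps

# Batch size reported in the GPT-2 paper, in sequences
gpt2_paper_batch = 512

# Assumption: a 1024-token context, used only to express both batches in tokens
block_size = 1024

print(effective_batch)                # 480
print(effective_batch * block_size)   # 491520 tokens per step
print(gpt2_paper_batch * block_size)  # 524288 tokens per step
```

Under these assumptions, the nanoGPT setting uses 480/512 = 93.75% of the paper's batch size.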
The code, as written, does not create equally distributed classes.
While running inference on my Mac (macOS 13.1), I received the following error:

```
RuntimeError: MPS does not support cumsum_out_mps op with int64 input. Support has been added...
```