nanoGPT
Add gradient accumulation support
Enables training with larger effective batch sizes by running multiple forward/backward passes (micro-steps) between gradient updates. I've always found this useful since batch size correlates strongly with performance even for small models (per Kaplan et al., 2020). Feel free to decline, of course.
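For reference, the core loop is roughly the following (a minimal sketch in train.py terms, not the exact diff; `get_batch`, `model`, and `optimizer` as in train.py):

```python
# Minimal sketch of gradient accumulation: run several "micro" forward/backward
# passes before each optimizer update. Dividing each loss by
# gradient_accumulation_steps makes the accumulated gradient match what one
# large batch of batch_size * gradient_accumulation_steps samples would give.
for micro_step in range(gradient_accumulation_steps):
    X, Y = get_batch('train')
    logits, loss = model(X, Y)
    loss = loss / gradient_accumulation_steps  # normalize per micro-batch
    loss.backward()                            # gradients accumulate in .grad
optimizer.step()                               # single update for the whole effective batch
optimizer.zero_grad(set_to_none=True)
```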
Ty, I've been meaning to look into adding this, will review shortly.
Sounds good. FWIW, I just noticed that this PR messes with the printed loss here because each loss term is normalized.
One obvious fix is to scale that loss back up by gradient_accumulation_steps, but that can make it a pretty noisy estimate if the micro-batch size is small. What I'd normally do is average metrics like this over several batches (e.g., storing the sum of loss.item() in a variable and resetting after every log_interval steps), but that might defeat the purpose of being lightweight & fast in this case. Happy to push either solution.
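A rough sketch of that second option, continuing the loop sketched above (`train_one_iter` is a hypothetical stand-in for the micro-step loop, returning the sum of the normalized micro-losses; other names follow train.py):

```python
# Average the printed loss over the logging window instead of printing a single
# (noisy) micro-batch loss, then reset the accumulator after each log.
running_loss, n_since_log = 0.0, 0
for iter_num in range(max_iters):
    lossf = train_one_iter()  # hypothetical helper: the micro-step loop above,
                              # returning the sum of the normalized micro-losses
    running_loss += lossf
    n_since_log += 1
    if iter_num % log_interval == 0:
        print(f"iter {iter_num}: loss {running_loss / n_since_log:.4f}")
        running_loss, n_since_log = 0.0, 0
```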
I changed things around a bit wrt semantics and implementation and ended up here: https://github.com/karpathy/nanoGPT/commit/cf9991488629b1b072c49bf261d04b0c8a3207a3
any thoughts? ty for opening the PR.
Great, glad I could help! Minor note: I realized on my own end that the number of eval steps is also affected by this, in that it now refers to micro-batches instead of mini-batches. That shouldn't matter much for the default config (eval_iters = 200), but since I typically use fewer, larger batches, I ended up just multiplying eval_iters by gradient_accumulation_steps here to get a more stable loss.
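Roughly what that looks like on my end (paraphrasing train.py's estimate_loss from memory, not an exact diff; assumes the usual train.py globals like `model`, `eval_iters`, and `get_batch`):

```python
# Evaluate over eval_iters * gradient_accumulation_steps micro-batches so the
# eval loss is averaged over roughly the same number of samples as before.
@torch.no_grad()
def estimate_loss():
    out = {}
    model.eval()
    n_eval = eval_iters * gradient_accumulation_steps  # micro-batches, not optimizer steps
    for split in ['train', 'val']:
        losses = torch.zeros(n_eval)
        for k in range(n_eval):
            X, Y = get_batch(split)
            logits, loss = model(X, Y)
            losses[k] = loss.item()
        out[split] = losses.mean()
    model.train()
    return out
```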
ok! just for the record, some rough timings I saw:
- simple 1GPU training: 350ms
- DDP 4GPU training: 520ms (quite a bit of overhead from DDP, ~1.5X? :( ... )
- grad accum 10 steps without the use of no_sync(): 5200ms (i.e. 10X, as expected)
- grad accum 10 steps with the use of no_sync(): 3500ms (i.e. around 6.7X of DDP time, so we save quite a bit of time by syncing just once on the last step and approach the original 1GPU time, with the DDP overhead going to zero on a per-step basis, as expected)
This was just to confirm that no_sync() was actually doing something, given the way I implemented it (toggling the internal variable directly rather than using the context manager). @VHellendoorn note that your original PR did not have it, just a heads up in case you have your own fork you're working with.
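for reference, the toggle amounts to roughly this (paraphrasing, not the exact commit) -- DDP's no_sync() context manager just flips require_backward_grad_sync, so setting it directly means the gradient all-reduce only fires on the last micro-step:

```python
# Sync gradients across ranks only on the final micro-step; earlier micro-steps
# skip the all-reduce, which is what wrapping them in no_sync() would do anyway.
for micro_step in range(gradient_accumulation_steps):
    if ddp:
        model.require_backward_grad_sync = (micro_step == gradient_accumulation_steps - 1)
    X, Y = get_batch('train')
    logits, loss = model(X, Y)
    loss = loss / gradient_accumulation_steps
    loss.backward()  # the all-reduce happens here only when the flag is True
optimizer.step()
optimizer.zero_grad(set_to_none=True)
```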
This is useful info! I hadn't used DDP yet (I've been training a sweep of smaller models instead), but it's nice that the sync overhead becomes negligible with more accumulation steps. I tend to see the same with multi-node training of larger models (e.g. on Megatron) -- it's almost embarrassingly parallel as long as each node spends at least 10-20s doing its own work.