nanoGPT

Add gradient accumulation support

Open VHellendoorn opened this issue 1 year ago • 2 comments

Enables training with larger effective batch sizes by taking multiple backward passes between gradient updates. I've always found this useful, since batch size correlates strongly with performance even for small models (per Kaplan et al., 2020). Feel free to decline, of course.
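The pattern described above can be sketched without any framework: run several micro-batches, sum their gradients, and apply one parameter update with the average. This is a minimal illustration, not nanoGPT's actual implementation; the names (`grad_fn`, `train_step`, `accum_steps` via the list of micro-batches, `lr`) are all hypothetical, and a toy linear model with an analytic gradient stands in for the network and `loss.backward()`.

```python
def grad_fn(w, batch):
    """Gradient of mean squared error for the toy model y = w * x
    on one micro-batch of (x, y) pairs."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

def train_step(w, micro_batches, lr=0.1):
    """One effective optimizer step: accumulate gradients over all
    micro-batches, then update once with the averaged gradient."""
    accum = 0.0
    for batch in micro_batches:
        accum += grad_fn(w, batch)     # analogous to loss.backward()
    accum /= len(micro_batches)        # average over micro-batches
    return w - lr * accum              # analogous to optimizer.step()

# With equal-sized micro-batches, two accumulated micro-batches give
# the same update as one batch containing all four examples.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
w_accum = train_step(0.0, [data[:2], data[2:]])
w_full = train_step(0.0, [data])
```

Note that averaging per-micro-batch mean gradients only matches the full-batch mean exactly when the micro-batches are the same size; in practice one also scales the loss by `1 / accum_steps` before each backward pass to get the same effect.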

VHellendoorn · Jan 09 '23