nanoGPT
Add gradient accumulation support
Enables training with larger effective batch sizes by accumulating gradients over multiple micro-steps before each optimizer update. I've always found this useful, since larger batch sizes tend to improve performance even for small models (per Kaplan et al., 2020). A minimal sketch of the idea is below. Feel free to decline, of course.
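
For reference, here is a minimal, self-contained sketch of the technique, not this PR's actual diff. The toy model, data, and the `accumulation_steps` name are all illustrative stand-ins for nanoGPT's training-loop objects:

```python
import torch
import torch.nn as nn

# Toy stand-ins for nanoGPT's model, optimizer, and data loader.
model = nn.Linear(16, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
accumulation_steps = 4  # effective batch = micro-batch size * 4

optimizer.zero_grad(set_to_none=True)
for step in range(100):
    x, y = torch.randn(8, 16), torch.randn(8, 1)  # stand-in micro-batch
    loss = loss_fn(model(x), y)
    # Scale the loss so accumulated gradients average (rather than sum)
    # over the micro-batches in one effective batch.
    (loss / accumulation_steps).backward()
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad(set_to_none=True)
```

The key detail is dividing the loss by `accumulation_steps`: gradients from successive `backward()` calls add up in `.grad`, so scaling keeps the update equivalent to one step on the full effective batch.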