minGPT
Zero-grad more aggressively to save memory
Removes one full copy of the gradients from peak memory usage.
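The diff itself is not shown here, so the following is only a minimal sketch of one common way to zero gradients more aggressively in PyTorch: passing `set_to_none=True` to `zero_grad()`, which drops the gradient tensors instead of filling them with zeros. The tiny `nn.Linear` model, the SGD optimizer, and the `train_step` helper are hypothetical stand-ins, not minGPT's trainer.

```python
import torch
import torch.nn as nn

# Hypothetical toy model and optimizer, standing in for the real GPT and trainer.
model = nn.Linear(8, 4)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

def train_step(x, y):
    # Release the old gradient tensors rather than overwriting them with zeros,
    # so they do not occupy memory during the next forward/backward pass.
    model.zero_grad(set_to_none=True)
    loss = ((model(x) - y) ** 2).mean()
    loss.backward()
    optimizer.step()
    return loss.item()

x, y = torch.randn(16, 8), torch.randn(16, 4)
loss_value = train_step(x, y)

# After zeroing with set_to_none=True, every parameter's .grad is None.
model.zero_grad(set_to_none=True)
grads_freed = all(p.grad is None for p in model.parameters())
```

With `set_to_none=False` the `.grad` tensors stay allocated between steps, which is presumably the extra copy the description refers to.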
Numbers based on torch.cuda.max_memory_allocated():
- For gpt-nano: 32019456 to 31666688 bytes
- For gpt2-xl: 30634800640 to 24607903232 bytes (6 gigabytes!)
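As a quick check on the figures above, the savings can be computed directly from the reported byte counts. This is just arithmetic on the numbers in the list; the `measurements` and `savings` names are illustrative.

```python
# Peak memory in bytes (before, after), as reported by torch.cuda.max_memory_allocated().
measurements = {
    "gpt-nano": (32019456, 31666688),
    "gpt2-xl": (30634800640, 24607903232),
}

savings = {}
for name, (before, after) in measurements.items():
    saved = before - after
    savings[name] = saved
    print(
        f"{name}: saved {saved} bytes "
        f"({saved / 1e9:.2f} GB, {saved / 2**30:.2f} GiB, "
        f"{100 * saved / before:.1f}% of the original peak)"
    )
```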
:O ???