minGPT

Zero-grad more aggressively to save memory

Open cchan opened this issue 3 years ago • 1 comment

Zeroing gradients earlier takes a full copy of the gradients off the peak memory usage.
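For illustration only, here is a minimal sketch of the general idea, not the actual change to minGPT's trainer. It assumes "more aggressively" means releasing the gradients right after the optimizer step, so they are not held in memory during the next forward pass. The `train_step` helper, the toy `nn.Linear` model, and the MSE loss are hypothetical stand-ins.

```python
import torch
import torch.nn as nn

def train_step(model, optimizer, x, y):
    """One training step that frees gradients as soon as they are used.

    Hypothetical helper for illustration; not minGPT's trainer code.
    """
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()
    # Release the gradient tensors immediately instead of waiting until
    # just before the next backward(). set_to_none=True drops the tensors
    # rather than filling them with zeros, so their memory can be reused
    # during the next forward pass.
    optimizer.zero_grad(set_to_none=True)
    return loss.item()

torch.manual_seed(0)
model = nn.Linear(4, 2)  # toy stand-in for a GPT model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(8, 4), torch.randn(8, 2)

first_loss = train_step(model, optimizer, x, y)
second_loss = train_step(model, optimizer, x, y)

# After each step no parameter should still hold a gradient tensor.
grads_freed = all(p.grad is None for p in model.parameters())
print(grads_freed)
```

On a GPU, the effect could be checked by calling `torch.cuda.reset_peak_memory_stats()` before a run and reading `torch.cuda.max_memory_allocated()` afterwards.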

Numbers based on torch.cuda.max_memory_allocated(), in bytes:

For gpt-nano: 32019456 to 31666688
For gpt2-xl: 30634800640 to 24607903232 (6 gigabytes!)
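As a quick check of the arithmetic, the reported peaks can be converted into savings. `torch.cuda.max_memory_allocated()` reports bytes; the `savings` helper below is a hypothetical name used only for this calculation.

```python
def savings(before_bytes, after_bytes):
    """Return the peak-memory reduction in bytes, GB (1e9) and GiB (2**30)."""
    saved = before_bytes - after_bytes
    return {"bytes": saved, "GB": saved / 1e9, "GiB": saved / 2**30}

# Peak values reported in the issue.
nano = savings(32019456, 31666688)
xl = savings(30634800640, 24607903232)

print(nano["bytes"])        # 352768 (344.5 KiB)
print(xl["bytes"])          # 6026897408
print(round(xl["GB"], 2))   # 6.03
print(round(xl["GiB"], 2))  # 5.61
```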

Jan 18 '23 02:01 cchan

:O ???

Jan 18 '23 02:01 karpathy