minGPT

Zero-grad more aggressively to save memory

Open: cchan opened this issue 3 years ago • 1 comment

This takes a full copy of the gradients off the peak memory usage.
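The diff itself isn't shown in this thread, so the following is only a minimal sketch of what "zero-grad more aggressively" could look like, assuming the change frees gradients right after `optimizer.step()` instead of at the start of the next iteration. The `train_step` helper and the toy `nn.Linear` model are hypothetical and are not taken from minGPT.

```python
import torch
import torch.nn as nn

def train_step(model, optimizer, x, y):
    """Hypothetical training step that frees gradients as early as possible."""
    logits = model(x)
    loss = nn.functional.cross_entropy(logits, y)
    loss.backward()
    optimizer.step()
    # Assumption: drop the .grad tensors immediately after the update, so they
    # are not still allocated while the next forward pass builds activations.
    optimizer.zero_grad(set_to_none=True)
    return loss.item()

torch.manual_seed(0)
model = nn.Linear(16, 4)          # toy stand-in for a GPT model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x = torch.randn(8, 16)
y = torch.randint(0, 4, (8,))

loss_value = train_step(model, optimizer, x, y)
grads_freed = all(p.grad is None for p in model.parameters())

# Peak-memory reporting only makes sense on a CUDA device.
if torch.cuda.is_available():
    print(torch.cuda.max_memory_allocated())
```

With `set_to_none=True` the gradient tensors are released rather than filled with zeros, which is what lets the allocator reuse that memory.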

Peak memory in bytes, as reported by torch.cuda.max_memory_allocated():

  • For gpt-nano: 32019456 to 31666688
  • For gpt2-xl: 30634800640 to 24607903232 (about 6 gigabytes saved!)
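To put the reported numbers in perspective, here is a small sketch that computes the absolute and relative savings from the figures above. The `runs` dictionary and variable names are just for illustration.

```python
# Peak bytes (before, after) as reported in this issue.
runs = {
    "gpt-nano": (32019456, 31666688),
    "gpt2-xl": (30634800640, 24607903232),
}

savings = {}
for name, (before, after) in runs.items():
    saved = before - after
    savings[name] = saved
    # 1 GB is taken as 10**9 bytes here.
    print(f"{name}: saved {saved} bytes "
          f"({saved / 1e9:.2f} GB, {100 * saved / before:.1f}% of peak)")
```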

cchan, Jan 18 '23 02:01

:O ???

karpathy, Jan 18 '23 02:01