pytorch-optimizer Apollo optimizer eats all the GPU memory

Open EmilPi opened this issue 4 years ago • 2 comments

My network consists of a few dense layers (conv with padding, concatenating output to input), a 2-layer LSTM, and 2 Linear layers at the end. Even after I made the network laughably small, all GPU memory (8 GB) was consumed within a few epochs.

I understand that the Apollo optimizer is quasi-Newton and attempts to approximate the second derivative, but still: why does memory consumption grow with every epoch? I tried calling torch.cuda.empty_cache(), torch.clear_autocast_cache() (I don't really understand this one, but who knows), and gc.collect(). After each call consumption dropped a bit, but not as fast as Apollo took it :)

Apr 24 '21 19:04 EmilPi
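
To make the growth described in the opening post measurable, a small helper along these lines could be called once per epoch. This is only a sketch: `cuda_memory_mb` and `report_epoch_memory` are hypothetical names, not part of the library, and the helper falls back to zeros on a machine without a GPU.

```python
import gc

import torch


def cuda_memory_mb():
    """Return (allocated, reserved) CUDA memory in MiB, or (0.0, 0.0) without a GPU."""
    if not torch.cuda.is_available():
        return 0.0, 0.0
    mib = 1024 ** 2
    return torch.cuda.memory_allocated() / mib, torch.cuda.memory_reserved() / mib


def report_epoch_memory(epoch, history):
    """Run the cleanup calls from the post, then record and print memory usage.

    `history` is a plain list that collects the allocated figure per epoch,
    so a steady upward trend is easy to spot.
    """
    gc.collect()  # drop unreachable Python objects that may hold tensors
    if torch.cuda.is_available():
        torch.cuda.empty_cache()  # return unused cached blocks to the driver
    allocated, reserved = cuda_memory_mb()
    history.append(allocated)
    print(f"epoch {epoch}: allocated={allocated:.1f} MiB, reserved={reserved:.1f} MiB")
    return allocated
```

torch.cuda.empty_cache() only releases cached blocks that no tensor is using. If the allocated figure still climbs from epoch to epoch after these calls, some live object is still holding references to tensors.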

I ran into this problem when I had set weight_decay > 0. Once I removed it, memory usage was constant.

Jun 08 '21 02:06 mlw214

Same here

Sep 11 '21 12:09 matthewdm0816
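
If weight decay is still wanted, one possible way to combine it with the workaround reported above (weight_decay left at 0 in the optimizer) is to add an L2 term to the gradients by hand before each step. The sketch below is untested against Apollo itself: `add_l2_to_grads` is a hypothetical helper, torch.optim.SGD is only a stand-in so the example runs without the library installed, and whether this avoids the memory growth from this thread is an assumption.

```python
import torch
from torch import nn


def add_l2_to_grads(params, weight_decay):
    """Add weight_decay * p to each existing gradient, in place.

    Runs under torch.no_grad() so the update is not recorded by autograd.
    """
    with torch.no_grad():
        for p in params:
            if p.grad is not None:
                p.grad.add_(p, alpha=weight_decay)


model = nn.Linear(4, 2)
# Stand-in optimizer. In the setting from this thread, this would be the
# Apollo optimizer constructed with weight_decay=0 (assumption, not tested).
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=0.0)

x = torch.randn(8, 4)
loss = model(x).pow(2).mean()

optimizer.zero_grad()
loss.backward()
add_l2_to_grads(model.parameters(), weight_decay=1e-4)  # manual L2 decay
optimizer.step()
```

This applies plain L2 regularization on the gradient, which may not match how a given optimizer treats its own weight_decay argument, so results can differ from the built-in option.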
