
GPT-2 currently exhausts all available GPU memory on an 8 GB GPU

Open BradLarson opened this issue 3 years ago • 2 comments

In testing PR #671, we noticed that the GPT-2 model now exhausts all available memory on 8 GB GPUs (example: GTX 1080) for both the eager-mode and X10 runtimes. It did not do this previously, so at some point the memory usage of this model increased to the point where it can no longer train on these GPUs.

We should investigate why this happened and see if memory usage for this model can be brought back down.

BradLarson avatar Sep 29 '20 17:09 BradLarson

Testing on a 16 GB GPU VM, I see that across roughly the last two-thirds of the epochs (10 epochs in total), memory usage peaks once per epoch at 9187 MB, around the last training batch.
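For anyone reproducing this measurement, a minimal sketch of how per-epoch peaks like the 9187 MB figure above can be captured: poll `nvidia-smi` while training runs and reset the recorded maximum at each epoch boundary. This helper is not part of swift-models; it is an illustrative side-channel monitor, and it assumes a single GPU and that `nvidia-smi` is on the PATH.

```python
import subprocess


def query_used_mib():
    """Current GPU memory use in MiB via nvidia-smi (assumes GPU 0)."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used",
         "--format=csv,noheader,nounits"])
    return int(out.decode().split()[0])


class PeakTracker:
    """Track the maximum of a stream of memory samples, resettable per epoch."""

    def __init__(self):
        self.peak = 0

    def record(self, sample_mib):
        # Keep the largest sample seen since the last reset.
        self.peak = max(self.peak, sample_mib)

    def reset(self):
        # Return the peak for the finished epoch and start a fresh one.
        peak, self.peak = self.peak, 0
        return peak
```

In use, a background loop would call `tracker.record(query_used_mib())` every few hundred milliseconds and the training driver would call `tracker.reset()` at each epoch boundary; the peak landing on the final batch of each epoch is what the observation above describes.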

xihui-wu avatar Oct 05 '20 19:10 xihui-wu

I verified again today on a newly created 16 GB GPU DLVM instance; the issue persists.

xihui-wu avatar Nov 18 '20 17:11 xihui-wu