mmagic

Memory leak in BasicVSR++ (?)

Open • contentis opened this issue 3 years ago • 1 comment

I ran distributed training on an 8x V100 machine using the default REDS configs, changing only `samples_per_gpu` to 2. After about four hours, training crashed because it ran out of memory. As the image shows, memory usage stays at around 100 GB for most of the run, then slowly creeps up to 420 GB until one of the workers crashes with an out-of-memory error.

[Image: memory usage over the course of the training run]
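One way to put numbers on a slow creep like this is to sample the resident memory (RSS) of the training processes at a fixed interval and fit a trend line to the samples. Below is a minimal, Linux-only sketch that assumes `/proc/<pid>/status` is readable. The helper names (`parse_vmrss_kb`, `read_rss_gib`, `sample_total_rss`, `growth_rate`) are hypothetical and are not part of mmagic. Which PIDs to pass (training ranks, dataloader workers) is left to the user.

```python
import time
from typing import List, Tuple


def parse_vmrss_kb(status_text: str) -> int:
    """Return the resident set size in kB from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # Line format is e.g. "VmRSS:\t  123456 kB"
            return int(line.split()[1])
    raise ValueError("VmRSS field not found (kernel thread or non-Linux system?)")


def read_rss_gib(pid: int) -> float:
    """Linux-only: read the current RSS of one process, in GiB."""
    with open(f"/proc/{pid}/status") as f:
        return parse_vmrss_kb(f.read()) / 1024 ** 2


def sample_total_rss(pids: List[int], interval_s: float,
                     n_samples: int) -> List[Tuple[float, float]]:
    """Return (elapsed hours, summed RSS in GiB) samples for the given PIDs.

    Caveat: summing RSS across processes double-counts pages they share
    (e.g. copy-on-write pages inherited by forked dataloader workers), so the
    total is an upper bound on the physical memory actually in use.
    """
    start = time.monotonic()
    samples = []
    for i in range(n_samples):
        if i:
            time.sleep(interval_s)
        hours = (time.monotonic() - start) / 3600
        samples.append((hours, sum(read_rss_gib(pid) for pid in pids)))
    return samples


def growth_rate(samples: List[Tuple[float, float]]) -> float:
    """Least-squares slope of (hours, GiB) samples, i.e. GiB gained per hour."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    if var == 0:
        raise ValueError("samples need distinct timestamps")
    return cov / var
```

For example, `growth_rate(sample_total_rss(pids, 600, 24))` samples every 10 minutes for about four hours. A slope that stays clearly positive after the initial warm-up points to steady growth rather than a one-off allocation spike.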

contentis • Dec 16 '21 07:12

Hello, I have never encountered this problem. Could you tell us more about the command you used for training?

ckkelvinchan • Dec 18 '21 02:12

Closing this issue due to lack of further feedback. Please feel free to reopen it if needed.

zengyh1900 • Oct 09 '22 06:10