
The training process always gets killed

Open SwordBearFire opened this issue 3 years ago • 4 comments

Hi,

Thank you for your great work, it's amazing. However, when I use co-mod-gan to train on the FFHQ dataset myself, the process always gets killed. My machine has 4 1080 Ti GPUs with 12 GB of GPU memory each, and 32 GB of RAM.

When I train the model on the 512x512 FFHQ TFRecord dataset, the process gets killed. Could you tell me how much memory you use, and what I should do? Thank you so much.

Best regards

SwordBearFire avatar May 18 '21 08:05 SwordBearFire

Preferably 8 GPUs with 16 GB (maybe 12 GB is OK) memory on each. Otherwise you have to reduce the batch size / network capacity.

zsyzzsoft avatar May 19 '21 03:05 zsyzzsoft
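For readers hitting the same limit, here is a minimal sketch of the two knobs mentioned above (per-GPU batch size and network capacity), assuming a StyleGAN2-style training config, which co-mod-gan builds on. The kwarg names `minibatch_gpu_base`, `minibatch_size_base`, and `fmap_base` are taken from the upstream StyleGAN2 code and are assumptions here, not confirmed co-mod-gan option names.

```python
# Sketch: reducing per-GPU batch size and network capacity in a
# StyleGAN2-style training config. Kwarg names are assumptions based on
# upstream StyleGAN2, not confirmed co-mod-gan options.

class EasyDict(dict):
    """Tiny stand-in for dnnlib.EasyDict: attribute access to dict keys."""
    __getattr__ = dict.__getitem__
    __setattr__ = dict.__setitem__

sched = EasyDict()
G_args = EasyDict()
D_args = EasyDict()

# Per-GPU minibatch: 4 images/GPU is a common default at 512x512;
# dropping to 1-2 reduces GPU memory pressure.
sched.minibatch_gpu_base = 2
# Total minibatch across all GPUs (num_gpus * per-GPU minibatch).
sched.minibatch_size_base = 2 * 4  # 4 GPUs, as in this issue

# Network capacity: fmap_base scales the channel counts of G and D.
# The StyleGAN2 default is 16 << 10 (16384); halving it roughly halves
# the feature-map memory of both networks, at some cost in quality.
G_args.fmap_base = 8 << 10
D_args.fmap_base = 8 << 10

print(sched, G_args, D_args)
```

In practice these overrides would be applied in the training launch script before the training loop is started; the trade-off is that a smaller total batch or smaller `fmap_base` generally lowers final image quality.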

Preferably 8 GPUs with 16 GB (maybe 12 GB is OK) memory on each. Otherwise you have to reduce the batch size / network capacity.

Thank you for your kind reply. I tried using a batch size of one per GPU, but the program still gets killed. Maybe the main problem is RAM OOM, not GPU OOM.

MingtaoGuo avatar May 19 '21 03:05 MingtaoGuo
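For what it's worth, a plain "Killed" message with no Python traceback usually means the Linux OOM killer terminated the process because host RAM ran out, which matches the RAM-OOM guess above. Below is a minimal diagnostic sketch (not part of co-mod-gan) that watches host memory while training runs, using the `psutil` library.

```python
# Sketch: log host RAM usage alongside training to confirm whether the
# process is being killed by the Linux OOM killer (host RAM exhaustion)
# rather than running out of GPU memory.

import os
import time
import threading

import psutil


def log_memory(interval_s: float = 10.0) -> None:
    proc = psutil.Process(os.getpid())
    while True:
        rss_gb = proc.memory_info().rss / 2**30            # resident memory of this process
        avail_gb = psutil.virtual_memory().available / 2**30  # remaining host RAM
        print(f"[mem] RSS={rss_gb:.1f} GiB, available RAM={avail_gb:.1f} GiB", flush=True)
        time.sleep(interval_s)


# Start as a daemon thread so it exits together with the training process.
threading.Thread(target=log_memory, daemon=True).start()
```

If the available RAM drops toward zero right before the kill, it is the OOM killer; on Linux, `dmesg` or `journalctl -k` usually records an "Out of memory: Killed process ..." entry confirming it.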

I have no idea what causes RAM OOM :(

zsyzzsoft avatar May 19 '21 17:05 zsyzzsoft

How are you guys even managing to train :(

tiwarikaran avatar Jun 11 '21 18:06 tiwarikaran