The training process always gets killed
Hi,
Thank you for your great work, it's amazing. However, when I use co-mod-gan to train on the FFHQ dataset myself, the process always gets killed. My machine has 4 GTX 1080 Ti GPUs with 12 GB of GPU memory each, and 32 GB of RAM.
When I train the model on the 512x512 FFHQ tfrecord dataset, it just shows "Killed". Could you tell me how much memory you used? And what should I do? Thank you so much.
Best regards
Preferably 8 GPUs with 16 GB of memory each (12 GB might be OK). Otherwise you will have to reduce the batch size / network capacity.
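In case it helps, here is a rough illustration of where the batch size lives in a StyleGAN2-style training config (co-mod-gan builds on that codebase). The option names `minibatch_size_base` / `minibatch_gpu_base` and the default values are taken from stock StyleGAN2, so please check `run_training.py` in this repo before copying them:

```python
# Sketch only: option names and defaults below come from the stock StyleGAN2
# training config and may differ slightly in co-mod-gan. In the real script
# these live in a dnnlib.EasyDict called `sched` set up in run_training.py.
sched = {}

# Total images per optimizer step, summed over all GPUs.
# Lowering this reduces GPU memory use at the cost of slower, noisier training.
sched["minibatch_size_base"] = 16   # e.g. halved from a typical default of 32

# Images processed per GPU per step; 1 is the floor when memory is very tight.
sched["minibatch_gpu_base"] = 2     # e.g. halved from a typical default of 4
```

Reducing network capacity usually means shrinking the feature-map multipliers (`fmap_base` / `fmap_max` in StyleGAN2 terms), which trades image quality for memory.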
Thank you for your kind reply. I tried setting the batch size to one per GPU, but the program still gets killed. Maybe the main problem is RAM OOM, not GPU OOM.
I have no idea what causes RAM OOM :(
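One way to confirm the RAM hypothesis is to watch the training process's resident memory from a second terminal; if it climbs toward the 32 GB limit right before the kill, the Linux OOM killer is the culprit rather than the GPUs. A minimal sketch using psutil (the PID is a placeholder you would replace with the actual training process id):

```python
# Minimal monitor: print the training process's resident memory (RSS) and the
# system's available RAM every 10 seconds until the process exits.
import time
import psutil  # pip install psutil

TRAIN_PID = 12345  # placeholder: replace with the PID of the training process

proc = psutil.Process(TRAIN_PID)
while proc.is_running():
    rss_gb = proc.memory_info().rss / 1024 ** 3
    avail_gb = psutil.virtual_memory().available / 1024 ** 3
    print(f"training RSS: {rss_gb:.1f} GB | system RAM available: {avail_gb:.1f} GB")
    time.sleep(10)
```

After a kill, running `dmesg | grep -i oom` (with sufficient privileges) will usually show a kernel line naming the killed process, which settles whether it was the OOM killer.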
How are you guys managing to train at all :(