first-order-model Multiple gpus training

One question: when I ran the training, I always used 1 single GPU because when I tried to use more than one the usage was always at 0%. Does the code work when all the GPUs are set to "EXCLUSIVE" mode?

Nov 30 '20 14:11 alessiapacca

Have you specified device_ids?

Nov 30 '20 14:11 AliaksandrSiarohin

@AliaksandrSiarohin yes, I have always used the command from the README. It may be a problem with the server where I run it, but I wanted to understand whether the code works when the GPUs are in exclusive mode, as that may be the issue with the server.
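
For anyone checking the same thing, here is a minimal sketch for reading the compute mode of each GPU. It assumes `nvidia-smi` is on the PATH; the helper names `parse_compute_modes` and `query_compute_modes` are made up for illustration and are not part of this repo. As far as I understand, in `Exclusive_Process` mode only one process at a time can hold a context on a GPU, so a GPU that is already held by another process cannot be used by the training job.

```python
import subprocess


def parse_compute_modes(csv_text):
    """Parse 'index, compute_mode' CSV rows into a {gpu_index: mode} dict."""
    modes = {}
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue  # skip blank rows
        index, mode = (field.strip() for field in line.split(",", 1))
        modes[int(index)] = mode
    return modes


def query_compute_modes():
    """Ask nvidia-smi for the compute mode of every GPU (needs the NVIDIA driver)."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,compute_mode", "--format=csv,noheader"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_compute_modes(result.stdout)
```

Calling `query_compute_modes()` on the server returns a dict mapping each GPU index to its mode string, for example `Default` or `Exclusive_Process`.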

Nov 30 '20 14:11 alessiapacca

Sorry, I have no idea what that exclusive mode means. You could check whether a simple multi-GPU CIFAR example works for you, and whether a simple CIFAR example with synchronized BN works.

Nov 30 '20 17:11 AliaksandrSiarohin

@alessiapacca could you please share which dataset you managed to train the network on? And was it with Python 3.7.5?

Dec 01 '20 13:12 Mathilda88

@Eliot04 Hey, I trained on the Vox dataset and used Python 3.6.4.

Dec 01 '20 13:12 alessiapacca

@alessiapacca Super helpful. Thanks.

Dec 01 '20 15:12 Mathilda88

@alessiapacca I tried using distributed data parallel to accelerate the training, and it seems to be working. Maybe you can try this too. (Synchronized BatchNorm may have problems when distributed data parallel is used, though; I did not test it.)

Jan 13 '21 06:01 SystemErrorWang

@SystemErrorWang How do you use distributed data parallel to accelerate the training? I put the model and datasets into DDP, but it does not seem to be working: the GPU usage is always at 0%.

Aug 23 '23 02:08 Qia98

I modified the code based on this repo: https://github.com/rosinality/stylegan2-pytorch. I adopted the DDP part of the stylegan2 code and combined it with the First-Order Motion Model training code. It takes some time to read through the code, and unfortunately my previous code is lost because I have changed jobs. I believe it's practical and not difficult. Good luck!
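
Roughly, a generic DDP training loop in PyTorch looks like the sketch below. This is not my lost code and not the code from either repo; it is only a minimal illustration. It assumes the script is launched with `torchrun` (which sets `LOCAL_RANK`), that `dataset` yields `(input, target)` tensor pairs, and it uses a placeholder MSE loss and learning rate. The function names are hypothetical. Note that `convert_sync_batchnorm` only converts standard `torch.nn` BatchNorm layers, so a custom synchronized BatchNorm implementation would not be touched by it.

```python
import os


def per_gpu_batch_size(total_batch_size, world_size):
    """Split a global batch size evenly across processes (one process per GPU)."""
    if total_batch_size % world_size != 0:
        raise ValueError("total batch size must be divisible by the number of GPUs")
    return total_batch_size // world_size


def train(model, dataset, total_batch_size, epochs=1):
    # Imported here so the helper above can be used without PyTorch installed.
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP
    from torch.utils.data import DataLoader
    from torch.utils.data.distributed import DistributedSampler

    # torchrun starts one process per GPU and sets LOCAL_RANK for each of them.
    local_rank = int(os.environ["LOCAL_RANK"])
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(local_rank)

    # Convert standard BatchNorm layers so statistics are synced across processes.
    model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])

    # Each process sees a different shard of the dataset.
    sampler = DistributedSampler(dataset)
    batch_size = per_gpu_batch_size(total_batch_size, dist.get_world_size())
    loader = DataLoader(dataset, batch_size=batch_size, sampler=sampler, drop_last=True)

    # Placeholder optimizer settings, chosen only for illustration.
    optimizer = torch.optim.Adam(model.parameters(), lr=2e-4)

    for epoch in range(epochs):
        sampler.set_epoch(epoch)  # reshuffle the shards every epoch
        for inputs, targets in loader:
            inputs = inputs.cuda(local_rank, non_blocking=True)
            targets = targets.cuda(local_rank, non_blocking=True)
            loss = torch.nn.functional.mse_loss(model(inputs), targets)
            optimizer.zero_grad()
            loss.backward()  # DDP averages gradients across processes here
            optimizer.step()

    dist.destroy_process_group()
```

A script that calls `train(...)` would be started with something like `torchrun --nproc_per_node=4 train_ddp.py`, where `train_ddp.py` is a hypothetical file name and 4 is the number of GPUs on the machine.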

Aug 23 '23 05:08 SystemErrorWang
