CVPR2022-DaGAN: There seems to be a problem with the distributed running code. When I entered the training command, the program did not respond

Open pfeducode opened this issue 2 years ago • 7 comments

When I entered the training command, the program did not respond. How can I solve this?

Aug 28 '22 12:08 pfeducode

Hi, @pfeducode! Did you solve this problem?

I ran into this problem too, but I couldn't solve it.

If you solved it, could you share any ideas?

Sep 21 '22 14:09 samsara-ku

I deleted the distributed code, and then it ran normally.

Sep 23 '22 02:09 pfeducode

There seems to be a problem with distributed running code. When I entered the training command, the program did not respond, and I had to delete the distributed code

Nov 05 '22 11:11 pfeducode

Please use this command line:

CUDA_VISIBLE_DEVICES=0,1,2,3 python run_dataparallel.py --config config/vox-adv-256.yaml --device_ids 0,1,2,3 --name DaGAN_voxceleb2_depth --rgbd --batchsize 48 --kp_num 15 --generator DepthAwareGenerator

Nov 05 '22 12:11 harlanhong
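
For readers adapting the command above to a machine with fewer or differently numbered GPUs, here is a minimal sketch of how the two GPU lists relate. CUDA renumbers the devices listed in `CUDA_VISIBLE_DEVICES` so that the process sees them as 0, 1, 2, and so on. The sketch assumes that `--device_ids` refers to those process-local indices and that `CUDA_VISIBLE_DEVICES` holds integer indices rather than UUIDs. The helper `physical_gpus` is hypothetical and not part of the DaGAN repository.

```python
def physical_gpus(cuda_visible_devices: str, device_ids: str) -> list:
    """Map process-local device ids to the physical GPU indices behind them.

    Hypothetical helper for illustration only. Assumes both arguments are
    comma-separated integer lists, as in the command quoted in this thread.
    """
    visible = [int(x) for x in cuda_visible_devices.split(",") if x.strip()]
    requested = [int(x) for x in device_ids.split(",") if x.strip()]

    mapped = []
    for local_id in requested:
        # A local id must point at one of the visible devices.
        if local_id < 0 or local_id >= len(visible):
            raise ValueError(
                f"device id {local_id} is out of range: "
                f"only {len(visible)} device(s) are visible"
            )
        mapped.append(visible[local_id])
    return mapped


if __name__ == "__main__":
    # The command from this thread: an identity mapping over four GPUs.
    print(physical_gpus("0,1,2,3", "0,1,2,3"))
    # Exposing only physical GPUs 2 and 3: they become local ids 0 and 1.
    print(physical_gpus("2,3", "0,1"))
```

Under these assumptions, a two-GPU run on physical GPUs 2 and 3 would pair `CUDA_VISIBLE_DEVICES=2,3` with `--device_ids 0,1`, not `--device_ids 2,3`.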

Actually, I also ran into this problem when I was using a different version of PyTorch. It seems to work only with "1.9.0+cu111".

Nov 05 '22 12:11 harlanhong
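
Since the comment above ties the hang to the PyTorch build, a quick way to compare an environment against the reported version may help. This is a minimal sketch: the names `KNOWN_GOOD`, `split_version`, and `matches_known_good` are hypothetical, and the only fact taken from this thread is the version string "1.9.0+cu111". Other versions are not confirmed here either way.

```python
import re

# The only build reported as working in this thread.
KNOWN_GOOD = "1.9.0+cu111"


def split_version(version: str):
    """Split a PyTorch-style version string into (release tuple, build tag).

    "1.9.0+cu111" becomes ((1, 9, 0), "cu111"); a string without "+" gets
    an empty build tag.
    """
    release, _, build = version.partition("+")
    numbers = tuple(int(part) for part in re.findall(r"\d+", release)[:3])
    return numbers, build


def matches_known_good(version: str, known_good: str = KNOWN_GOOD) -> bool:
    """Return True only when both the release and the build tag match."""
    return split_version(version) == split_version(known_good)
```

With PyTorch installed, calling `matches_known_good(torch.__version__)` shows whether the current environment matches the reported build. A `False` result only means the environment differs from the one reported to work; it does not by itself explain the hang.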

> Actually, I also ran into this problem when I was using a different version of PyTorch. It seems to work only with "1.9.0+cu111".

Okay, I'll try later

Nov 09 '22 09:11 pfeducode

After removing the distributed code for the generator and discriminator and making device changes in the "model_dataparallel.py" file, I have successfully got it working on a single GPU.

Jun 10 '23 05:06 VedantDere0104
