vq-vae-2-pytorch

How to run distributed training?

Open Dududu233 opened this issue 4 years ago • 5 comments

I tried running 'python train_vqvae.py --path '\home\lab\ffhq_dataset'' in the terminal, but I get the error "module 'torch.distributed' has no attribute 'launch'". I have read some other distributed training examples and did not find this usage anywhere: "dist.launch(main, args.n_gpu, 1, 0, args.dist_url, args=(args,))". They just run 'python -m distributed.launch script.py' in the terminal. What is wrong, and how can I fix it? Looking forward to your response.

Dududu233 avatar Nov 10 '21 10:11 Dududu233

By the way, I am using Python 3.7, PyTorch 1.1.0, and CUDA 9.0.

Dududu233 avatar Nov 11 '21 01:11 Dududu233

I found more functions that are not in the module 'distributed', such as dist.is_primary. Did you write these functions yourself? What is their purpose?

Dududu233 avatar Nov 12 '21 07:11 Dududu233

They are in https://github.com/rosinality/vq-vae-2-pytorch/tree/master/distributed. I don't know why torch.distributed is being used instead of this package.
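A minimal sketch of why the import target matters, assuming the training script refers to the repository's own top-level `distributed` package under the alias `dist` (for example via `import distributed as dist`) and is run from the repository root. The package built below is a throwaway stand-in created in a temporary directory; its `launch` and `is_primary` bodies are hypothetical placeholders, not the repository's code. If `dist` is bound to `torch.distributed` instead, `dist.launch` is not found, which matches the error reported above.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Stand-in for the repository layout: a top-level package named
# `distributed` exposing `launch` and `is_primary`.
# ASSUMPTION: these bodies are placeholders for illustration only.
repo_root = Path(tempfile.mkdtemp())
pkg_dir = repo_root / "distributed"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text(
    "def launch(fn, n_gpu_per_machine, n_machine=1, machine_rank=0,\n"
    "           dist_url=None, args=()):\n"
    "    # Placeholder: just call fn in this process.\n"
    "    return fn(*args)\n"
    "\n"
    "def is_primary():\n"
    "    # Placeholder: a single process is always the primary one.\n"
    "    return True\n"
)

# Running a script from the repository root puts that directory on sys.path,
# so `import distributed` resolves to the local package.
sys.modules.pop("distributed", None)  # drop any previously imported package of that name
sys.path.insert(0, str(repo_root))
dist = importlib.import_module("distributed")

resolved_from = Path(dist.__file__).parent
has_launch = hasattr(dist, "launch")


def main(message):
    # Hypothetical entry point standing in for the training function.
    return f"primary={dist.is_primary()} message={message}"


# Same call shape as the line quoted in the question.
result = dist.launch(main, 1, 1, 0, None, args=("hello",))
print(resolved_from)
print(has_launch)
print(result)
```

Printing `dist.__file__` in the real script is a quick way to check which module the name `dist` is actually bound to.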

rosinality avatar Nov 13 '21 07:11 rosinality

Thank you for your response. The problem is solved.

Dududu233 avatar Nov 15 '21 02:11 Dududu233

@Dududu233, how did you solve the problem?

berryweinst avatar Dec 16 '21 11:12 berryweinst