
Multi-GPU training

ivancarapinha opened this issue 4 years ago · 3 comments

Hello, could you please specify the steps to enable multi-GPU training? I set distributed_run=True in hparams.py, then set --n_gpus=2 and CUDA_VISIBLE_DEVICES=0,3 in run.sh to select GPUs 0 and 3. With this setup the code seems to enter some kind of deadlock: training never starts. Thank you.

ivancarapinha · May 25 '20 18:05

Multi-GPU training works basically the same way as in https://github.com/NVIDIA/tacotron2. First create a directory named "logs", then run:

python -m multiproc train.py --output_directory=outdir --log_directory=logdir --n_gpus=2 --hparams=distributed_run=True
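The multiproc wrapper spawns one copy of train.py per GPU, giving each process its own rank. For context, below is a minimal sketch of the per-process setup that distributed_run=True typically triggers in Tacotron2-style code; the function name init_distributed and the hparams fields dist_backend and dist_url are assumptions based on NVIDIA's Tacotron2, not verified against this repo.

```python
import torch
import torch.distributed as dist

def init_distributed(hparams, n_gpus, rank, group_name):
    # Sketch of per-process setup in the Tacotron2 pattern (assumed,
    # not this repo's exact code). Each spawned process pins itself
    # to one GPU, then joins the process group.
    assert torch.cuda.is_available(), "distributed training requires CUDA"
    torch.cuda.set_device(rank % torch.cuda.device_count())
    dist.init_process_group(
        backend=hparams.dist_backend,   # e.g. "nccl" (assumed hparam name)
        init_method=hparams.dist_url,   # e.g. "tcp://localhost:54321" (assumed)
        world_size=n_gpus,
        rank=rank,
        group_name=group_name,
    )
```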

jxzhanggg · May 26 '20 10:05

Thanks for your impressive work.

When I use multi-GPU training, e.g.

python -m multiproc train.py --output_directory=outdir --log_directory=logdir --n_gpus=2 --hparams=distributed_run=True

I run into the error shown below:

Traceback (most recent call last):
  File "train.py", line 369, in <module>
    args.warm_start, args.n_gpus, args.rank, args.group_name, hparams)
  File "train.py", line 234, in train
    train_loader, valset, collate_fn = prepare_dataloaders(hparams)
  File "train.py", line 64, in prepare_dataloaders
    drop_last=True, collate_fn=collate_fn)
  File "/home/test/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 189, in __init__
    raise ValueError('sampler option is mutually exclusive with '
ValueError: sampler option is mutually exclusive with shuffle

agangzz · Jun 08 '20 11:06

Hi, as the error message says, when using multi-GPU training you need to set shuffle=False in the DataLoader; the DistributedSampler takes over shuffling.
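For reference, here is a minimal sketch of a dataloader built that way. It follows the prepare_dataloaders frame in the traceback above, but the helper name make_train_loader and the exact arguments are illustrative assumptions, not the repo's code.

```python
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

def make_train_loader(trainset, collate_fn, hparams):
    # DistributedSampler both partitions the data across processes and
    # shuffles it, so the DataLoader must be built with shuffle=False;
    # passing a sampler together with shuffle=True raises the ValueError above.
    if hparams.distributed_run:
        sampler = DistributedSampler(trainset)
        shuffle = False
    else:
        sampler = None
        shuffle = True
    return DataLoader(trainset,
                      sampler=sampler,
                      shuffle=shuffle,
                      batch_size=hparams.batch_size,
                      drop_last=True,
                      collate_fn=collate_fn)
```

(When using DistributedSampler, also call sampler.set_epoch(epoch) at the start of each epoch so the shuffling order differs between epochs.)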

jxzhanggg · Jun 10 '20 20:06