
Multi-GPU support

Open MlWoo opened this issue 6 years ago • 7 comments

Many friends seem to be very interested in multi-GPU support when training the model. It may be worth merging that branch into master.

MlWoo avatar Aug 14 '18 02:08 MlWoo

@begeekmyfriend I have not modified the related code in terms of that pattern.

MlWoo avatar Aug 14 '18 03:08 MlWoo

@Rayhane-mamah Yes, I agree. In multi-GPU mode we can set r=1 and enlarge the batch size to get smoother gradients. So please consider it as another branch.

begeekmyfriend avatar Aug 14 '18 05:08 begeekmyfriend
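
For readers skimming the thread, here is a minimal sketch of the tower-style data parallelism being discussed: split one enlarged batch across GPUs, build the model once per device with shared variables, and apply the averaged gradients. The names (build_loss, inputs, targets) are illustrative placeholders, not the fork's actual code.

```python
import tensorflow as tf  # TF 1.x, as used by the repo

def average_gradients(tower_grads):
    """Average each variable's gradient across all towers."""
    averaged = []
    for pairs in zip(*tower_grads):      # pairs: one (grad, var) per tower
        grads = [g for g, _ in pairs]
        var = pairs[0][1]
        averaged.append((tf.reduce_mean(tf.stack(grads), axis=0), var))
    return averaged

def multi_gpu_train_op(inputs, targets, build_loss, num_gpus, lr=1e-3):
    # Split one enlarged batch into per-GPU shards ("expand the batch size").
    input_shards = tf.split(inputs, num_gpus, axis=0)
    target_shards = tf.split(targets, num_gpus, axis=0)
    optimizer = tf.train.AdamOptimizer(lr)
    tower_grads = []
    for i in range(num_gpus):
        with tf.device('/gpu:%d' % i), tf.variable_scope('model', reuse=tf.AUTO_REUSE):
            loss = build_loss(input_shards[i], target_shards[i])
            tower_grads.append(optimizer.compute_gradients(loss))
    # One synchronized update using gradients averaged over all devices.
    return optimizer.apply_gradients(average_gradients(tower_grads))
```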

Yes, it seems like people are requesting that. :) Well, your multi-GPU attempt @MlWoo is certainly very helpful. Since the model has changed since you made this implementation, I will need to make a few updates here and there, but yeah, I will probably make a new branch for both Wavenet and Tacotron multi-GPU, or add those directly on master with optional use or something. (I don't like 4 spaces though hahaha..)

In the meantime, I am leaving this PR open here so that people can quickly refer to a good multi-GPU implementation :)

Thanks for all your contributions @MlWoo and @begeekmyfriend ;)

Rayhane-mamah avatar Aug 14 '18 08:08 Rayhane-mamah

When I try to use this fork as it is, I run into the following:

ValueError: Cannot feed value of shape (48, 408, 1025) for Tensor 'datafeeder/linear_targets:0', which has shape '(?, ?, 513)'

What could be the cause of this? I preprocessed LJSpeech with the given hyperparameters btw.

tomse-h avatar Sep 17 '18 08:09 tomse-h

@tomse-h I have not modified the related code for the linear-spectrogram path. You can complete it the same way the mel features are handled.

MlWoo avatar Sep 18 '18 02:09 MlWoo
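
A likely cause of the shape mismatch above: a linear spectrogram has n_fft // 2 + 1 frequency bins, so 1025 bins means the audio was preprocessed with n_fft = 2048, while the feeder's placeholder was built for 513 bins (n_fft = 1024). Below is a sketch of deriving the placeholder width from the same setting used at preprocessing time, mirroring how the mel targets are handled; the names are illustrative, not the fork's exact code.

```python
import tensorflow as tf  # TF 1.x, as in the repo

n_fft = 2048               # must match the value used during preprocessing
num_freq = n_fft // 2 + 1  # 2048 -> 1025 bins, 1024 -> 513 bins

# The width comes from the hyperparameters instead of being hard-coded, so
# the placeholder always matches the preprocessed features.
linear_targets = tf.placeholder(
    tf.float32, shape=(None, None, num_freq), name='linear_targets')
```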

I might be a bit late to this conversation, but did you guys also see a proportional increase in sec/step when using multiple GPUs? Here are my stats on V100 GPUs with outputs_per_step = 16:

#GPUs    batch size    sec/step
1        32            ~4
2        64            ~10
3        96            ~15
4        128           ~19

shaktikshri avatar Aug 14 '20 14:08 shaktikshri
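
For context, those numbers imply essentially flat throughput: the batch size grows with the GPU count, but so does sec/step, so the samples processed per second barely change. A quick check, derived only from the figures reported above:

```python
# samples/sec = batch_size / sec_per_step, using the reported measurements
for gpus, batch, sec in [(1, 32, 4), (2, 64, 10), (3, 96, 15), (4, 128, 19)]:
    print('%d GPU(s): %.1f samples/sec' % (gpus, batch / sec))
# 1 GPU(s): 8.0   2 GPU(s): 6.4   3 GPU(s): 6.4   4 GPU(s): 6.7  -> no speedup
```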

@shaktikshri No, it increases, but it does not scale linearly like that. You had better check the data-loading time and the imbalance in sequence lengths across devices.

MlWoo avatar Aug 17 '20 09:08 MlWoo
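
One simple way to follow that advice is to time the data feed and the compute separately for a few steps; if most of each step is spent waiting for batches, the extra GPUs sit idle. The names below (feeder, sess, train_op) are assumed for illustration, not the repo's exact API.

```python
import time

# Illustrative timing harness: split each step into "waiting for data" and
# "running the train op" to see where the extra seconds per step go.
for step in range(10):
    t0 = time.time()
    feed_dict = feeder.next_batch()          # assumed feeder interface
    t1 = time.time()
    sess.run(train_op, feed_dict=feed_dict)  # assumed TF 1.x session/op names
    t2 = time.time()
    print('step %d  data %.2fs  compute %.2fs' % (step, t1 - t0, t2 - t1))
```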