
Framework/glue code plan

Open m-toman opened this issue 6 years ago • 11 comments

I'm currently reworking the general training procedure so that you can more easily:

  • [x] use your own dataset (you only have to provide a metadata.csv in the correct format)
  • [x] pick a pretrained model (automatic download of LJ pretrained models is currently provided)
  • [x] keep the whole training run inside an external experiment folder
  • [ ] later on, swap out the implementations of Tacotron and the neural vocoder (see the interface sketch below)
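The last point is the main open design question: the framework/glue code should only talk to a thin interface so that concrete implementations can be swapped. A minimal sketch of such an interface (all names are hypothetical, not the current tacorn API):

```python
from abc import ABC, abstractmethod

class AcousticModel(ABC):
    """Text -> mel spectrogram (e.g. a Tacotron-2 implementation)."""

    @abstractmethod
    def train(self, experiment_dir, metadata_csv): ...

    @abstractmethod
    def synthesize(self, text):
        """Return a mel spectrogram for the given text."""

class Vocoder(ABC):
    """Mel spectrogram -> waveform (e.g. WaveRNN, WaveGlow)."""

    @abstractmethod
    def train(self, experiment_dir, mel_dir, wav_dir): ...

    @abstractmethod
    def generate(self, mel):
        """Return audio samples for the given mel spectrogram."""
```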

m-toman avatar Dec 06 '18 05:12 m-toman

@m-toman Could you tell me how many samples per second your WaveRNN model generates? Also, which GPU are you using?

In my case, with my forked branch of your old repository, a V100 machine generates 1200 samples/sec and a K80 machine generates 1000 samples/sec.
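For reference, a samples/sec figure like this can be measured by timing one full generation pass and dividing the number of output samples by the elapsed time. A minimal sketch; the `vocoder.generate` call and the mel shape are placeholders, not this repo's actual API:

```python
import time
import torch

mel = torch.randn(1, 80, 400).cuda()   # dummy mel spectrogram: 80 bands x 400 frames

torch.cuda.synchronize()                # make sure pending GPU work doesn't skew the timing
start = time.time()
audio = vocoder.generate(mel)           # placeholder: returns a 1-D array/tensor of samples
torch.cuda.synchronize()
elapsed = time.time() - start

print(f"{len(audio) / elapsed:.0f} samples/sec")
```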

Yeongtae avatar Dec 20 '18 07:12 Yeongtae

That is still accurate, as I haven't really finished any new integration yet. I'm currently integrating this fork of a fork of the fatchord model: https://github.com/geneing/WaveRNN-Pytorch, which should be a bit faster when using batched synthesis.

Did you have any luck with WaveGlow?

m-toman avatar Dec 20 '18 11:12 m-toman

@m-toman nvidia-tacotron2 and nvidia-waveglow are well optimized. In my experiments, a V100 machine can generate 160k and 350k samples/sec with them, respectively.

But WaveGlow has a problem with reverb. I'm trying to overcome this problem. https://github.com/Yeongtae/tacotron2 https://github.com/Yeongtae/waveglow
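For what it's worth, one knob that is often discussed for noisy or reverberant WaveGlow output is the sampling temperature sigma used at inference time. A rough sketch along the lines of the NVIDIA repo's inference script; the checkpoint layout and exact call signature are assumptions and may differ between versions:

```python
import torch

# Load a trained WaveGlow checkpoint (this checkpoint layout is an assumption).
waveglow = torch.load("waveglow_checkpoint.pt")["model"].cuda().eval()

with torch.no_grad():
    # mel: (1, 80, T) mel spectrogram coming from Tacotron (placeholder variable).
    # Lowering sigma below the training value of 1.0 trades a little brightness
    # for less audible noise; values around 0.6 are commonly tried.
    audio = waveglow.infer(mel, sigma=0.6)
```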

Yeongtae avatar Dec 20 '18 11:12 Yeongtae

Impressive. I also wanted to take a look at their repo, but I can't jump between them all the time ;). I've seen in the WaveGlow issues that training requires a lot of memory to achieve a reasonable batch size.

m-toman avatar Dec 20 '18 11:12 m-toman

That's why I'm using 8 V100 GPUs to train WaveGlow.

Yeongtae avatar Dec 20 '18 11:12 Yeongtae

@m-toman have you gotten any results with this repository yet? Could you share some sample audio?

Yeongtae avatar Jan 15 '19 02:01 Yeongtae

@Yeongtae this is the current state for LJ; the main annoyance is those clipping issues, and more training alone doesn't seem to help. samples.zip This is trained from GTA mel specs with the settings in https://github.com/m-toman/WaveRNN-Pytorch/blob/master/hyperparams.py and https://github.com/m-toman/Tacotron-2/blob/master/hparams.py
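Background: "GTA mel specs" are Tacotron's teacher-forced, ground-truth-aligned mel outputs, and the vocoder is trained on those instead of mels extracted from the audio. A minimal dataset sketch for that pairing; the file layout and names are assumptions, not what the linked repos actually write out:

```python
import os
import numpy as np
from torch.utils.data import Dataset

class GTAMelDataset(Dataset):
    """Pairs Tacotron GTA mel spectrograms with their ground-truth waveforms."""

    def __init__(self, mel_dir, wav_dir):
        self.mel_dir = mel_dir
        self.wav_dir = wav_dir
        # Assumes one .npy mel per utterance, with a matching .npy waveform.
        self.ids = [f[:-4] for f in os.listdir(mel_dir) if f.endswith(".npy")]

    def __len__(self):
        return len(self.ids)

    def __getitem__(self, idx):
        utt = self.ids[idx]
        mel = np.load(os.path.join(self.mel_dir, utt + ".npy"))  # (n_mels, T)
        wav = np.load(os.path.join(self.wav_dir, utt + ".npy"))  # (T * hop_length,)
        return mel, wav
```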

m-toman avatar Jan 16 '19 19:01 m-toman

@m-toman - you mentioned clipping issues, and that's something I'm facing as well. Were you able to track down what causes the clipping?

ZohaibAhmed avatar May 01 '19 20:05 ZohaibAhmed

@ZohaibAhmed I was able to fix most issues by not using the noam learning rate scheduler, but instead setting the learning rate to a fixed value and manually lowering it when the loss starts to act funny. I also found that the simple model in https://github.com/h-meru/Tacotron-WaveRNN behaves much more benignly and trains more nicely (on 10-bit quantization with "bits"; "mulaw" also seems to act up) than the alternative WaveRNN model by fatchord.
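For reference, the two target encodings mentioned here only differ in how the waveform is mapped onto discrete classes before WaveRNN predicts them. A sketch of both for 10 bits, formulas only rather than the repo's actual helpers:

```python
import numpy as np

def quantize_linear(x, bits=10):
    """'bits' mode: uniform quantization of x in [-1, 1] into 2**bits classes."""
    levels = 2 ** bits
    return np.clip(np.rint((x + 1.0) * 0.5 * (levels - 1)), 0, levels - 1).astype(np.int64)

def encode_mulaw(x, bits=10):
    """'mulaw' mode: mu-law companding followed by uniform quantization."""
    mu = 2 ** bits - 1
    y = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)      # compress towards zero
    return np.floor((y + 1.0) * 0.5 * mu + 0.5).astype(np.int64)  # map [-1, 1] -> {0..mu}
```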

m-toman avatar May 03 '19 06:05 m-toman

@m-toman Thanks. I see that in the repo you referred to, Tacotron uses a narrow exponential decay, while for WaveRNN the learning rate is set to a fixed number. Which learning rate were you referring to when you said "manually lowering it"?

        #################################################################
        # Narrow Exponential Decay:

        # Phase 1: lr = 1e-3
        # We only start learning rate decay after 50k steps

        # Phase 2: lr in ]1e-5, 1e-3[
        # decay reach minimal value at step 310k

        # Phase 3: lr = 1e-5
        # clip by minimal learning rate value (step > 310k)
        #################################################################
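Written out, the schedule described by those comments amounts to roughly the following; a sketch of the math only, not the repo's actual implementation:

```python
def narrow_exponential_decay(step, init_lr=1e-3, final_lr=1e-5,
                             decay_start=50_000, decay_end=310_000):
    """Phase 1: constant init_lr; Phase 2: exponential decay; Phase 3: clipped at final_lr."""
    if step < decay_start:
        return init_lr                       # Phase 1
    if step >= decay_end:
        return final_lr                      # Phase 3
    # Phase 2: exponential interpolation between init_lr and final_lr.
    progress = (step - decay_start) / (decay_end - decay_start)
    return init_lr * (final_lr / init_lr) ** progress
```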

ZohaibAhmed avatar May 12 '19 19:05 ZohaibAhmed

For WaveRNN, at the moment I'm starting out with 1e-4 and, once the loss starts to act funny, I stop training and divide the learning rate by 10. I'm currently training with MoL; that worked well up to 400k steps at batch size 128, at which point I had to lower it. Perhaps "reduce on plateau" or something similar would also be a good idea.
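For the "reduce on plateau" idea, PyTorch ships a ready-made scheduler. A minimal usage sketch, where `model`, `train_one_epoch`, `validate`, and `num_epochs` are placeholders:

```python
import torch

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.1, patience=10)

for epoch in range(num_epochs):
    train_one_epoch(model, optimizer)   # placeholder training loop
    val_loss = validate(model)          # placeholder validation routine
    scheduler.step(val_loss)            # multiplies the LR by 0.1 after `patience` epochs without improvement
```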

m-toman avatar May 12 '19 20:05 m-toman