Markus Toman
@Rayhane-mamah Thanks, I've read it through now and the rest sounded pretty straightforward. I think your explanation of the CDF thingy is better for someone with less background knowledge, as...
That is still true, as I haven't really finished any new integration yet. I'm currently integrating this fork of the fork of the fatchord model: https://github.com/geneing/WaveRNN-Pytorch which should be a...
Impressive! I also wanted to take a look at their repo, but I can't jump between them all the time ;). I've seen in the WaveGlow issues that the training requires...
@Yeongtae this is the current state for LJ; the main annoyance is the clipping issues. Just more training doesn't seem to help. [samples.zip](https://github.com/m-toman/tacorn/files/2765740/samples.zip) This is trained from GTA mel specs with...
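To put a number on the clipping (for example, to check whether more training changes anything), one option is to count samples at or near full scale in the generated audio. A minimal numpy sketch, assuming the vocoder output is a float waveform nominally in [-1, 1]; `clipping_stats` and its 0.999 threshold are hypothetical and not part of tacorn:

```python
import numpy as np


def clipping_stats(wav, threshold=0.999):
    """Summarize how much of a float waveform sits at or beyond full scale.

    Assumes samples are floats nominally in [-1, 1]; `threshold` is a
    hypothetical cutoff for treating a sample as clipped.
    """
    wav = np.asarray(wav, dtype=np.float64)
    clipped = np.abs(wav) >= threshold
    # A run of clipped samples starts where a clipped sample follows an unclipped one.
    run_starts = clipped & ~np.concatenate(([False], clipped[:-1]))
    return {
        "fraction_clipped": float(clipped.mean()) if wav.size else 0.0,
        "num_clipped": int(clipped.sum()),
        "num_runs": int(run_starts.sum()),
        "peak": float(np.abs(wav).max()) if wav.size else 0.0,
    }
```

Counting runs as well as samples helps tell a few long flat-topped stretches apart from many isolated overshoots.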
@ZohaibAhmed I was able to fix most issues by not using the noam learning rate scheduler, and instead setting a fixed learning rate that I lower manually when the loss starts to...
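For comparison, a sketch of the two approaches. The noam formula below is the warm-up plus inverse-square-root form used in several Tacotron/WaveNet codebases (assumed here, with hypothetical defaults `init_lr=1e-3` and `warmup_steps=4000`); `ManualLR` is a hypothetical helper for the fixed-then-divide approach, not code from tacorn:

```python
def noam_lr(step, init_lr=1e-3, warmup_steps=4000):
    """Noam schedule: linear warm-up to init_lr at warmup_steps,
    then decay proportional to 1/sqrt(step)."""
    step = max(step, 1)
    return init_lr * warmup_steps ** 0.5 * min(step * warmup_steps ** -1.5, step ** -0.5)


class ManualLR:
    """Fixed learning rate that only changes when drop() is called.

    Hypothetical helper: call drop() after stopping a run whose loss
    misbehaves, then resume from the last good checkpoint.
    """

    def __init__(self, lr=1e-4, factor=10.0):
        self.lr = lr
        self.factor = factor

    def drop(self):
        # Divide the current learning rate by the configured factor.
        self.lr /= self.factor
        return self.lr
```

The noam schedule keeps decaying whether or not training is stable, while the manual variant holds the rate constant until you decide to lower it. In PyTorch, the new value would be written into each entry of `optimizer.param_groups` (`group["lr"] = sched.lr`) before resuming.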
WaveRNN: at the moment I start with a learning rate of 1e-4, and once the loss starts to act funny, I stop training and divide the LR by 10. Currently training with MoL and there...
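For reference, a MoL (mixture of logistics) output scores each sample by the probability mass the mixture puts on the quantization bin around it. That mass is a difference of two logistic CDFs, and the two open-ended edge bins absorb the tails. A numpy sketch of that log-likelihood, modeled on the PixelCNN++-style loss and assuming targets in [-1, 1] with per-sample mixture parameters of shape (T, K); all names are hypothetical and this is not the tacorn or fatchord code:

```python
import numpy as np


def log_sigmoid(z):
    # log(sigmoid(z)) = -log(1 + exp(-z)), computed without overflow
    return -np.logaddexp(0.0, -z)


def logsumexp(a, axis, keepdims=False):
    m = np.max(a, axis=axis, keepdims=True)
    out = m + np.log(np.sum(np.exp(a - m), axis=axis, keepdims=True))
    return out if keepdims else np.squeeze(out, axis=axis)


def mol_log_likelihood(x, logit_pi, mu, log_s, num_classes=2 ** 16):
    """Per-sample log-likelihood under a discretized mixture of logistics.

    x: targets in [-1, 1], shape (T,)
    logit_pi, mu, log_s: mixture logits, means and log-scales, shape (T, K)
    num_classes: number of quantization levels (16-bit assumed by default)
    """
    x = np.asarray(x, dtype=np.float64)[:, None]  # (T, 1), broadcasts over K
    half_bin = 1.0 / (num_classes - 1)            # half the width of one bin
    inv_s = np.exp(-log_s)
    plus_in = inv_s * (x - mu + half_bin)         # standardized upper bin edge
    min_in = inv_s * (x - mu - half_bin)          # standardized lower bin edge

    # Interior bins: CDF(upper) - CDF(lower). The logistic CDF is the sigmoid,
    # written via tanh so large arguments do not overflow.
    cdf_delta = 0.5 * (np.tanh(plus_in / 2) - np.tanh(min_in / 2))
    # The floor avoids log(0) for far-away components (a simplification).
    log_mid = np.log(np.maximum(cdf_delta, 1e-12))
    # Edge bins absorb the tails: P(X <= upper edge) and P(X >= lower edge).
    log_low = log_sigmoid(plus_in)
    log_high = log_sigmoid(-min_in)

    log_p = np.where(x < -0.999, log_low, np.where(x > 0.999, log_high, log_mid))
    # Weight each component by its softmax mixture probability, sum in log space.
    log_pi = logit_pi - logsumexp(logit_pi, axis=1, keepdims=True)
    return logsumexp(log_p + log_pi, axis=1)
```

The training loss would be the negative mean of this value. Because the bins tile [-1, 1], the probabilities over all levels sum to 1, which makes a quick sanity check with a small `num_classes`.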
I branched out the alternative model by fatchord into https://github.com/m-toman/tacorn/tree/fatchord_model and started a new branch https://github.com/m-toman/tacorn/tree/wavernn where I added the original WaveRNN implementation (also by fatchord). It seems it mostly misses the...
@hdmjdp not yet, I'm currently reworking the framework itself (#13) to allow faster experimentation, while watching the progress in https://github.com/erogol/WaveRNN to see if I can merge it with the status...
Implemented the model from https://arxiv.org/abs/1811.06292. I'm currently seeing the same issues as with the alternative model when using the "bits" input type with 10 bits: training from GTA mel specs produces...
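For context, with a "bits" input type each sample is quantized to one of 2^bits classes; at 10 bits that is 1024 levels with a step of 2/1023 ≈ 0.002 across [-1, 1]. A sketch of that mapping, assuming linear (not mu-law) quantization; `float_to_label` and `label_to_float` are hypothetical names, similar in spirit to the helpers in fatchord-style WaveRNN code:

```python
import numpy as np


def float_to_label(x, bits=10):
    """Map float samples in [-1, 1] to integer class labels 0 .. 2**bits - 1."""
    levels = 2 ** bits - 1
    x = np.clip(np.asarray(x, dtype=np.float64), -1.0, 1.0)  # out-of-range input saturates
    return np.rint((x + 1.0) * levels / 2.0).astype(np.int64)


def label_to_float(labels, bits=10):
    """Inverse mapping: class label back to the center of its level in [-1, 1]."""
    levels = 2 ** bits - 1
    return 2.0 * np.asarray(labels, dtype=np.float64) / levels - 1.0
```

A round trip is off by at most half a step (1/1023 at 10 bits), so quantization noise alone is about 60 dB below full scale. Artifacts well above that level point at the model or the GTA features rather than the input resolution.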
Training ForwardTacotron on a dataset composed of multiple male voices as a single-speaker dataset?
Haven't tried it, but I found that speaker selection isn't random; it is usually driven by some similarity to sentences in the training data. Unfortunately it often overrides the speaker embedding in my case...