Francesco

87 comments by Francesco

Hi, when producing the mels for WaveRNN (assuming you do want to use the predicted rather than the ground truth ones), you could do a validation step, using the ground...

Hi, did you use a pretrained model? Which version of the repo are you using (which commit)? It might be samples from an older model file than the most recent...

Also, if you're interested in replicating the results using our pretrained models, you can just try the Colab Notebooks.

Sounds fine to me. This is inverted with the Griffin-Lim algorithm, so low sound quality is expected. You need to follow the next steps in the notebook and convert it...

Hi, is it possible that you trained the autoregressive model only up to a reduction factor greater than 1 (in your settings, for fewer than 250K steps)?
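For readers unfamiliar with the term, here is a minimal sketch of how a step-based reduction factor schedule could be looked up. With reduction factor r, the decoder emits r mel frames per decoding step. The schedule values and both function names below are hypothetical and are not taken from the repo's config; only the 250K boundary echoes the comment above.

```python
import math
from bisect import bisect_right

# Hypothetical schedule: (first training step, reduction factor r).
# These numbers are illustrative only, not the repository's settings.
HYPOTHETICAL_SCHEDULE = [(0, 10), (80_000, 5), (150_000, 2), (250_000, 1)]


def reduction_factor_at(step, schedule=HYPOTHETICAL_SCHEDULE):
    """Return the reduction factor that applies at a given training step."""
    starts = [start for start, _ in schedule]
    idx = bisect_right(starts, step) - 1
    if idx < 0:
        raise ValueError("step is before the first schedule entry")
    return schedule[idx][1]


def decoder_steps(n_mel_frames, r):
    """Number of decoder steps needed when each step emits r mel frames."""
    return math.ceil(n_mel_frames / r)
```

Under this assumed schedule, a run stopped before 250K steps would never reach r=1, which is the situation the question is asking about.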

Hi, concerning the hard-coded parameters: we have not yet experimented with other values, as they are fairly constant throughout the literature. So unless there is evidence that they are significantly...

Hi, yes, you will want to train a forward model for this. There you can directly control the duration of each phoneme.
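To illustrate what controlling per-phoneme duration means in a forward (non-autoregressive) model, here is a minimal sketch of a length regulator in the style commonly used by such models: each phoneme's encoding is repeated for as many mel frames as its duration. The function name and the `speed` parameter are assumptions for illustration, not the repo's API.

```python
import math


def regulate_length(phoneme_vectors, durations, speed=1.0):
    """Expand per-phoneme vectors to per-frame vectors.

    phoneme_vectors: one entry per phoneme (any object, e.g. a list of floats).
    durations: number of mel frames per phoneme.
    speed: hypothetical global control; values > 1 shorten every phoneme.
    """
    if len(phoneme_vectors) != len(durations):
        raise ValueError("need exactly one duration per phoneme")
    if speed <= 0:
        raise ValueError("speed must be positive")
    frames = []
    for vector, duration in zip(phoneme_vectors, durations):
        # Round half up so scaled durations behave predictably.
        n_frames = max(0, math.floor(duration / speed + 0.5))
        frames.extend([vector] * n_frames)
    return frames


# Stretch only the second phoneme by editing its duration directly.
frames = regulate_length(["HH", "AY"], [2, 5])
```

Because the durations are explicit inputs, changing one number lengthens or shortens a single phoneme without touching the rest of the utterance.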

Hi @bkumardevan07, if you start with r=1 you will most likely not get the alignment between text and audio. You can observe this in TensorBoard in the last layer: if...

What exactly do you mean by aligning the audio? With the script extract_durations.py you will generate a dataset for the forward model using the predictions of the autoregressive model. If...
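As a rough illustration of how phoneme durations can be derived from an autoregressive model's attention, the sketch below assigns every mel frame to the text position with the highest attention weight and counts frames per position. This is a generic approach and an assumption on my part; it is not necessarily what extract_durations.py does, and the function name is hypothetical.

```python
def durations_from_attention(attention, n_phonemes):
    """Count mel frames per phoneme from one attention matrix.

    attention: list of rows, one per mel frame; each row holds one
               attention weight per text position.
    n_phonemes: number of text positions.
    """
    durations = [0] * n_phonemes
    for frame_weights in attention:
        if len(frame_weights) != n_phonemes:
            raise ValueError("each row needs one weight per phoneme")
        # The phoneme this frame attends to most strongly.
        best = max(range(n_phonemes), key=lambda i: frame_weights[i])
        durations[best] += 1
    return durations


# Three mel frames, two phonemes: the first two frames attend to phoneme 0.
example = durations_from_attention([[0.9, 0.1], [0.8, 0.2], [0.3, 0.7]], 2)
```

The durations always sum to the number of mel frames, so they can serve directly as targets for a forward model. A head with no clear alignment yields noisy counts, which is why alignment quality matters before extraction.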

Hi, to evaluate your autoregressive model FOR the alignment extraction, you have to look at the last-layer attention heads on your TRAINING SET. If these do not show significant...