Adriana STAN
Hi, Thanks for your reply. I indeed started training a 1-flow using the LibriSpeech train-clean-100 data using a modified unconditioned [version of Flowtron](https://gitlab.utcluj.ro/sadriana/flowtron-librispeech). I then used the trained flow to...
I warm-started a 2-flow model from the 1-flow weights and continued training. Training and validation losses are shown below [loss plots omitted]. Still no speech-like output at inference....
I did not use speaker embeddings, just a multispeaker dataset. I removed all conditionings of the flow.
Your metadata file does not contain the speaker ID field (the 3rd one, i.e. x[2]), or there are empty lines in your metadata file.
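A quick way to catch both problems before training is to scan the metadata file for empty lines and lines missing the 3rd (speaker ID) field. This is a minimal sketch assuming the common pipe-separated `path|text|speaker_id` layout used by Flowtron-style filelists; adjust the delimiter and field index if your format differs:

```python
def validate_metadata(lines, delimiter="|", speaker_field=2):
    """Return (line_number, reason) pairs for every malformed metadata line.

    Assumes each line looks like: path|text|speaker_id
    """
    problems = []
    for i, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line.strip():
            problems.append((i, "empty line"))
            continue
        fields = line.split(delimiter)
        if len(fields) <= speaker_field or not fields[speaker_field].strip():
            problems.append((i, "missing speaker ID field"))
    return problems


# Example usage: the second line is empty, the third has no speaker ID.
lines = ["wavs/a.wav|hello there|0", "", "wavs/b.wav|hi again"]
for lineno, reason in validate_metadata(lines):
    print(f"line {lineno}: {reason}")
```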
> Make sure you trim silences from the beginning and end of your audio files Should there be no silence at all at the beginning and end, or should there...
> I use LJSpeech dataset for training. Any instructions on how to trim them? The simplest way would be to use [librosa.effects.trim()](https://librosa.org/librosa/generated/librosa.effects.trim.html?highlight=librosa%20effects%20trim)
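For anyone without librosa at hand, the idea behind `librosa.effects.trim()` can be sketched in plain NumPy: compute per-frame RMS energy and cut everything before the first and after the last frame louder than a `top_db` threshold relative to the peak. The frame sizes and the 30 dB default here are illustrative assumptions, not librosa's exact internals:

```python
import numpy as np

def trim_silence(y, top_db=30.0, frame_length=2048, hop_length=512):
    """Trim leading/trailing silence from a mono waveform `y`.

    A frame is "silent" if its RMS is more than `top_db` dB below the
    loudest frame's RMS. Returns the trimmed signal.
    """
    n_frames = max(1, 1 + (len(y) - frame_length) // hop_length)
    rms = np.array([
        np.sqrt(np.mean(y[i * hop_length: i * hop_length + frame_length] ** 2))
        for i in range(n_frames)
    ])
    # Convert the dB threshold to a linear amplitude ratio.
    threshold = rms.max() * (10.0 ** (-top_db / 20.0))
    nonsilent = np.flatnonzero(rms > threshold)
    if len(nonsilent) == 0:
        return y[:0]  # everything is silence
    start = nonsilent[0] * hop_length
    end = min(len(y), (nonsilent[-1] + 1) * hop_length + frame_length)
    return y[start:end]
```

In practice, prefer `librosa.effects.trim(y, top_db=...)`, which additionally handles edge padding and returns the trim indices; the sketch above is only to show what the function is doing.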
Hi, thank you for your reply. We also trained a model with the exact same number of utterances and the same text from 37 different speakers, and the results are...
I now added the embedding to condition the decoder as well here: https://github.com/NVIDIA/DeepLearningExamples/blob/de507d9fecfbdd50ad001bdb15e89f8eae46871e/PyTorch/SpeechSynthesis/FastPitch/fastpitch/model.py#L314 . But the results aren't any better. We use both male and female speakers, ranging from 137...
We also used external speaker embeddings (derived from SpeechBrain's model: https://speechbrain.github.io/) as opposed to having FastPitch learn them. This helped a bit, but it still fails at generating short utterances...