Adriana STAN

Results: 18 comments by Adriana STAN

Hi, thanks for your reply. I did indeed start training a 1-flow model on the LibriSpeech train-clean-100 data, using a modified, unconditioned [version of Flowtron](https://gitlab.utcluj.ro/sadriana/flowtron-librispeech). I then used the trained flow to...

I warm-started a 2-flow model from the 1-flow weights and continued training. The training and validation losses are shown below: ![2flows](https://user-images.githubusercontent.com/6659449/89275929-a0fc4500-d64b-11ea-8377-fa6d3f922d15.png) ![2flows_sid0_sigma0 5](https://user-images.githubusercontent.com/6659449/89275966-afe2f780-d64b-11ea-86d3-89dff3cf0dc3.png) There is still no speech-like output at inference....

I did not use speaker embeddings, just a multi-speaker dataset; I removed all conditioning from the flow.

Either your metadata file is missing the speaker ID field (the 3rd one, i.e. x[2]), or it contains empty lines.
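Both problems can be caught with a quick pass over the filelist. This is a minimal sketch that assumes the Flowtron-style pipe-delimited format (`path|text|speaker_id`); `check_filelist` is a hypothetical helper name, not part of the repo:

```python
def check_filelist(lines):
    """Yield (line_number, problem) for malformed filelist lines.

    Assumes each line is 'path|text|speaker_id'.
    """
    for lineno, line in enumerate(lines, 1):
        if not line.strip():
            yield lineno, "empty line"
        elif len(line.rstrip("\n").split("|")) < 3:
            yield lineno, "missing speaker ID (3rd '|'-separated field)"


sample = [
    "wavs/a.wav|hello world|0\n",
    "wavs/b.wav|no speaker id\n",
    "\n",
]
print(list(check_filelist(sample)))
# → [(2, "missing speaker ID (3rd '|'-separated field)"), (3, "empty line")]
```

In practice you would pass `open("your_filelist.txt", encoding="utf-8")` as `lines` and fix every reported entry before training.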

> Make sure you trim silences from the beginning and end of your audio files

Should there be no silence at all at the beginning and end, or should there...

> I use LJSpeech dataset for training. Any instructions on how to trim them?

The simplest way would be to use [librosa.effects.trim()](https://librosa.org/librosa/generated/librosa.effects.trim.html?highlight=librosa%20effects%20trim).

Hi, thank you for your reply. We also trained a model with the exact same number of utterances and the same text from 37 different speakers, and the results are...

I have now added the embedding to condition the decoder as well, here: https://github.com/NVIDIA/DeepLearningExamples/blob/de507d9fecfbdd50ad001bdb15e89f8eae46871e/PyTorch/SpeechSynthesis/FastPitch/fastpitch/model.py#L314. But the results are no better. We use both male and female speakers, ranging from 137...
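One common way to wire such decoder conditioning (a hedged sketch with made-up shapes, not the exact FastPitch code) is to project the per-utterance speaker embedding to the model width and add it to every decoder time step:

```python
import torch
import torch.nn as nn

# Hypothetical sizes: batch, frames, model width, speaker-embedding width.
B, T, d_model, d_spk = 2, 50, 384, 192
dec_in = torch.randn(B, T, d_model)   # decoder input frames
spk_emb = torch.randn(B, d_spk)       # one embedding per utterance

# Project to d_model and broadcast over the time axis.
proj = nn.Linear(d_spk, d_model)
dec_in = dec_in + proj(spk_emb).unsqueeze(1)
print(dec_in.shape)  # torch.Size([2, 50, 384])
```

Concatenating the projected embedding to each frame (and widening the decoder input layer accordingly) is an equally common alternative to addition.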

We also used external speaker embeddings (derived from SpeechBrain's model: https://speechbrain.github.io/) instead of having FastPitch learn them. This helped somewhat, but the model still fails to generate short utterances...
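One way to plug precomputed vectors in is a drop-in replacement for the learned `nn.Embedding` lookup. This is a sketch with a hypothetical `ExternalSpeakerEmbedding` module; the actual integration point in FastPitch differs:

```python
import torch
import torch.nn as nn


class ExternalSpeakerEmbedding(nn.Module):
    """Serve fixed, precomputed speaker vectors (e.g. SpeechBrain
    x-vectors) from a (n_speakers, dim) table instead of learning them.

    With freeze=True the table is a buffer, so it receives no gradients
    and does not appear in the model's parameters.
    """

    def __init__(self, table, freeze=True):
        super().__init__()
        if freeze:
            self.register_buffer("table", table)
        else:
            self.table = nn.Parameter(table)

    def forward(self, speaker_ids):
        return self.table[speaker_ids]


# Stand-in table: 37 speakers, 192-dim embeddings (random here; in
# practice these come from the external speaker encoder).
emb = ExternalSpeakerEmbedding(torch.randn(37, 192))
out = emb(torch.tensor([0, 5]))
print(out.shape)  # torch.Size([2, 192])
```

Setting `freeze=False` turns the table into a trainable parameter, which lets the model fine-tune the external vectors instead of using them as-is.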