Adriana STAN

Results: 18 comments by Adriana STAN

Hi, thanks for your reply. I did indeed start training a 1-flow model on the LibriSpeech train-clean-100 data, using a modified, unconditioned [version of Flowtron](https://gitlab.utcluj.ro/sadriana/flowtron-librispeech). I then used the trained flow to...

I warm-started a 2-flow model from the 1-flow weights and continued training. The training and validation losses are shown below: ![2flows](https://user-images.githubusercontent.com/6659449/89275929-a0fc4500-d64b-11ea-8377-fa6d3f922d15.png) ![2flows_sid0_sigma0 5](https://user-images.githubusercontent.com/6659449/89275966-afe2f780-d64b-11ea-86d3-89dff3cf0dc3.png) There is still no speech-like output at inference....

I did not use speaker embeddings, just a multi-speaker dataset; I removed all conditioning from the flow.

Either your metadata file is missing the speaker ID field (the 3rd one, i.e. x[2]), or it contains empty lines.
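Both problems can be caught with a quick pass over the filelist. This is a minimal sketch that assumes the Flowtron-style pipe-delimited format (`path|text|speaker_id`); `check_filelist` is a hypothetical helper name, not part of the repo:

```python
def check_filelist(lines):
    """Yield (line_number, problem) for malformed filelist lines.

    Assumes each line is 'path|text|speaker_id'.
    """
    for lineno, line in enumerate(lines, 1):
        if not line.strip():
            yield lineno, "empty line"
        elif len(line.rstrip("\n").split("|")) < 3:
            yield lineno, "missing speaker ID (3rd '|'-separated field)"


sample = [
    "wavs/a.wav|hello world|0\n",
    "wavs/b.wav|no speaker id\n",
    "\n",
]
print(list(check_filelist(sample)))
# → [(2, "missing speaker ID (3rd '|'-separated field)"), (3, "empty line")]
```

In practice you would pass `open("your_filelist.txt", encoding="utf-8")` as `lines` and fix every reported entry before training.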

> Make sure you trim silences from the beginning and end of your audio files

Should there be no silence at all at the beginning and end, or should there...

> I use LJSpeech dataset for training. Any instructions on how to trim them?

The simplest way would be to use [librosa.effects.trim()](https://librosa.org/librosa/generated/librosa.effects.trim.html?highlight=librosa%20effects%20trim).

Hi, thank you for your reply. We also trained a model with the exact same number of utterances and the same text from 37 different speakers, and the results are...

I have now added the embedding to condition the decoder as well, here: https://github.com/NVIDIA/DeepLearningExamples/blob/de507d9fecfbdd50ad001bdb15e89f8eae46871e/PyTorch/SpeechSynthesis/FastPitch/fastpitch/model.py#L314. But the results are no better. We use both male and female speakers, ranging from 137...
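One common way to wire such decoder conditioning (a hedged sketch with made-up shapes, not the exact FastPitch code) is to project the per-utterance speaker embedding to the model width and add it to every decoder time step:

```python
import torch
import torch.nn as nn

# Hypothetical sizes: batch, frames, model width, speaker-embedding width.
B, T, d_model, d_spk = 2, 50, 384, 192
dec_in = torch.randn(B, T, d_model)   # decoder input frames
spk_emb = torch.randn(B, d_spk)       # one embedding per utterance

# Project to d_model and broadcast over the time axis.
proj = nn.Linear(d_spk, d_model)
dec_in = dec_in + proj(spk_emb).unsqueeze(1)
print(dec_in.shape)  # torch.Size([2, 50, 384])
```

Concatenating the projected embedding to each frame (and widening the decoder input layer accordingly) is an equally common alternative to addition.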

We also used external speaker embeddings (derived from SpeechBrain's model: https://speechbrain.github.io/) instead of having FastPitch learn them. This helped somewhat, but the model still fails to generate short utterances...
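One way to plug precomputed vectors in is a drop-in replacement for the learned `nn.Embedding` lookup. This is a sketch with a hypothetical `ExternalSpeakerEmbedding` module; the actual integration point in FastPitch differs:

```python
import torch
import torch.nn as nn


class ExternalSpeakerEmbedding(nn.Module):
    """Serve fixed, precomputed speaker vectors (e.g. SpeechBrain
    x-vectors) from a (n_speakers, dim) table instead of learning them.

    With freeze=True the table is a buffer, so it receives no gradients
    and does not appear in the model's parameters.
    """

    def __init__(self, table, freeze=True):
        super().__init__()
        if freeze:
            self.register_buffer("table", table)
        else:
            self.table = nn.Parameter(table)

    def forward(self, speaker_ids):
        return self.table[speaker_ids]


# Stand-in table: 37 speakers, 192-dim embeddings (random here; in
# practice these come from the external speaker encoder).
emb = ExternalSpeakerEmbedding(torch.randn(37, 192))
out = emb(torch.tensor([0, 5]))
print(out.shape)  # torch.Size([2, 192])
```

Setting `freeze=False` turns the table into a trainable parameter, which lets the model fine-tune the external vectors instead of using them as-is.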