Rafael Valle comments

Results: 133 comments of Rafael Valle

Bad attention weights

@kurbob does the attention map always look like that? You might have to change from byte to bool https://github.com/NVIDIA/flowtron/blob/master/flowtron.py#L33

Bad attention weights

@adrianastan There should be **no silence** at the beginning or at the end of an audio file.
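
As a rough illustration of trimming leading and trailing silence, here is a minimal sketch in plain Python. The function name `trim_silence` and the amplitude threshold of 0.01 are assumptions made for this example, not part of Flowtron; a real pipeline would likely work on audio arrays and tune the threshold per dataset.

```python
def trim_silence(samples, threshold=0.01):
    """Return samples with leading and trailing low-amplitude values removed.

    `threshold` is a hypothetical amplitude cutoff chosen for illustration.
    """
    # Indices of samples whose absolute amplitude exceeds the threshold.
    loud = [i for i, s in enumerate(samples) if abs(s) > threshold]
    if not loud:
        return []  # the whole clip is below the threshold
    # Keep everything between the first and last loud sample,
    # including any quiet stretch in the middle of the utterance.
    return samples[loud[0]:loud[-1] + 1]


audio = [0.0, 0.001, 0.2, -0.5, 0.0, 0.3, 0.002, 0.0]
trimmed = trim_silence(audio)
print(trimmed)
```

Only the edges are removed; pauses inside the utterance are left untouched.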

Bad attention weights

@Liujingxiu23 please share the training and validation losses and the attention plots for both the model with 1 step of flow and the model with 2 steps of flow.

Bad attention weights

Did you warm-start the 2-flow model from a 1-flow model checkpoint at around 200k?
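
One way to picture warm-starting a larger model from a smaller one is to copy over only the parameters whose names and shapes match. The sketch below uses plain dicts and lists as stand-ins for state dicts; `warmstart_state` and the key names such as `flows.0.weight` are hypothetical and chosen only for illustration.

```python
def _shape(value):
    """Shape of a tensor-like value, falling back to the length of a sequence."""
    shape = getattr(value, "shape", None)
    return tuple(shape) if shape is not None else (len(value),)


def warmstart_state(pretrained_state, model_state):
    """Copy pretrained entries into model_state where name and shape match."""
    merged = dict(model_state)
    reused = []
    for name, value in pretrained_state.items():
        if name not in model_state:
            continue  # parameter does not exist in the new model
        if _shape(value) != _shape(model_state[name]):
            continue  # same name but incompatible shape
        merged[name] = value
        reused.append(name)
    return merged, reused


# Hypothetical parameter names: a 1-flow checkpoint and a fresh 2-flow model.
one_flow = {"flows.0.weight": [1.0, 2.0], "embedding.weight": [0.5, 0.5, 0.5]}
two_flow = {
    "flows.0.weight": [0.0, 0.0],
    "flows.1.weight": [0.0, 0.0],
    "embedding.weight": [0.0, 0.0, 0.0],
}
merged, reused = warmstart_state(one_flow, two_flow)
print(sorted(reused))
```

With real PyTorch state dicts, the merged dict would then be passed to `model.load_state_dict(merged)`; the second flow keeps its fresh initialization.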

Batch size?

Yes! It's worth trying and can lead to lower validation losses. Let us know what your findings are.

Batch size?

The two samples are different, hence the alignment plots are different.

Problems with flowtron_libritts2p3k

Take a look at the keys for `flowtron_libritts2p3k`. I think we saved the model instead of just the state dict. The code below should solve your issue. ``` ck =...
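
The snippet in this comment is cut off and is not reconstructed here. As a general sketch of the idea only, a loader can handle both layouts: a checkpoint that stores a state dict directly, and one that stores the whole model object. The key names `"state_dict"` and `"model"` and the helper `extract_state_dict` are assumptions for illustration; inspect the actual checkpoint keys before relying on them.

```python
def extract_state_dict(checkpoint):
    """Return a parameter dict from either checkpoint layout.

    The key names "state_dict" and "model" are assumptions for this sketch.
    """
    if "state_dict" in checkpoint:
        return checkpoint["state_dict"]
    if "model" in checkpoint and hasattr(checkpoint["model"], "state_dict"):
        # The whole model object was saved; ask it for its parameters.
        return checkpoint["model"].state_dict()
    raise KeyError("no 'state_dict' or 'model' entry; keys: %s" % sorted(checkpoint))


class _FakeModel:
    """Stand-in for a saved model object; not part of any library."""

    def state_dict(self):
        return {"embedding.weight": [0.1, 0.2]}


ckpt_with_model = {"model": _FakeModel(), "iteration": 0}
ckpt_with_state = {"state_dict": {"embedding.weight": [0.3, 0.4]}}
print(sorted(ckpt_with_model))  # first step: look at the keys
print(extract_state_dict(ckpt_with_model))
```

In practice `checkpoint` would come from `torch.load(path, map_location="cpu")`, and the result would go to `model.load_state_dict(...)`.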

amount of data for single speaker

Yes, it is possible to get decent results with the amount of data you have. The closer your speaker is to existing speakers in flowtron_libritts2k, the better it will sound....

amount of data for single speaker

@shehrum you need to first train with the attention prior enabled. Once the attention looks good, disable the prior and resume training from that checkpoint.
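
The two-phase recipe can be sketched as a pair of configs derived from one base config. The key names `use_attn_prior` and `checkpoint_path`, the helper `two_phase_configs`, and the placeholder checkpoint name are all assumptions for this example; check the config file you are actually using for the real field names.

```python
import copy


def two_phase_configs(base_config, phase1_checkpoint,
                      prior_key="use_attn_prior", ckpt_key="checkpoint_path"):
    """Build configs for training with, then without, the attention prior.

    `prior_key` and `ckpt_key` are hypothetical config field names.
    """
    # Phase 1: attention prior on, training starts from the base settings.
    phase1 = copy.deepcopy(base_config)
    phase1[prior_key] = True

    # Phase 2: attention prior off, resuming from the phase 1 checkpoint.
    phase2 = copy.deepcopy(base_config)
    phase2[prior_key] = False
    phase2[ckpt_key] = phase1_checkpoint
    return phase1, phase2


base = {"use_attn_prior": False, "checkpoint_path": "", "batch_size": 1}
phase1, phase2 = two_phase_configs(base, "phase1_checkpoint")
print(phase1)
print(phase2)
```

When to switch from phase 1 to phase 2 is a judgment call made by looking at the attention plots; the sketch does not automate that decision.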

detailed work pipeline to train a multi-speaker flowtron model

We provide a checkpoint for LibriTTS with over 2k speakers. Set the attention prior to True before training. After training for some time, set it to False once the model has...