
Conditional Layer Normalization

Open Liujingxiu23 opened this issue 4 years ago • 4 comments

Hi, I have been following your work for several months and am pleasantly surprised at how quickly you keep up with new algorithms. For AdaSpeech, have you verified that the two acoustic encoders really help when training on custom speakers? How do they compare to a speaker embedding generated by a speaker encoder of the kind used in speaker verification tasks? And "Conditional Layer Normalization" has not been implemented yet, right? Are the following references suitable if I implement it myself, or can you give any suggestions on how to do this? https://github.com/exe1023/CBLN/blob/e395edc2d6d952497b411f81eae4aafb96749bc2/model/cbn.py https://github.com/CyberZHG/torch-layer-normalization/blob/master/torch_layer_normalization/layer_normalization.py

Liujingxiu23 avatar Mar 17 '21 04:03 Liujingxiu23

In my opinion, the utterance-level encoder is an alternative to an external speaker encoder model. So if you can use an external speaker encoder to extract the speaker embedding, that may work better.

hoyden avatar Mar 17 '21 12:03 hoyden

@Liujingxiu23 https://github.com/CyberZHG/torch-layer-normalization/blob/master/torch_layer_normalization/layer_normalization.py works well. Yes, a speaker embedding generated by a speaker encoder of the kind used in speaker verification works.

rishikksh20 avatar May 04 '21 23:05 rishikksh20
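
For readers following the implementation question above, here is a minimal PyTorch sketch of conditional layer normalization, where the scale and bias of the layer norm are predicted from a speaker embedding by two linear layers instead of being fixed learned vectors. This is not the repository's code or the linked reference implementation; the class name `ConditionalLayerNorm`, the parameter names `hidden_dim` and `speaker_dim`, the tensor shapes, and the identity-style initialisation are all assumptions chosen for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConditionalLayerNorm(nn.Module):
    """Layer norm whose scale and bias are predicted from a speaker embedding.

    Hypothetical sketch: names, shapes, and initialisation are assumptions.
    """

    def __init__(self, hidden_dim: int, speaker_dim: int, eps: float = 1e-5):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.eps = eps
        # Two linear layers map the speaker embedding to per-channel scale and bias.
        self.to_scale = nn.Linear(speaker_dim, hidden_dim)
        self.to_bias = nn.Linear(speaker_dim, hidden_dim)
        # Start out as a plain layer norm: scale = 1 and bias = 0 for any speaker.
        nn.init.zeros_(self.to_scale.weight)
        nn.init.ones_(self.to_scale.bias)
        nn.init.zeros_(self.to_bias.weight)
        nn.init.zeros_(self.to_bias.bias)

    def forward(self, x: torch.Tensor, speaker_emb: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, hidden_dim); speaker_emb: (batch, speaker_dim)
        # Normalise over the channel dimension without any learned affine terms.
        normed = F.layer_norm(x, (self.hidden_dim,), eps=self.eps)
        # Broadcast the speaker-dependent scale and bias over the time axis.
        scale = self.to_scale(speaker_emb).unsqueeze(1)  # (batch, 1, hidden_dim)
        bias = self.to_bias(speaker_emb).unsqueeze(1)    # (batch, 1, hidden_dim)
        return normed * scale + bias


# Usage with made-up sizes: 2 utterances, 50 frames, 256 channels, 64-dim speaker vectors.
cln = ConditionalLayerNorm(hidden_dim=256, speaker_dim=64)
x = torch.randn(2, 50, 256)
spk = torch.randn(2, 64)
y = cln(x, spk)  # same shape as x
```

The speaker embedding passed in can come from any source, for example an external speaker verification encoder as discussed in this thread. When adapting to a new speaker, one option is to fine-tune only `to_scale`, `to_bias`, and the speaker embedding while freezing everything else, which keeps the number of per-speaker parameters small.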

@rishikksh20 Thank you for your reply. I am trying this and other similar methods to realize personalized TTS using audio that users record on their mobile phones. But the results are not very good; shaking and instability are the main problems in the synthesized wavs. I am wondering whether the vocoder is the problem, since I could not find a universal deep-learning-based vocoder.

Liujingxiu23 avatar May 08 '21 07:05 Liujingxiu23

My experiments showed that in a multi-speaker scenario the phoneme-level mel encoder encodes too much information. As a consequence, if the phoneme-level predictor is not capable enough, performance drops a lot.

MMingabc avatar Mar 03 '22 05:03 MMingabc