
All losses stay almost constant during training when using VITS to train on multilingual datasets

Open zhufeijuanjuan opened this issue 2 years ago • 0 comments

I modified VITS to train on multilingual voices (English and Chinese) by concatenating a language-specific embedding tensor emb_lang to the text embedding emb_t. Everything else stays the same, except that the hidden channel size of the text encoder input changes from 192 to 196 (language-specific embedding dim = 4).
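
For reference, here is a minimal sketch of the kind of change described above, assuming PyTorch. The class name `LangAwareEmbedding`, its arguments, and the `sqrt` scaling of the text embedding are illustrative assumptions, not the exact code from this experiment or from the VITS repository.

```python
import math

import torch
import torch.nn as nn


class LangAwareEmbedding(nn.Module):
    """Hypothetical module: text embedding concatenated with a language embedding."""

    def __init__(self, n_vocab, n_langs, text_dim=192, lang_dim=4):
        super().__init__()
        self.text_dim = text_dim
        self.emb_t = nn.Embedding(n_vocab, text_dim)      # text embedding (emb_t)
        self.emb_lang = nn.Embedding(n_langs, lang_dim)   # language embedding (emb_lang)

    def forward(self, tokens, lang_ids):
        # tokens: [B, T] token ids, lang_ids: [B] one language id per utterance
        x = self.emb_t(tokens) * math.sqrt(self.text_dim)          # [B, T, 192]
        lang = self.emb_lang(lang_ids).unsqueeze(1)                # [B, 1, 4]
        lang = lang.expand(-1, tokens.size(1), -1)                 # [B, T, 4]
        # Concatenate along the channel axis: 192 + 4 = 196 channels
        return torch.cat([x, lang], dim=-1)                        # [B, T, 196]


layer = LangAwareEmbedding(n_vocab=100, n_langs=2)
tokens = torch.randint(0, 100, (2, 7))   # batch of 2 utterances, 7 tokens each
lang_ids = torch.tensor([0, 1])          # e.g. 0 = English, 1 = Chinese
out = layer(tokens, lang_ids)
print(out.shape)
```

With this layout, every layer that consumes the embedding output has to be built with 196 input channels instead of 192.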

All losses in TensorBoard except loss/g/fm decrease rapidly during the first 1k steps, then stay almost constant from 1k to 60k steps. loss/g/fm keeps increasing.

Has anyone seen a similar issue? Thanks.

zhufeijuanjuan · Sep 22 '22 08:09