
Some questions about KL_loss, KL_loss_r, and model behavior

Open · opened by feng-yufei · 13 comments

Hello MaxMax2016,

Thank you for sharing your code for the improved VITS. I'd like to ask about the model's behavior when the bi-directional KL divergence is added. In this repo I found that you use 1.0 * kl_loss + 1.0 * kl_loss_r during training. When I train on my own dataset, this greatly increases the mel-spectrogram loss compared to the original VITS, and when I add kl_loss_r to fine-tune a well-trained VITS model, it degrades voice quality. Can you share your experience or findings from adding this specific term? In the loss curve you shared, the mel-spectrogram loss still seems low (below 20), which is very interesting.
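For context, here is a minimal sketch of the loss combination I am referring to, written against the standard VITS kl_loss formulation. The reverse-direction call, the variable names, and the random placeholder tensors are my own illustration and may not match the exact code in this repo:

```python
import torch

def kl_loss(z, logs_q, m_p, logs_p, z_mask):
    """Single-sample KL estimate as in the original VITS recipe.

    z:            samples drawn from the first (sampling) distribution, [b, h, t]
    logs_q:       log-std of the first distribution
    m_p, logs_p:  mean / log-std of the second (reference) distribution
    z_mask:       [b, 1, t] mask over valid frames
    """
    z, logs_q = z.float(), logs_q.float()
    m_p, logs_p = m_p.float(), logs_p.float()
    z_mask = z_mask.float()

    kl = logs_p - logs_q - 0.5
    kl += 0.5 * ((z - m_p) ** 2) * torch.exp(-2.0 * logs_p)
    kl = torch.sum(kl * z_mask)
    return kl / torch.sum(z_mask)


# Hypothetical illustration of the bidirectional combination.
# Random tensors stand in for the model outputs (prior from the text encoder,
# posterior from the posterior encoder, both mapped into the same latent space).
b, h, t = 2, 192, 50
z_mask = torch.ones(b, 1, t)
m_p, logs_p = torch.randn(b, h, t), 0.1 * torch.randn(b, h, t)  # prior p
m_q, logs_q = torch.randn(b, h, t), 0.1 * torch.randn(b, h, t)  # posterior q

z_p = m_q + torch.randn_like(m_q) * torch.exp(logs_q)  # sample from q
z_r = m_p + torch.randn_like(m_p) * torch.exp(logs_p)  # sample from p

loss_kl   = kl_loss(z_p, logs_q, m_p, logs_p, z_mask)  # KL(q || p), as in original VITS
loss_kl_r = kl_loss(z_r, logs_p, m_q, logs_q, z_mask)  # reverse direction, KL(p || q)
loss_total_kl = 1.0 * loss_kl + 1.0 * loss_kl_r        # 1.0 / 1.0 weighting as mentioned above
```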

Thanks

feng-yufei · Mar 08 '23 21:03