Retrieval-based-Voice-Conversion-WebUI
Question about the KL divergence loss function
This is the loss function for the KL divergence. Specifically, I mean this formula:
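(The original attachment is missing here. For reference, below is a sketch of `kl_loss` as it appears in the upstream VITS `losses.py`, which this repo reuses; treat it as a reconstruction of the snippet being discussed, not the attachment itself.)

```python
import torch

def kl_loss(z_p, logs_q, m_p, logs_p, z_mask):
    """
    z_p, logs_q: flow-mapped posterior sample and posterior log-std, [b, h, t_t]
    m_p, logs_p: prior mean and prior log-std, [b, h, t_t]
    z_mask: validity mask over time steps
    """
    z_p = z_p.float()
    logs_q = logs_q.float()
    m_p = m_p.float()
    logs_p = logs_p.float()
    z_mask = z_mask.float()

    # log(sigma_p) - log(sigma_q) - 1/2 ... note: no explicit sigma_q^2 / (2 sigma_p^2) term
    kl = logs_p - logs_q - 0.5
    # (z_p - m_p)^2 / (2 sigma_p^2), using the sample z_p rather than a mean
    kl += 0.5 * ((z_p - m_p) ** 2) * torch.exp(-2.0 * logs_p)
    kl = torch.sum(kl * z_mask)
    return kl / torch.sum(z_mask)
```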
It is actually computing the KL divergence between Gaussian distributions.
But I found that it omits one term, namely $\sigma_1^2$.
I don't know why this term was left out. If we include it, the formula should look like this:
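(The attached image is missing. Assuming the standard closed-form KL between two univariate Gaussians $q = \mathcal{N}(\mu_q, \sigma_q^2)$ and $p = \mathcal{N}(\mu_p, \sigma_p^2)$, with $\sigma_1 = \sigma_q$, the formula with the term included would be:)

$$
D_{\mathrm{KL}}(q \,\|\, p) = \log\frac{\sigma_p}{\sigma_q} + \frac{\sigma_q^2 + (\mu_q - \mu_p)^2}{2\sigma_p^2} - \frac{1}{2}
$$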
Sincerely hoping for an answer.
You can also post this issue to https://github.com/jaywalnut310/vits
Yeah, but I don't think any contributor is maintaining that repo, so I want to see whether some repos based on VITS can answer it.
Very curious too. At first glance I thought it might be an optimization based on properties of the KL divergence, but after combing through some papers on KL optimization and approximation I didn't find anything describing this case, or at least nothing that caught my attention. Has anyone tried inserting the missing variance and comparing performance?
I tried, and the performance is about the same (one possible form of the change is sketched below). But the pretrained model must have been trained on the version without that term, so I want to try training from scratch, though I think that would take a really long time.
In addition, it calculates the KL divergence from z_p, m_p, logs_q, and logs_p, but z_p is the output of the flow. In a closed-form KL divergence that argument should be the mean of a distribution, yet z_p is definitely not a mean computed by the posterior encoder. (The mean of the posterior encoder would be m_q, not z_p.)
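For concreteness, here is a guess at what "inserting the missing variance" (tried above) might look like on top of the `kl_loss` sketch earlier in the thread; the exact change tested is not shown, so every modified line here is hypothetical:

```python
# Hypothetical variant: add the explicit sigma_q^2 / (2 sigma_p^2) term from the
# closed-form Gaussian KL. This is a reconstruction, not code from the thread.
kl = logs_p - logs_q - 0.5
kl += 0.5 * torch.exp(2.0 * (logs_q - logs_p))             # sigma_q^2 / (2 sigma_p^2)
kl += 0.5 * ((z_p - m_p) ** 2) * torch.exp(-2.0 * logs_p)  # (z_p - m_p)^2 / (2 sigma_p^2)
```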
Hi @JunityZhan, sorry for the ping. Did you learn something new regarding this issue from the other VITS based repos?
Still no. I didn't find any related information, and I think it would be better to ask a professor, but I can't because I am not a university student yet. 😢
I think the KL divergence may be with respect to a standard Gaussian, so s_q is just 1?
I have found a topic where the author discusses the KL divergence in detail. Please refer to #6.
The KL divergence the author uses is correct. If you want to know more about it, please tag me or ping me on Discord (p0p4k); I can explain it to anyone who is interested.
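For anyone who lands here later, here is a sketch of the standard Monte Carlo argument for why the sample-based form can be correct, in the simplified case where the flow is treated as the identity so that $z_p \sim q = \mathcal{N}(m_q, \sigma_q^2)$ (the reasoning extends to the actual flow because VITS's mean-only coupling layers are volume-preserving):

$$
\mathbb{E}_{z \sim q}\big[(z - m_p)^2\big] = \sigma_q^2 + (m_q - m_p)^2,
$$

so the per-element loss

$$
\log\sigma_p - \log\sigma_q - \tfrac{1}{2} + \frac{(z_p - m_p)^2}{2\sigma_p^2}
$$

has expectation

$$
\log\frac{\sigma_p}{\sigma_q} + \frac{\sigma_q^2 + (m_q - m_p)^2}{2\sigma_p^2} - \frac{1}{2} = D_{\mathrm{KL}}(q \,\|\, p).
$$

Under this reading the $\sigma_q^2$ term is not dropped: it is absorbed into the single-sample Monte Carlo estimate that uses the sample $z_p$ in place of the mean, which would also explain why $z_p$ rather than $m_q$ appears in the code.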
This issue was closed because it has been inactive for 15 days since being marked as stale.