Retrieval-based-Voice-Conversion-WebUI

Question about KL Divergence loss function 关于KL散度的损失函数的问题

Open JunityZhan opened this issue 1 year ago • 7 comments

This is the loss function for the KL divergence. Specifically, for these formulas:

[image: the kl_loss implementation]

they are actually calculating the KL divergence between Gaussian distributions:

[image: KL divergence formula for two Gaussian distributions]

But I find that it omits one term, namely $σ_1^2$. I don't know why this term was left out; if we include it, the formula should look like this:

[image: the formula with the $σ_1^2$ term included]

Sincerely hoping for an answer.

JunityZhan avatar Jun 07 '23 17:06 JunityZhan
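The closed-form KL divergence between two univariate Gaussians that the question refers to can be sketched in plain Python. This is an illustrative helper (the function name and scalar signature are mine, not from the repo); the `s_q ** 2` term in the numerator is the variance term the question says is missing:

```python
import math

def gaussian_kl(m_q, s_q, m_p, s_p):
    """KL(N(m_q, s_q^2) || N(m_p, s_p^2)) in closed form:
    log(s_p / s_q) + (s_q^2 + (m_q - m_p)^2) / (2 s_p^2) - 1/2.
    """
    return (math.log(s_p / s_q)
            + (s_q ** 2 + (m_q - m_p) ** 2) / (2.0 * s_p ** 2)
            - 0.5)
```

For identical distributions the divergence is zero, e.g. `gaussian_kl(0.0, 1.0, 0.0, 1.0)` returns `0.0`.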

You can also post this issue to https://github.com/jaywalnut310/vits

RVC-Boss avatar Jun 08 '23 03:06 RVC-Boss

> You can also post this issue to https://github.com/jaywalnut310/vits

Yeah, but I don't think any contributors are maintaining that repo, so I want to see whether some repos based on VITS can answer it.

JunityZhan avatar Jun 08 '23 06:06 JunityZhan

Very curious too. At first glance I thought it might be an optimization justified by properties of the KL divergence, but after combing through some papers on KL optimization and approximation I didn't find anything describing this case, or at least nothing that caught my attention. Has anyone tried inserting the missing variance term and comparing performance?

RainaObi avatar Jun 11 '23 00:06 RainaObi

> Very curious too. At first glance I thought it might be an optimization justified by properties of the KL divergence, but after combing through some papers on KL optimization and approximation I didn't find anything describing this case, or at least nothing that caught my attention. Has anyone tried inserting the missing variance term and comparing performance?

I tried, and the performance is about the same. But the pretrained model must have been trained on the version without that term, so I want to try training from scratch, though I think that would take a really long time.

JunityZhan avatar Jun 12 '23 12:06 JunityZhan

> Very curious too. At first glance I thought it might be an optimization justified by properties of the KL divergence, but after combing through some papers on KL optimization and approximation I didn't find anything describing this case, or at least nothing that caught my attention. Has anyone tried inserting the missing variance term and comparing performance?

In addition, it calculates the KL divergence from z_p, m_p, logs_q, and logs_p, but z_p is the output of the flow. In calculating the KL divergence, that argument should be the mean of a distribution, but z_p is definitely not a mean produced by the posterior encoder. (The mean of the posterior encoder is m_q, not z_p.)

JunityZhan avatar Jun 12 '23 12:06 JunityZhan
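One possible reconciliation of the two observations above (this is my assumption, not something confirmed in the thread): plugging a sample z ~ q into the quadratic term instead of the mean m_q gives an unbiased Monte Carlo estimate whose expectation already contains the "missing" σ_q² term. A minimal pure-Python sketch with hypothetical parameter values:

```python
import math
import random

random.seed(0)
m_q, s_q = 0.7, 1.3   # hypothetical posterior mean / std
m_p, s_p = 0.0, 1.0   # hypothetical prior mean / std

# Sample-based quadratic term: average (z - m_p)^2 / (2 s_p^2) over z ~ q
n = 200_000
acc = 0.0
for _ in range(n):
    z = random.gauss(m_q, s_q)
    acc += (z - m_p) ** 2
mc_term = acc / n / (2.0 * s_p ** 2)

# Analytic quadratic term: (s_q^2 + (m_q - m_p)^2) / (2 s_p^2)
# -- note it contains the s_q^2 variance term explicitly
analytic_term = (s_q ** 2 + (m_q - m_p) ** 2) / (2.0 * s_p ** 2)
```

The two terms agree up to Monte Carlo noise, which would explain why a sample-based implementation can drop the explicit σ_q² term.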

Hi @JunityZhan, sorry for the ping. Did you learn anything new regarding this issue from the other VITS-based repos?

RainaObi avatar Jun 19 '23 03:06 RainaObi

> Hi @JunityZhan, sorry for the ping. Did you learn anything new regarding this issue from the other VITS-based repos?

Still nothing; I haven't found any related information. I think it would be better to ask a professor, but I can't because I am not a university student yet. 😢

JunityZhan avatar Jun 19 '23 11:06 JunityZhan

I think the KL divergence may be taken with respect to a standard Gaussian, so that s_q is just 1?

sjkoelle avatar Jul 31 '23 19:07 sjkoelle
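If q really were a standard Gaussian as suggested above (s_q = 1), the log(s_q) term would vanish from the closed form, though the variance term would still contribute a fixed 1/(2 s_p²) rather than disappearing entirely. A sketch of this special case (hypothetical helper, not from the repo):

```python
import math

def kl_standard_q(m_q, m_p, s_p):
    """KL(N(m_q, 1) || N(m_p, s_p^2)): with s_q = 1 the log(s_q) term
    drops out, and the s_q^2 variance term becomes a fixed 1."""
    return math.log(s_p) + (1.0 + (m_q - m_p) ** 2) / (2.0 * s_p ** 2) - 0.5
```

For example, `kl_standard_q(0.0, 0.0, 1.0)` returns `0.0`, as it must for identical distributions.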

I have found a topic where the author discusses the KL divergence in detail. Please refer to #6.

daniilrobnikov avatar Aug 06 '23 07:08 daniilrobnikov

The KL divergence that the author uses is correct. If you want to know more about it, please tag me here or ping me on Discord (p0p4k). I can explain it to anyone who is interested.

p0p4k avatar Dec 27 '23 07:12 p0p4k
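For readers following along, here is a scalar sketch of the per-element term a VITS-style KL loss computes in log-std parameterization. This is my reconstruction based on the variable names mentioned earlier in the thread (z_p, m_p, logs_p, logs_q), not the repo's actual code; note that z is a sample from the posterior q, not its mean:

```python
import math

def kl_term(z, m_p, logs_p, logs_q):
    """Per-element VITS-style KL term:
    log s_p - log s_q - 1/2 + (z - m_p)^2 / (2 s_p^2),
    where logs_p/logs_q are log standard deviations and z ~ q."""
    return logs_p - logs_q - 0.5 + 0.5 * (z - m_p) ** 2 * math.exp(-2.0 * logs_p)
```

Averaged over samples z ~ q, the quadratic part recovers (s_q² + (m_q − m_p)²)/(2 s_p²), matching the closed form, which is consistent with the statement that the loss is correct.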

This issue was closed because it has been inactive for 15 days since being marked as stale.

github-actions[bot] avatar Apr 28 '24 04:04 github-actions[bot]