adaptive_voice_conversion
About the implementation in the code
Hello, I have two questions from reading your code. Could you please help me with them when you have time?
- Why is the forward propagation different between training and inference? (see the sketch after these two questions)

```python
def forward(self, x):
    emb = self.speaker_encoder(x)
    mu, log_sigma = self.content_encoder(x)
    eps = log_sigma.new(*log_sigma.size()).normal_(0, 1)
    dec = self.decoder(mu + torch.exp(log_sigma / 2) * eps, emb)
    return mu, log_sigma, emb, dec

def inference(self, x, x_cond):
    emb = self.speaker_encoder(x_cond)
    mu, _ = self.content_encoder(x)
    dec = self.decoder(mu, emb)
    return dec
```
- How is the KL-divergence loss calculated?

```python
loss_kl = 0.5 * torch.mean(torch.exp(log_sigma) + mu ** 2 - 1 - log_sigma)
```

https://github.com/jjery2243542/adaptive_voice_conversion/blob/68c33518495d7de404a0f1fdce95e718db86c91b/solver.py#L86
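Regarding the first question: the two paths differ only in whether the latent code is sampled. A minimal standalone sketch of the VAE reparameterization trick, with illustrative tensor shapes (the 128-dim latent here is an assumption, not the repo's actual size):

```python
import torch

torch.manual_seed(0)
mu = torch.randn(1, 128)         # posterior mean from the content encoder
log_sigma = torch.randn(1, 128)  # posterior log-variance from the content encoder

# Training: sample z ~ N(mu, sigma^2) via the reparameterization trick,
# z = mu + sigma * eps with eps ~ N(0, 1), so gradients can flow through
# mu and log_sigma while all stochasticity lives in eps.
eps = torch.randn_like(mu)
z_train = mu + torch.exp(log_sigma / 2) * eps

# Inference: the sampling noise is dropped and the posterior mean
# (the most likely latent code) is used deterministically.
z_infer = mu
```

Training needs the sampling step so the KL term regularizes an actual distribution over latent codes; at conversion time a deterministic output is preferred, which is why `inference` feeds `mu` straight to the decoder.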
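Regarding the second question: assuming `log_sigma` stores $\log \sigma^2$ (consistent with `torch.exp(log_sigma / 2)` being used as the standard deviation in `forward`), that line in solver.py is the standard closed form for the KL divergence between the diagonal Gaussian posterior and a standard normal prior:

$$
\mathrm{KL}\big(\mathcal{N}(\mu, \sigma^2) \,\big\|\, \mathcal{N}(0, 1)\big)
= \frac{1}{2}\big(\sigma^2 + \mu^2 - 1 - \log \sigma^2\big)
= \frac{1}{2}\big(e^{\texttt{log\_sigma}} + \mu^2 - 1 - \texttt{log\_sigma}\big)
$$

per element; `torch.mean` then averages over the batch and all latent dimensions.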
Hi @wyp19930313, I have the same two questions. Have you found the answers?
@sbkim052 The first question is explained by the VAE principle (sampling is only done during training). I still don't understand the second question.
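For anyone who lands here with the same doubt, the closed form above can be sanity-checked against PyTorch's built-in distributions; a standalone sketch (the random tensors are purely illustrative):

```python
import torch
from torch.distributions import Normal, kl_divergence

torch.manual_seed(0)
mu = torch.randn(4, 8)
log_sigma = torch.randn(4, 8)      # log-variance, i.e. log(sigma^2)
sigma = torch.exp(log_sigma / 2)   # standard deviation

# Closed form used in solver.py
loss_kl = 0.5 * torch.mean(torch.exp(log_sigma) + mu ** 2 - 1 - log_sigma)

# Reference: elementwise KL(N(mu, sigma) || N(0, 1)) from torch.distributions
ref = kl_divergence(Normal(mu, sigma),
                    Normal(torch.zeros_like(mu), torch.ones_like(sigma))).mean()

print(torch.allclose(loss_kl, ref))  # True
```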