multi-level-vae
Replication of results
Hi Ananya,
I am the first author of the ML-VAE paper, thank you for implementing it.
I am surprised that you did not succeed in reproducing the results, as we show that the swapping experiments work on MNIST even without accumulating evidence at test time. Do you use the mean value of the encoding, or a sample? Moreover, we experimented with both sampling one content for all x in the group and sampling one content per x, and both worked (see footnote 1 in https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/viewFile/16521/15918).
I had a look at your code. From what I understand, you divide both the KL on the style and the KL on the content by batch_size * image_size * image_size * num_channels? That is surprising, as the KL divergence is computed on the latent code distribution, not on the pixels. We actually divide the sum of the group ELBOs by the number of groups (see Eq. 5 in the paper), not by the batch size. This is an important parameter that clearly differs from the classic VAE.
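As a sketch of the normalization being described (dimensions, variable names, and group count here are illustrative, not taken from the repo):

```python
import torch

def kl_standard_normal(mu, logvar):
    # KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over the batch
    # and latent dimensions.
    return -0.5 * torch.sum(1.0 + logvar - mu.pow(2) - logvar.exp())

# Hypothetical minibatch: 40 samples drawn from 10 groups (e.g. digit classes).
batch_size, latent_dim, num_groups = 40, 16, 10
mu = torch.ones(batch_size, latent_dim)
logvar = torch.zeros(batch_size, latent_dim)

style_kl = kl_standard_normal(mu, logvar)

# ML-VAE normalization (Eq. 5): divide by the number of groups ...
style_kl_loss = style_kl / num_groups

# ... not by batch_size * image_size * image_size * num_channels, which
# averages a latent-space quantity over pixel counts.
```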
Hello @DianeBouchacourt , I'm also interested in using this implementation, but on a different type of data than images. Based on your comment above, is it correct to assume that the trainer code should be modified to divide the divergence losses by the number of groups in the minibatch rather than by (batch_size * img_size * img_size * n_channels)? That is, we change this line:
style_kl_divergence_loss /= (FLAGS.batch_size * FLAGS.num_channels * FLAGS.image_size * FLAGS.image_size)
to
style_kl_divergence_loss /= 10  # assuming the minibatch contains samples from 10 classes
and do the same for the content.
Is this correct?
Many thanks
Hi,
Yes, but this is not the only change that has to be made. The accumulation-of-evidence function needed to be implemented, and some bugs needed to be fixed (e.g. calling .data on a Variable breaks the back-propagation of gradients, preventing the content part from learning anything). You can find the corrected version in this fork:
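The `.data` bug can be illustrated in isolation (a minimal sketch with made-up shapes; in modern PyTorch, `.data` on a plain tensor behaves like the old `Variable.data` in that it detaches the result from the autograd graph):

```python
import torch

x = torch.randn(4, 8)
w = torch.randn(8, 2, requires_grad=True)  # stands in for content-encoder weights

# Buggy pattern: .data detaches the result from the graph, so no
# gradient ever reaches w through this path.
content_detached = (x @ w).data
assert content_detached.requires_grad is False

# Correct pattern: keep the tensor in the graph.
content = (x @ w)
loss = content.pow(2).sum()
loss.backward()

# Gradients now flow back to the content parameters.
assert w.grad is not None
```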
https://github.com/DianeBouchacourt/multi-level-vae
You can have a look already; it still needs some cleaning but is correct.
Hello @DianeBouchacourt , Thank you kindly for your response, I appreciate it. I'll be sure to take a look.
Many thanks for your efforts again.
Hello @DianeBouchacourt , I've taken a look at the code and I understand some of it, but is it possible to have a version that is not tightly coupled to the MNIST dataset? In particular,
- is it sufficient to simply swap out the MNIST-specific bits (e.g. replacing all references to 784 with the size of the input feature vector)?
- additionally, if the input vector is itself not normalized (i.e. the elements may not lie within the range 0 to 1), does this in any way affect the calculation of the losses, or do the input vectors absolutely need to be normalized?
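One general point relevant to the second question: a binary-cross-entropy reconstruction loss (as commonly used for MNIST) requires targets in [0, 1], whereas a Gaussian likelihood (MSE) does not. A hedged sketch, with made-up shapes and nothing taken from the repo:

```python
import torch
import torch.nn.functional as F

# Hypothetical unnormalized feature vectors and an unbounded decoder output.
x = torch.randn(32, 128)
recon = torch.randn(32, 128)

# MSE is valid for unnormalized data ...
mse = F.mse_loss(recon, x, reduction='sum')

# ... whereas binary_cross_entropy requires targets in [0, 1], so data
# would first need rescaling, e.g. min-max per feature dimension:
x_min, _ = x.min(dim=0, keepdim=True)
x_max, _ = x.max(dim=0, keepdim=True)
x01 = (x - x_min) / (x_max - x_min + 1e-8)
```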
Thanks in advance!