
Replication of results

Open DianeBouchacourt opened this issue 5 years ago • 4 comments

Hi Ananya,

I am the first author of the ML-VAE paper, thank you for implementing it.

I am surprised that you did not succeed in reproducing the results, as we show that even without accumulating evidence at test time the swapping experiments work on MNIST. Do you use the mean value of the encoding, or a sample? Moreover, we experimented with both sampling one content for all x in the group and sampling one per x, and both worked (see footnote 1 in https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/viewFile/16521/15918).
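
For example, a minimal sketch of what I mean by mean versus sample (illustrative PyTorch names, not the code from this repo, assuming an encoder that returns a mean and log-variance):

    import torch

    def content_encoding(encoder, x, use_mean=True):
        # encoder is assumed to return (mu, logvar) for the latent distribution
        mu, logvar = encoder(x)
        if use_mean:
            # deterministic evaluation: use the mean of the posterior
            return mu
        # stochastic evaluation: draw one sample via the reparameterisation trick
        std = torch.exp(0.5 * logvar)
        return mu + std * torch.randn_like(std)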

I had a look at your code; from what I understand, you divide both the KL on the style and the KL on the content by batch_size * image_size * image_size * num_channels? That is surprising, as the KL divergence is computed on the latent code distribution. We actually divide the sum of the group ELBOs by the number of groups (see Eq. 5 in the paper), not by the batch size. It is an important parameter which clearly differs from a classic VAE.
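
Concretely, a rough sketch of the normalisation in Eq. 5 (illustrative names, not our exact code; group_labels is assumed to hold the group id of each sample, and the losses are assumed to be already summed over the minibatch):

    import torch

    # Number of distinct groups present in this minibatch
    num_groups = len(torch.unique(group_labels))

    # Eq. 5: divide the summed group ELBO terms by the number of groups,
    # not by batch_size * image_size * image_size * num_channels.
    loss = (reconstruction_loss
            + style_kl_divergence_loss
            + content_kl_divergence_loss) / num_groups
    loss.backward()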

DianeBouchacourt avatar May 29 '19 13:05 DianeBouchacourt

Hello @DianeBouchacourt, I'm also interested in using this implementation, but on a different type of data than images. Based on your comment above, is it correct to assume that the trainer code should be modified to divide the divergence losses by the number of groups in the minibatch rather than by (batch_size * img_size * img_size * n_channels)? That is, we change this line:

style_kl_divergence_loss /= (FLAGS.batch_size * FLAGS.num_channels * FLAGS.image_size * FLAGS.image_size)

to

style_kl_divergence_loss /= 10 #assuming there are samples from 10 classes in the minibatch..

and do the same for the content.
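
Or, to avoid hard-coding the number of classes, something like this (just a sketch, assuming the minibatch class labels are available as a tensor labels_batch):

    num_groups = len(torch.unique(labels_batch))  # distinct groups in this minibatch
    style_kl_divergence_loss /= num_groups
    content_kl_divergence_loss /= num_groups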

Is this correct?

Many thanks

madarax64 avatar Jun 23 '19 21:06 madarax64

Hi,

Yes, but this is not the only change that has to be made. The accumulation-of-evidence function had to be implemented, and some bugs had to be fixed (e.g. calling .data on a Variable breaks the back-propagation of gradients, preventing the content part from learning anything). You can find the corrected version in this fork:

https://github.com/DianeBouchacourt/multi-level-vae

You can have a look already; it needs some cleaning, but it is correct.
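
To illustrate the .data issue, a toy sketch (not the repository code) of why it cuts the gradient flow:

    import torch

    x = torch.randn(4, 8, requires_grad=True)   # stand-in for the content encoding
    w = torch.randn(8, 3, requires_grad=True)

    # Wrong: .data returns a tensor detached from the autograd graph, so the
    # backward pass never reaches x and the content part gets no gradient.
    loss_broken = x.data.mm(w).sum()
    loss_broken.backward()
    print(x.grad)   # None: nothing was propagated back to x

    # Right: operate on the tensor itself so autograd tracks the computation.
    loss_ok = x.mm(w).sum()
    loss_ok.backward()
    print(x.grad)   # now populated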

DianeBouchacourt avatar Jun 27 '19 15:06 DianeBouchacourt

Hello @DianeBouchacourt, thank you kindly for your response; I appreciate it. I'll be sure to take a look.

Many thanks for your efforts again.

madarax64 avatar Jun 27 '19 19:06 madarax64

Hello @DianeBouchacourt, I've taken a look at the code and I understand some of it, but is it possible to have a version that is not tightly coupled to the MNIST dataset? In particular,

  • Is it simply sufficient to swap out the MNIST-specific bits (e.g. replacing all references to 784 with the size of the input feature vector)?
  • Additionally, if the input vector is itself not normalized (i.e. the elements may not lie in the range 0 to 1, for instance), does this affect the calculation of the losses in any way, or do the input vectors absolutely need to be normalized? (A rough sketch of what I mean follows this list.)
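
For instance, this is roughly what I have in mind (made-up names, not the repo's actual classes):

    import torch.nn as nn

    FEATURE_DIM = 128   # size of my (non-image) input feature vector, instead of 784

    encoder_input = nn.Linear(FEATURE_DIM, 256)    # wherever 784 appeared as the input size
    decoder_output = nn.Linear(256, FEATURE_DIM)   # and as the output size

    # If the inputs stay in [0, 1] a BCE reconstruction term seems fine; for
    # unnormalised real-valued features I would expect to switch to e.g. MSE.
    reconstruction_loss_fn = nn.MSELoss(reduction='sum')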

Thanks in advance!

madarax64 avatar Jun 27 '19 23:06 madarax64