BVAE-tf
Some questions about implementation
Hi,
I've been using your code in some experiments. I have the following questions:
- Applying your recently committed changes to the loss actually produced predictions with strange (larger) ranges in my experiments, which made them harder to convert back to images. I had to "roll back" to the previous version... Have you noticed a similar impact?
- Shouldn't the last layer have a sigmoid activation so that the output values lie between 0 and 1? These values should be comparable to the input ones, which I believe are rescaled to be between 0 and 1, am I correct? Does this affect the reconstruction loss?
- Also, in some other implementations the usual reconstruction loss is the mean squared error rather than the mean absolute error. Do you use 'mae' for a particular reason?
- This is an extra issue I'm having. Have you been able to use the TensorBoard callback to log the losses and metrics? When I add it, I get an error, which I think occurs because the ae model is made of two models and thus internally has more than one loss:

  ```
  line 1050, in _write_custom_summaries
      summary_value.simple_value = value.item()
  ValueError: can only convert an array of size 1 to a Python scalar
  ```

  I could not find a solution yet!
- Minor detail: why change the stddev to its absolute value? Can it ever be negative?!
I'm sorry for the long text and for raising all these issues, but I think they may be relevant to other users too!
Thank you in advance!
- The last commit changed the beta loss term to be summed instead of averaged over the values in the latent space, which I believe is what other implementations do. This greatly increases beta's contribution to the gradient (a quick sketch below).
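  A minimal sketch of the difference, with illustrative names (`kl_term`, `mean`, `logvar` are not the repo's actual identifiers):

  ```python
  import tensorflow as tf

  def kl_term(mean, logvar, beta, reduce='sum'):
      # Element-wise KL divergence between N(mean, exp(logvar)) and N(0, I).
      kl = -0.5 * (1.0 + logvar - tf.square(mean) - tf.exp(logvar))
      if reduce == 'sum':
          # Summing over latent dimensions scales the term with latent size,
          # so beta contributes far more strongly to the gradient.
          return beta * tf.reduce_sum(kl, axis=-1)
      # Averaging keeps the term roughly independent of latent size.
      return beta * tf.reduce_mean(kl, axis=-1)
  ```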
- Changing the output activation to sigmoid would force the output into the desired range, so it is probably a good idea. Without it, the problem is likely much more difficult for the network to learn. I will test out this change.
- I am using the mean absolute error / L1 distance because that is what was used in the CycleGAN paper, and that is simply what I remembered as I was writing this (a quick comparison below).
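  For reference, the two losses differ only in how they penalize errors; a minimal TF2 sketch with illustrative tensors:

  ```python
  import tensorflow as tf

  x = tf.random.uniform((8, 64, 64, 3))      # input batch (illustrative)
  x_hat = tf.random.uniform((8, 64, 64, 3))  # reconstruction (illustrative)

  # L1 / MAE: linear penalty, more tolerant of large per-pixel errors.
  mae = tf.reduce_mean(tf.abs(x - x_hat))

  # L2 / MSE: quadratic penalty, the more common VAE reconstruction term.
  mse = tf.reduce_mean(tf.square(x - x_hat))
  ```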
- I have not tried to use TensorBoard with this system yet. Post something if you figure it out! I am interested in what the solution could be.
- ~~I made this change mostly because a negative standard deviation did not make sense to me, and I am pretty sure it would break the loss function (https://github.com/alecGraves/BVAE-tf/issues/4).~~ (see the update below)
Thanks for the questions 😄
- Update: the variable named stddev (which was the output of the previous layer) actually represents the log variance, which can be negative. I corrected the variable name and undid the abs in https://github.com/alecGraves/BVAE-tf/commit/810506b1a142da49b3cc7eddcc4bb32856d5e51c (sketch below).
- This is also kind of a better resolution to #4.
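For anyone following along, a minimal sketch of why the abs is unnecessary under the log-variance interpretation (illustrative code, not the exact layer from the repo):

```python
import tensorflow as tf

def sample_latent(mean, logvar):
    # logvar = log(sigma^2) may be negative; exponentiating half of it
    # recovers a strictly positive standard deviation, so no abs() is needed.
    stddev = tf.exp(0.5 * logvar)
    eps = tf.random.normal(tf.shape(mean))
    # Reparameterization trick: z = mean + stddev * eps.
    return mean + stddev * eps
```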
Thank you for your reply and updates! I'm going to test the refactored version and I'll let you know if anything changes in my experiments. I saw you added a tanh activation. I'll also let you know if I figure something out regarding TensorBoard.
Please let me know if you happen to figure something out too :)
Thank you
Hi!
I've tested your refactored version with my experiments. The results are different, and for the better: I am able to get better reconstructions. Cool, thank you!
Just a question: is there any difference between feeding the autoencoder inputs in the range [-1, 1] like you do and feeding images in the range [0, 1]? I'm using the second option and everything looks fine. The autoencoder should adapt to the range (the sampling layer adapts to any distribution), correct? The only thing I think I should change is the final activation layer to a sigmoid so that my outputs are also in the range [0, 1]. Should the loss function stay the same?
Thank you again!
Yes, the network should adapt to the different range without a problem. Changing the output layer to sigmoid would probably help the network because you are constraining the output to the desired range.
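A minimal sketch of that change (hypothetical layer, assuming inputs scaled to [0, 1]):

```python
from tensorflow.keras import layers

def decoder_output(x, channels=3):
    # Sigmoid bounds the reconstruction to [0, 1], matching inputs in that
    # range; tanh is the analogous choice for [-1, 1] inputs.
    return layers.Conv2D(channels, 3, padding='same',
                         activation='sigmoid')(x)
```

The reconstruction loss itself (MAE or MSE) can stay the same either way; only the output activation needs to match the input scaling.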