adversarial-autoencoder
Why did you choose softmax cross entropy?
Hi, thanks for your code. I am confused about why you chose softmax cross entropy. I also tried this loss; unfortunately, it worked, but not very well. When I regularized the z representation space toward a Gaussian distribution using softmax cross entropy, I got an off-centre distribution, like the following:
You can see the Gaussian's expectation is about (-2, -2.5) rather than (0, 0), which really confuses me; I can't explain it from theory. You can also see the equilibrium between the generator and discriminator:
D and G have reached the expected values of about log 2 and log 4 respectively, and I don't know why this equilibrium can be found. From a general standpoint, the discriminator should easily judge the samples generated by the generator. I have checked the code carefully and there is no bug or mistyped variable. I don't know what's going on or what's wrong.
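As an aside on the loss itself: a two-output softmax cross entropy discriminator is mathematically equivalent to the usual sigmoid binary cross entropy applied to the difference of the two logits, so the choice of loss alone should not shift the distribution. A standalone numpy check of that identity (not code from the repo):

```python
import numpy as np

rng = np.random.default_rng(0)
logits = rng.normal(size=(5, 2))  # two-class discriminator outputs

# softmax cross entropy with target class 0 ("real")
p_real = np.exp(logits[:, 0]) / np.exp(logits).sum(axis=1)
softmax_ce = -np.log(p_real)

# sigmoid binary cross entropy on the logit difference, target 1
d = logits[:, 0] - logits[:, 1]
sigmoid_bce = np.log1p(np.exp(-d))

assert np.allclose(softmax_ce, sigmoid_bce)
```

Because -log(e^a / (e^a + e^b)) = log(1 + e^(b-a)) = -log sigmoid(a - b), the two formulations produce identical gradients up to reparameterization.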
Have you plotted the representation rather than z?
Since the representation is the sum of the cluster head and z, its center may not be at (0, 0).
z
representation
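The point above can be illustrated numerically: if z is drawn from a zero-mean Gaussian but the plotted representation adds a fixed cluster head, the sample mean moves away from the origin even though z itself is centered. A minimal numpy sketch (the cluster-head value here is made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(0.0, 1.0, size=(10000, 2))   # zero-mean latent samples
cluster_head = np.array([-2.0, -2.5])       # hypothetical fixed cluster head
representation = cluster_head + z           # what actually gets plotted

print(z.mean(axis=0))               # close to (0, 0)
print(representation.mean(axis=0))  # close to (-2, -2.5), i.e. off-centre
```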
In the latest version of the code, the discriminator for z is defined as follows:
run/unsupervised/dim_reduction/model.py
```python
self.discriminator_z = nn.Module(
    nn.GaussianNoise(std=0.3),  # add noise
    nn.Linear(ndim_z, ndim_h),
    nn.ReLU(),
    nn.Linear(ndim_h, ndim_h),
    nn.ReLU(),
    nn.Linear(ndim_h, 2),
)
```
If you delete `nn.GaussianNoise(std=0.3)`, the discriminator can easily judge the samples generated by the generator.
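Outside the repo's framework, the noise layer is simple to reproduce: perturb the discriminator's input with zero-mean Gaussian noise at training time and pass it through unchanged at test time. A plain-numpy sketch of the technique (not the repo's actual layer):

```python
import numpy as np

def gaussian_noise(x, std=0.3, train=True, rng=None):
    """Add zero-mean Gaussian noise during training; identity at test time."""
    if not train:
        return x
    rng = np.random.default_rng() if rng is None else rng
    return x + rng.normal(0.0, std, size=x.shape)

z = np.zeros((4, 2))
noisy = gaussian_noise(z, std=0.3, rng=np.random.default_rng(0))
clean = gaussian_noise(z, train=False)
```

Blurring both real and generated inputs this way keeps the two distributions overlapping, which is why the discriminator can no longer separate them trivially.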
Yeah, thanks for your reply. I just regularize the representation z without the transform: the encoder outputs a 2-dimensional vector that is fed directly into the discriminator. The latest results actually suggest that maybe I trained for too many epochs. Let me explain:
The z space looks like the image below in the earlier epochs:
When I train for more epochs in order to get clearer boundaries, the result looks like this:
You can see the whole space just moves left, with only slight changes in shape. The Gaussian noise at the input layer of the discriminator has been removed.
So I guess the adversarial loss may give in to the autoencoder loss: the adversarial loss changes only slightly (about 0.001) while the autoencoder loss decreases noticeably (about 0.1 to 1).
So the autoencoder pulls the space off-centre. Is that right?
I have the same view as you.
What happens if you try constraining z to a 10-component 2D Gaussian mixture or a swiss roll?
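For reference, the two priors mentioned can be sampled as follows. This is a self-contained sketch; parameter choices such as the ring radius and per-cluster standard deviation are illustrative, not taken from the repo:

```python
import numpy as np

def sample_gaussian_mixture(n, num_clusters=10, radius=4.0, std=0.5, rng=None):
    """10 Gaussians whose means are placed evenly on a circle."""
    rng = np.random.default_rng() if rng is None else rng
    labels = rng.integers(0, num_clusters, size=n)
    angles = 2.0 * np.pi * labels / num_clusters
    means = radius * np.stack([np.cos(angles), np.sin(angles)], axis=1)
    return means + rng.normal(0.0, std, size=(n, 2))

def sample_swiss_roll(n, noise=0.0, rng=None):
    """2D swiss roll: the radius grows with the angle."""
    rng = np.random.default_rng() if rng is None else rng
    t = np.sqrt(rng.uniform(0.0, 1.0, size=n)) * 3.0 * np.pi
    x = np.stack([t * np.cos(t), t * np.sin(t)], axis=1) / np.pi
    return x + rng.normal(0.0, noise, size=(n, 2))

mixture = sample_gaussian_mixture(1000, rng=np.random.default_rng(0))
roll = sample_swiss_roll(1000, noise=0.05, rng=np.random.default_rng(1))
```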
I think the result you posted is under the semi-supervised setting. I also tried unsupervised learning, with the following result:
The Gaussian mixture looks like this:
I think it's hard to constrain the representation z of unsupervised learning to a certain area, such as (-2, 2); the space drifts about to get a better interpretation of the real space. But the most confusing thing is that the space magically keeps its shape and structure.
I tried unsupervised learning. The results after 300 epochs are as follows:
10 2d-gaussian mixture
10 2d-gaussian mixture + noise
swissroll
swissroll + noise
code: https://gist.github.com/musyoku/ad203d1cb24c60b0926043125e676c71
```python
y_onehot_u = xp.zeros((1, model.ndim_y), dtype=xp.float32)
y_onehot_u[0, -1] = 1  # turn on the extra class
y_onehot_u = xp.repeat(y_onehot_u, args.batchsize, axis=0)
```
I think the y_onehot_u is unnecessary. Maybe I have found the origin of the space shifting: batch normalization can account for it. In a later experiment, when I removed BN, the representation space stayed bound to the ideal area. But another problem followed: in the adversarial phase, the balance between the encoder and discriminator suddenly breaks, as you can see in the plot below:
and the reconstruction loss reacts to it.
Do you use early stopping in your training?
No, I don't. I think the reconstruction loss value in your plot is too high; in my experiments the reconstruction loss is less than 0.1.
I found that BN didn't work, so I don't use BN in any of my experiments. I also found that adding Gaussian noise to the input of the discriminator stabilizes the GAN.
I don't know why the reconstruction loss is so high; maybe it's a programming problem. The formula is as follows:
I use the same structure and the same parameters (learning rate = 0.0001, momentum = 0.5), but the balance breaks when I train the network for more than 300 epochs, and I don't know why. I have also tried adding noise to the input of the discriminator (std = 0.3, which I think is too big, so I also tried std = 0.01), but it doesn't help. By the way, I implemented the idea in TensorFlow instead of Chainer.
Please try supervised learning with only the reconstruction loss. Set the learning rate to a high value and check whether the training reconstruction loss becomes small (or zero). If not, the implementation is incorrect.
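That sanity check can be mimicked framework-free: a tiny linear autoencoder trained by gradient descent should drive the reconstruction loss toward zero on a small fixed batch, and if an implementation cannot overfit even this, something is wired wrong. A minimal numpy sketch of the idea (unrelated to the TensorFlow code discussed above):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 4))           # one small fixed batch
W_enc = 0.1 * rng.normal(size=(4, 4))  # encoder weights (latent dim = input dim, so near-zero loss is reachable)
W_dec = 0.1 * rng.normal(size=(4, 4))  # decoder weights
lr = 0.05

for step in range(5000):
    z = x @ W_enc                         # encode
    x_hat = z @ W_dec                     # decode
    err = x_hat - x
    loss = (err ** 2).sum(axis=1).mean()  # per-sample squared error, averaged over the batch
    g = 2.0 * err / len(x)                # dloss/dx_hat
    grad_dec = z.T @ g
    grad_enc = x.T @ (g @ W_dec.T)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

print(loss)  # should be very close to zero after training
```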