
Why did you choose the softmax cross entropy?

Open nianzu-ethan-zheng opened this issue 7 years ago • 12 comments

Hi, thanks for your code. I am confused about why you chose the softmax cross entropy. I also tried this loss; unfortunately, it did work, but not very well. When I regularized the z representation space toward a Gaussian distribution using the softmax cross entropy, I got an off-centre distribution, like the following: result

You can see the Gaussian mean is about (-2, -2.5) rather than (0, 0), which quite confuses me. I can't explain it from theory. You can also see the equilibrium between the generator and discriminator:

equilibrium

D and G have reached the expected values of about log 2 and log 4 respectively, and I don't know why this equilibrium can be found. From a general standpoint, the discriminator should be able to easily judge the samples generated by the generator. I have checked the code carefully; it has no bugs or mistyped variables. I don't know what's going on or what's wrong with it.
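(For reference, those numbers match the fixed point of the standard GAN cross-entropy objective; this is a generic derivation, not something taken from this repo. At equilibrium the discriminator cannot tell the encoder's z from the prior, so it outputs 1/2 on both real and fake samples:

$$
\begin{aligned}
\mathcal{L}_D &= -\log D(z_{\text{real}}) - \log\bigl(1 - D(z_{\text{fake}})\bigr) = -\log\tfrac12 - \log\tfrac12 = \log 4 \approx 1.386,\\
\mathcal{L}_G &= -\log D(z_{\text{fake}}) = -\log\tfrac12 = \log 2 \approx 0.693.
\end{aligned}
$$

Which curve lands at log 2 and which at log 4 depends on whether the implementation sums or averages the real and fake terms.)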

nianzu-ethan-zheng avatar Oct 09 '17 12:10 nianzu-ethan-zheng

Have you plotted the representation rather than z?

screenshot from 2017-10-09 22-16-29

Since the representation is the sum of the cluster head and z, its center may not be at (0, 0).

z

scatter_z

representation

scatter_r
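(In case it helps, here is a minimal NumPy sketch of what "representation = cluster head + z" means above; the variable names and shapes are my assumptions, not the repo's code. Even if z is centred at the origin, the sum need not be.)

import numpy as np

ndim_y, ndim_z = 10, 2

# hypothetical learned 2-D cluster heads, one row per cluster
cluster_heads = np.random.randn(ndim_y, ndim_z).astype(np.float32)

def to_representation(y_softmax, z):
    # y_softmax: (batch, ndim_y) soft cluster assignment from the encoder
    # z:         (batch, ndim_z) latent code regularized towards the prior
    head = y_softmax @ cluster_heads   # weighted cluster head, (batch, ndim_z)
    return head + z                    # "representation" = cluster head + z

# z may be centred at (0, 0) while the representation is not
y = np.full((5, ndim_y), 1.0 / ndim_y, dtype=np.float32)
z = np.random.randn(5, ndim_z).astype(np.float32)
print(to_representation(y, z))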

musyoku avatar Oct 09 '17 13:10 musyoku

In the latest version of the code, the definition of the discriminator for z is as follows:

run/unsupervised/dim_reduction/model.py

self.discriminator_z = nn.Module(
	nn.GaussianNoise(std=0.3),   # add noise
	nn.Linear(ndim_z, ndim_h),
	nn.ReLU(),
	nn.Linear(ndim_h, ndim_h),
	nn.ReLU(),
	nn.Linear(ndim_h, 2),
)

If you delete nn.GaussianNoise(std=0.3), the discriminator can easily judge the samples generated by the generator.
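(For anyone reimplementing this outside the repo's custom nn wrapper, here is a rough PyTorch sketch of that discriminator; it is my own translation, not the repo's code, and the hidden width is a guess. The point is that the Gaussian noise is applied to the discriminator input at training time only.)

import torch
import torch.nn as nn

class GaussianNoise(nn.Module):
    """Add zero-mean Gaussian noise to the input during training only."""
    def __init__(self, std=0.3):
        super().__init__()
        self.std = std

    def forward(self, x):
        if self.training and self.std > 0:
            return x + torch.randn_like(x) * self.std
        return x

ndim_z, ndim_h = 2, 1000   # hidden width is an assumption, not taken from the repo

discriminator_z = nn.Sequential(
    GaussianNoise(std=0.3),     # smooths the decision boundary; removing it
    nn.Linear(ndim_z, ndim_h),  # lets the discriminator win too easily
    nn.ReLU(),
    nn.Linear(ndim_h, ndim_h),
    nn.ReLU(),
    nn.Linear(ndim_h, 2),       # 2 logits -> softmax cross entropy (real vs. fake)
)

logits = discriminator_z(torch.randn(8, ndim_z))   # shape (8, 2)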

musyoku avatar Oct 09 '17 13:10 musyoku

Yeah, thanks for your reply. I just regularize the representation z without the transform; the encoder outputs a 2-dimensional vector that is fed directly into the discriminator. The latest results actually suggest that maybe I trained for too many epochs. Let me explain:

In the earlier epochs the z space looks like the following: earlier

When you train for more epochs in order to get clearer boundaries, the result looks like this: later

You can see the whole space just moves left while the shape changes only slightly. The Gaussian noise at the input layer of the discriminator has been removed.

So I guess the adversarial loss may give in to the autoencoder loss: the adversarial loss changes only slightly (about 0.001) while the autoencoder loss decreases noticeably (about 0.1 - 1).

So the autoencoder pulls the space off-centre, is that right?

nianzu-ethan-zheng avatar Oct 09 '17 13:10 nianzu-ethan-zheng

I have the same view as you.

musyoku avatar Oct 09 '17 13:10 musyoku

What happens if you set the prior on z to a 10-component 2D Gaussian mixture or a swiss roll?

scatter_gen

swiss_roll
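(For reference, a rough NumPy sketch of the two priors being discussed; the radii, variances, and function names are my guesses, not the repo's sampler. The mixture places one Gaussian per class on a circle; the swiss roll spreads samples along a spiral.)

import numpy as np

def sample_gaussian_mixture(batchsize, num_clusters=10, scale=2.0, std=0.5, rng=np.random):
    # pick a cluster per sample and place its mean on a circle of radius `scale`
    labels = rng.randint(0, num_clusters, size=batchsize)
    angles = 2.0 * np.pi * labels / num_clusters
    means = scale * np.stack([np.cos(angles), np.sin(angles)], axis=1)
    return (means + std * rng.randn(batchsize, 2)).astype(np.float32)

def sample_swiss_roll(batchsize, scale=2.0, rng=np.random):
    # radius grows with the angle, giving a 2-D spiral
    t = np.sqrt(rng.uniform(0.0, 1.0, size=batchsize)) * 3.0 * np.pi
    r = scale * t / (3.0 * np.pi)
    return np.stack([r * np.cos(t), r * np.sin(t)], axis=1).astype(np.float32)

z_prior = sample_gaussian_mixture(256)   # fed to the discriminator as "real" samples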

musyoku avatar Oct 09 '17 13:10 musyoku

I think the results you posted are under semi-supervised learning. I also tried unsupervised learning, which gives the following result: Swiss roll - label 10

The Gaussian mixture looks like this: Gaussian mixture - label 10

nianzu-ethan-zheng avatar Oct 10 '17 02:10 nianzu-ethan-zheng

I think it's hard to constrain the representation z to a certain area, such as (-2, 2), with unsupervised learning; the space drifts around to get a better interpretation of the real space. But the most confusing thing is that the space magically keeps its shape and structure.

nianzu-ethan-zheng avatar Oct 10 '17 12:10 nianzu-ethan-zheng

I tried the unsupervised learning. The results after 300 epochs are as follows:

10 2d-gaussian mixture gauss

10 2d-gaussian mixture + noise gauss_noise

swissroll swiss

swissroll + noise swiss_noise

code: https://gist.github.com/musyoku/ad203d1cb24c60b0926043125e676c71

musyoku avatar Oct 12 '17 07:10 musyoku

y_onehot_u = xp.zeros((1, model.ndim_y), dtype=xp.float32)
y_onehot_u[0, -1] = 1	# turn on the extra class
y_onehot_u = xp.repeat(y_onehot_u, args.batchsize, axis=0)

I think the y_onehot_u is unnecessary. Maybe I have found the origin of the space shifting, which batch normalization can account for. In a later experiment, when I removed BN, the representation space stayed within the ideal area. But another problem followed: in the adversarial phase the balance between the encoder and the discriminator suddenly breaks. You can see it in the following plot:

loss-adversarial-z

and the reconstruction loss reacts to it.

loss-reconstruction

Do you use early stopping in your training?

nianzu-ethan-zheng avatar Oct 13 '17 13:10 nianzu-ethan-zheng

No, I don't. I think the reconstruction loss value in your plot is too high; in my experiments the reconstruction loss is less than 0.1.

I found that BN didn't work, so I don't use BN in any of my experiments. I also found that adding Gaussian noise to the input of the discriminator stabilizes the GAN.

musyoku avatar Oct 13 '17 17:10 musyoku

I don't know why the reconstruction loss is so high. Maybe it's a programming problem. The formula is as follows: image

I am using the same structure and the same parameters, such as learning rate = 0.0001 and momentum = 0.5, but the balance breaks once I train the network for more than 300 epochs. I don't know why. I have tried adding some noise to the input of the discriminator (std = 0.3, which I think is too big, so I also tried std = 0.01), but it doesn't work. By the way, I implemented the idea in TensorFlow instead of Chainer.

nianzu-ethan-zheng avatar Oct 16 '17 08:10 nianzu-ethan-zheng

Please try supervised learning with only the reconstruction loss. Set the learning rate to a high value and check whether the training reconstruction loss becomes small (or 0). If not, the implementation is incorrect.
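(A minimal sketch of that sanity check, in PyTorch with hypothetical encoder/decoder modules rather than the repo's or the asker's TensorFlow code: train with only the reconstruction term, overfit a single batch, and verify the loss drives toward zero before re-enabling the adversarial phase.)

import torch
import torch.nn as nn

# hypothetical encoder/decoder, just for the sanity check
ndim_x, ndim_z, ndim_h = 784, 2, 1000
encoder = nn.Sequential(nn.Linear(ndim_x, ndim_h), nn.ReLU(), nn.Linear(ndim_h, ndim_z))
decoder = nn.Sequential(nn.Linear(ndim_z, ndim_h), nn.ReLU(), nn.Linear(ndim_h, ndim_x))

opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
x = torch.rand(64, ndim_x)   # stand-in for one training batch

# reconstruction phase only: no discriminator, no adversarial update
for step in range(1000):
    opt.zero_grad()
    loss = nn.functional.mse_loss(decoder(encoder(x)), x)
    loss.backward()
    opt.step()

print(loss.item())   # should approach 0 if the autoencoder path is wired correctly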

musyoku avatar Oct 17 '17 11:10 musyoku