clusterGAN
Error while running it in PyTorch 1.7
Issue: the existing ClusterGAN code doesn't work in PyTorch 1.7.
Disclaimer: I know there is nothing wrong with the code or implementation you've written. But any help on this would be appreciated.
The following steps work in PyTorch 1.0 but not in torch 1.7.0+cu101:
```python
optimizer_ge = Adam(itertools.chain(encoder.parameters(), generator.parameters()), ...)
opt_disc = Adam(discriminator.parameters(), ...)
```
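For context, the optimizer construction looks roughly like this; the hyperparameters are elided above, so the values below are only placeholders:

```python
import itertools
from torch.optim import Adam

# Placeholder hyperparameters; the actual values are elided above.
optimizer_ge = Adam(itertools.chain(encoder.parameters(), generator.parameters()),
                    lr=1e-4, betas=(0.5, 0.9))
opt_disc = Adam(discriminator.parameters(), lr=1e-4, betas=(0.5, 0.9))
```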
The generator and the encoder are updated together, and the discriminator is updated separately. The following is done for each batch of images:
```python
generator.train()
encoder.train()
generator.zero_grad()
encoder.zero_grad()
discriminator.zero_grad()
optimizer_ge.zero_grad()

fake_image = generator(random_z)
fake_op = discriminator(fake_image)
real_op = discriminator(real_image)
zn, zc, zc_idx = encoder(fake_image)

ge_loss = (Cross_entropy loss) + (Clustering_loss)  # pseudocode
ge_loss.backward(retain_graph=True)
optimizer_ge.step()

opt_disc.zero_grad()
# compute the vanilla GAN discriminator loss disc_loss using a BCE loss function
disc_loss.backward()
opt_disc.step()
```
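For anyone trying to reproduce this, the two loss placeholders above might look roughly as follows. This is only a sketch under my own assumptions (the discriminator returning raw logits, `zn_sampled`/`zc_idx_sampled` being the latent pieces packed into `random_z`, and `betan`/`betac` being weighting factors), not necessarily the exact objective used in this repo:

```python
import torch
import torch.nn.functional as F

# Generator/encoder ("GE") loss: fool the discriminator and recover the latent code.
adv_loss = F.binary_cross_entropy_with_logits(fake_op, torch.ones_like(fake_op))
zn_loss = F.mse_loss(zn, zn_sampled)            # continuous latent reconstruction
zc_loss = F.cross_entropy(zc, zc_idx_sampled)   # categorical latent reconstruction
ge_loss = adv_loss + betan * zn_loss + betac * zc_loss

# Vanilla GAN discriminator loss on the real and fake batches.
disc_loss = (F.binary_cross_entropy_with_logits(real_op, torch.ones_like(real_op)) +
             F.binary_cross_entropy_with_logits(fake_op, torch.zeros_like(fake_op)))
```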
The above code works fine in torch 1.0, but torch 1.7 throws the following error:

```
one of the variables needed for gradient computation has been modified by an inplace operation:
[torch.cuda.FloatTensor [64, 1, 4, 4]] is at version 2; expected version 1 instead.
Hint: enable anomaly detection to find the operation that failed to compute its gradient,
with torch.autograd.set_detect_anomaly(True).
```
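As the hint suggests, anomaly detection can be switched on before the training loop so that the traceback points at the forward operation whose saved tensor was later modified:

```python
import torch

# Noticeably slows training; enable only while debugging.
torch.autograd.set_detect_anomaly(True)
```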
The error seems to be resolved when I do

```python
fake_op = discriminator(fake_image.detach())
```

or when I run both backward passes before either optimizer step:

```python
ge_loss.backward(retain_graph=True)
disc_loss.backward()
optimizer_ge.step()
opt_disc.step()
```
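My understanding (I haven't confirmed this with the maintainers) is that PyTorch 1.5+ checks tensor version counters more strictly: `optimizer_ge.step()` updates the generator weights in place, and because `fake_image` was not detached, `disc_loss.backward()` afterwards tries to backpropagate through the generator using weights saved at forward time that have since changed. Here is a sketch of the second workaround in context, with comments on where gradients from the two losses still mix, which may be one reason the numbers below differ:

```python
optimizer_ge.zero_grad()
ge_loss.backward(retain_graph=True)   # also writes gradients into the discriminator

opt_disc.zero_grad()                  # discard the D gradients produced by ge_loss
disc_loss.backward()                  # also writes gradients into the generator,
                                      # because fake_op comes from a non-detached fake_image

# Parameters are modified in place only after both backward passes.
optimizer_ge.step()
opt_disc.step()
```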
However, the results after making either of the above changes don't match the results of the code run in torch 1.0.
It seems that the error appears due to using the same variables that require grad after `optGE.step()` has already run. I was able to correct that by simply re-initializing those variables, i.e.:

```python
fake_images = netG(zn, zc)
pred_fake = netD(fake_images)
...
optGE.step()
pred_fake = netD(fake_images.detach())
...
optD.step()
```
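For what it's worth, here is how I read that fix in the context of the full batch step above. Only the detach-and-recompute of `pred_fake` after the GE step comes from the comment above; the encoder call (`netE`), the `real_images` batch, and the concrete loss terms with `betan`/`betac` weights are my own guesses at the missing pieces:

```python
import torch
import torch.nn.functional as F

optGE.zero_grad()

fake_images = netG(zn, zc)
pred_fake = netD(fake_images)
zn_hat, zc_hat, zc_idx_hat = netE(fake_images)

ge_loss = (F.binary_cross_entropy_with_logits(pred_fake, torch.ones_like(pred_fake))
           + betan * F.mse_loss(zn_hat, zn)
           + betac * F.cross_entropy(zc_hat, zc_idx))
ge_loss.backward()
optGE.step()

# Recompute the discriminator output AFTER the GE step, on detached fakes, so
# disc_loss.backward() never needs the (now updated) generator parameters.
optD.zero_grad()
pred_fake = netD(fake_images.detach())
pred_real = netD(real_images)
disc_loss = (F.binary_cross_entropy_with_logits(pred_real, torch.ones_like(pred_real))
             + F.binary_cross_entropy_with_logits(pred_fake, torch.zeros_like(pred_fake)))
disc_loss.backward()
optD.step()
```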
@Hong753 Could you please paste the exact code showing how you solved this issue (and where in the code it should go)? I am really struggling with migrating from Keras to PyTorch, and I would like to reproduce these results before even attempting to modify the code for other cases. Thank you!
@djsavic I have fixed it in the following way. First, I take a deep copy of the discriminator using the Python copy library:

```python
import copy

generator.train()
encoder.train()
generator.zero_grad()
encoder.zero_grad()
discriminator.zero_grad()
optimizer_G.zero_grad()

d_c = copy.deepcopy(discriminator)

x, y = batch
# zn, zc, zc_idx = generator_input_sampler(latent_space_zn, batch_size=32)
# create fake digits
zn, zc, zc_idx = sample_z(latent_dim=latent_space_zn, shape=BATCH_SIZE)
```

Then I calculate pred_real and pred_fake using this copy:

```python
x_fake = generator(zn.to(device), zc.to(device))  # create fake imgs
pred_real = d_c(x.to(device))
pred_fake = d_c(x_fake)
```
Now the code runs without errors and produces the results from the paper. Hope this helped you!
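The snippet above stops at the copied-discriminator forward passes. Purely as a guess at the remaining piece of the batch step (not something posted above), the discriminator update would presumably still go through the original discriminator on detached fakes, for example:

```python
# Hypothetical completion of the batch step; bce_loss and opt_disc are assumed names.
opt_disc.zero_grad()
d_real = discriminator(x.to(device))
d_fake = discriminator(x_fake.detach())
disc_loss = bce_loss(d_real, torch.ones_like(d_real)) + bce_loss(d_fake, torch.zeros_like(d_fake))
disc_loss.backward()
opt_disc.step()
```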