clusterGAN
Error while running it in PyTorch 1.7
Issue: the existing ClusterGAN code doesn't work in PyTorch 1.7.
Disclaimer: I know there is nothing wrong with the code or implementation you've written. But any help on this would be appreciated.
The following steps work in PyTorch 1.0 but not in torch 1.7.0+cu101:
```python
optimizer_ge = Adam(itertools.chain(encoder.parameters(), generator.parameters()), ...)
opt_disc = Adam(discriminator.parameters(), ...)
```
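For context, the optimizer construction looks roughly like this; the hyperparameters are elided above, so the values below are only placeholders:

```python
import itertools
from torch.optim import Adam

# Placeholder hyperparameters; the actual values are elided above.
optimizer_ge = Adam(itertools.chain(encoder.parameters(), generator.parameters()),
                    lr=1e-4, betas=(0.5, 0.9))
opt_disc = Adam(discriminator.parameters(), lr=1e-4, betas=(0.5, 0.9))
```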
The generator and the encoder are updated together, and the discriminator is updated separately. The following is done for each batch of images:
```python
generator.train()
encoder.train()
generator.zero_grad()
encoder.zero_grad()
discriminator.zero_grad()
optimizer_ge.zero_grad()

fake_image = generator(random_z)
fake_op = discriminator(fake_image)
real_op = discriminator(real_image)
zn, zc, zc_idx = encoder(fake_image)

ge_loss = (Cross_entropy loss) + (Clustering_loss)  # pseudocode
ge_loss.backward(retain_graph=True)
optimizer_ge.step()

opt_disc.zero_grad()
# compute the vanilla GAN discriminator loss disc_loss using a BCE loss function
disc_loss.backward()
opt_disc.step()
```
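For anyone trying to reproduce this, the two loss placeholders above might look roughly as follows. This is only a sketch under my own assumptions (the discriminator returning raw logits, `zn_sampled`/`zc_idx_sampled` being the latent pieces packed into `random_z`, and `betan`/`betac` being weighting factors), not necessarily the exact objective used in this repo:

```python
import torch
import torch.nn.functional as F

# Generator/encoder ("GE") loss: fool the discriminator and recover the latent code.
adv_loss = F.binary_cross_entropy_with_logits(fake_op, torch.ones_like(fake_op))
zn_loss = F.mse_loss(zn, zn_sampled)            # continuous latent reconstruction
zc_loss = F.cross_entropy(zc, zc_idx_sampled)   # categorical latent reconstruction
ge_loss = adv_loss + betan * zn_loss + betac * zc_loss

# Vanilla GAN discriminator loss on the real and fake batches.
disc_loss = (F.binary_cross_entropy_with_logits(real_op, torch.ones_like(real_op)) +
             F.binary_cross_entropy_with_logits(fake_op, torch.zeros_like(fake_op)))
```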
The above code works fine in torch 1.0, but torch 1.7 throws the following error:

```
one of the variables needed for gradient computation has been modified by an inplace operation:
[torch.cuda.FloatTensor [64, 1, 4, 4]] is at version 2; expected version 1 instead.
Hint: enable anomaly detection to find the operation that failed to compute its gradient,
with torch.autograd.set_detect_anomaly(True).
```
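As the hint suggests, anomaly detection can be switched on before the training loop so that the traceback points at the forward operation whose saved tensor was later modified:

```python
import torch

# Noticeably slows training; enable only while debugging.
torch.autograd.set_detect_anomaly(True)
```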
The error seems to be resolved when I do

```python
fake_op = discriminator(fake_image.detach())
```

or when I run both backward passes before either optimizer step:

```python
ge_loss.backward(retain_graph=True)
disc_loss.backward()
optimizer_ge.step()
opt_disc.step()
```
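My understanding (I haven't confirmed this with the maintainers) is that PyTorch 1.5+ checks tensor version counters more strictly: `optimizer_ge.step()` updates the generator weights in place, and because `fake_image` was not detached, `disc_loss.backward()` afterwards tries to backpropagate through the generator using weights saved at forward time that have since changed. Here is a sketch of the second workaround in context, with comments on where gradients from the two losses still mix, which may be one reason the numbers below differ:

```python
optimizer_ge.zero_grad()
ge_loss.backward(retain_graph=True)   # also writes gradients into the discriminator

opt_disc.zero_grad()                  # discard the D gradients produced by ge_loss
disc_loss.backward()                  # also writes gradients into the generator,
                                      # because fake_op comes from a non-detached fake_image

# Parameters are modified in place only after both backward passes.
optimizer_ge.step()
opt_disc.step()
```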
However, the results after making either of the above changes don't match the results of the code run in torch 1.0.
It seems that the error appears due to using the same variables that require grad after `optGE.step()` has already run. I was able to correct that by simply re-initializing those variables, i.e.:

```python
fake_images = netG(zn, zc)
pred_fake = netD(fake_images)
...
optGE.step()
pred_fake = netD(fake_images.detach())
...
optD.step()
```
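For what it's worth, here is how I read that fix in the context of the full batch step above. Only the detach-and-recompute of `pred_fake` after the GE step comes from the comment above; the encoder call (`netE`), the `real_images` batch, and the concrete loss terms with `betan`/`betac` weights are my own guesses at the missing pieces:

```python
import torch
import torch.nn.functional as F

optGE.zero_grad()

fake_images = netG(zn, zc)
pred_fake = netD(fake_images)
zn_hat, zc_hat, zc_idx_hat = netE(fake_images)

ge_loss = (F.binary_cross_entropy_with_logits(pred_fake, torch.ones_like(pred_fake))
           + betan * F.mse_loss(zn_hat, zn)
           + betac * F.cross_entropy(zc_hat, zc_idx))
ge_loss.backward()
optGE.step()

# Recompute the discriminator output AFTER the GE step, on detached fakes, so
# disc_loss.backward() never needs the (now updated) generator parameters.
optD.zero_grad()
pred_fake = netD(fake_images.detach())
pred_real = netD(real_images)
disc_loss = (F.binary_cross_entropy_with_logits(pred_real, torch.ones_like(pred_real))
             + F.binary_cross_entropy_with_logits(pred_fake, torch.zeros_like(pred_fake)))
disc_loss.backward()
optD.step()
```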
@Hong753 Could you please paste the exact code showing how you solved this issue (and where in the code it should go)? I am really struggling with migrating from Keras to PyTorch, and I would like to reproduce these results before even attempting to modify the code for other cases. Thank you!
@djsavic I have fixed it in the following way. First, I take a deep copy of the discriminator using the Python copy library:

```python
import copy

generator.train()
encoder.train()
generator.zero_grad()
encoder.zero_grad()
discriminator.zero_grad()
optimizer_G.zero_grad()

d_c = copy.deepcopy(discriminator)

x, y = batch
# zn, zc, zc_idx = generator_input_sampler(latent_space_zn, batch_size=32)
# create fake digits
zn, zc, zc_idx = sample_z(latent_dim=latent_space_zn, shape=BATCH_SIZE)
```

Then I calculate pred_real and pred_fake using this copy:

```python
x_fake = generator(zn.to(device), zc.to(device))  # create fake imgs
pred_real = d_c(x.to(device))
pred_fake = d_c(x_fake)
```
Now the code runs without errors and produces the results from the paper. Hope this helped you!
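The snippet above stops at the copied-discriminator forward passes. Purely as a guess at the remaining piece of the batch step (not something posted above), the discriminator update would presumably still go through the original discriminator on detached fakes, for example:

```python
# Hypothetical completion of the batch step; bce_loss and opt_disc are assumed names.
opt_disc.zero_grad()
d_real = discriminator(x.to(device))
d_fake = discriminator(x_fake.detach())
disc_loss = bce_loss(d_real, torch.ones_like(d_real)) + bce_loss(d_fake, torch.zeros_like(d_fake))
disc_loss.backward()
opt_disc.step()
```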