DCGAN example doesn't work with different image sizes
I'm trying to use this code as a starting point for building GANs from my own image data -- 512x512 grayscale images. If I change any of the default arguments (e.g. --imageSize 512) I get the following error:
Traceback (most recent call last):
File "main.py", line 209, in <module>
errD_real = criterion(output, label)
File "/opt/python/lib/python3.6/site-packages/torch/nn/modules/module.py", line 210, in __call__
result = self.forward(*input, **kwargs)
File "/opt/python/lib/python3.6/site-packages/torch/nn/modules/loss.py", line 36, in forward
return backend_fn(self.size_average, weight=self.weight)(input, target)
File "/opt/python/lib/python3.6/site-packages/torch/nn/_functions/thnn/loss.py", line 22, in forward
assert input.nelement() == target.nelement()
AssertionError
Still learning my way around PyTorch, so the network architectures that are spit out before the above message don't yet give me much intuition. I appreciate any pointers you can give!
The error tells you that the number of inputs to the loss function is different from the number of given targets. It happens at line 209. The problem is that the generator and discriminator architectures are apparently fixed to the default image size (see the annotations in the model). Adding a pooling layer at the end of the discriminator that squeezes every batch element into a 1x1x1 output would help; I think that appending nn.MaxPool2d(opt.imageSize // 64) after the Sigmoid would fix that.
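For illustration, that suggestion could be applied like this (a sketch, assuming netD and opt from the example's main.py; untested):

```python
# Append the pooling layer after the Sigmoid so the spatial output is
# squeezed towards 1x1 for larger image sizes; opt.imageSize // 64 == 1
# leaves the default 64x64 configuration unchanged.
netD.main.add_module('pool', nn.MaxPool2d(opt.imageSize // 64))
```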
As @apaszke mentions, the G and D networks are generated with the 64x64 limitations hardcoded. The implementation of the DCGAN here is very similar to the dcgan.torch implementation, and someone else asked about this limitation and got this answer: https://github.com/soumith/dcgan.torch/issues/2#issuecomment-164862299
By following the changes suggested in that comment, you can expand the network for 128x128. So for the generator:
```python
class _netG(nn.Module):
    def __init__(self, ngpu):
        super(_netG, self).__init__()
        self.ngpu = ngpu
        self.main = nn.Sequential(
            # input is Z, going into a convolution
            nn.ConvTranspose2d(nz, ngf * 16, 4, 1, 0, bias=False),
            nn.BatchNorm2d(ngf * 16),
            nn.ReLU(True),
            # state size. (ngf*16) x 4 x 4
            nn.ConvTranspose2d(ngf * 16, ngf * 8, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 8),
            nn.ReLU(True),
            # state size. (ngf*8) x 8 x 8
            nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 4),
            nn.ReLU(True),
            # state size. (ngf*4) x 16 x 16
            nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 2),
            nn.ReLU(True),
            # state size. (ngf*2) x 32 x 32
            nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf),
            nn.ReLU(True),
            # state size. (ngf) x 64 x 64
            nn.ConvTranspose2d(ngf, nc, 4, 2, 1, bias=False),
            nn.Tanh()
            # state size. (nc) x 128 x 128
        )

    def forward(self, input):
        # unchanged from the example's main.py
        if input.is_cuda and self.ngpu > 1:
            output = nn.parallel.data_parallel(self.main, input, range(self.ngpu))
        else:
            output = self.main(input)
        return output
```
And for the discriminator:
```python
class _netD(nn.Module):
    def __init__(self, ngpu):
        super(_netD, self).__init__()
        self.ngpu = ngpu
        self.main = nn.Sequential(
            # input is (nc) x 128 x 128
            nn.Conv2d(nc, ndf, 4, stride=2, padding=1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf) x 64 x 64
            nn.Conv2d(ndf, ndf * 2, 4, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(ndf * 2),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf*2) x 32 x 32
            nn.Conv2d(ndf * 2, ndf * 4, 4, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(ndf * 4),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf*4) x 16 x 16
            nn.Conv2d(ndf * 4, ndf * 8, 4, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(ndf * 8),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf*8) x 8 x 8
            nn.Conv2d(ndf * 8, ndf * 16, 4, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(ndf * 16),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf*16) x 4 x 4
            nn.Conv2d(ndf * 16, 1, 4, stride=1, padding=0, bias=False),
            nn.Sigmoid()
            # state size. 1
        )

    def forward(self, input):
        # unchanged from the example's main.py
        if input.is_cuda and self.ngpu > 1:
            output = nn.parallel.data_parallel(self.main, input, range(self.ngpu))
        else:
            output = self.main(input)
        return output.view(-1, 1).squeeze(1)
```
However, as you can also see in that thread, it is harder to get a stable game between the generator and discriminator for this larger problem. To avoid this, I think you'll have to take a look at the improvements used here https://github.com/openai/improved-gan (paper: https://arxiv.org/abs/1606.03498). This repository includes a model for 128x128 imagenet generation.
Ahh, thank you for the extra information; that helps immensely, in addition to the intuition for possibly less stable training processes given the larger images.
So @bartolsthoorn for the images I'm using--512x512--I probably should look into the improved GAN paper and associated OpenAI implementation?
@magsol I would suggest first trying your dataset at the standard 64x64. Next, run it at 128x128 with either the extra convolution layers or the pooling layer listed above. After that you can try 512x512; I am no expert, but I have not seen pictures that large generated by a DCGAN. You could also consider generating 128x128 images and then using a separate super-resolution network to reach 512x512.
64x64 and 128x128 are easy to try (the model includes the preprocessing, i.e. the rescaling of the images) and should be easier to generate. Did you already get good results with your data on the 64x64 scale? Please share your experience so far. :smile:
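(For reference, that preprocessing is just the transform pipeline in the example's main.py; roughly the following, assuming a recent torchvision where Scale is named Resize:)

```python
import torchvision.datasets as dset
import torchvision.transforms as transforms

# every image is resized and center-cropped to opt.imageSize on load,
# so differently sized source images are handled automatically
dataset = dset.ImageFolder(root=opt.dataroot,
                           transform=transforms.Compose([
                               transforms.Resize(opt.imageSize),
                               transforms.CenterCrop(opt.imageSize),
                               transforms.ToTensor(),
                               transforms.Normalize((0.5, 0.5, 0.5),
                                                    (0.5, 0.5, 0.5)),
                           ]))
```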
@bartolsthoorn I ran dcgan with the following arguments:
python main.py --cuda --dataset folder --dataroot /images --outf /output
I tried changing the nc = 3 value to nc = 1 since the images are all grayscale, but kept getting CUDNN_STATUS_BAD_PARAM errors, so I left the default value unchanged.
Unfortunately, after very few training iterations it looks like the mode collapsed:
[image: generated samples after mode collapse]
The images from the 24th epoch look like pure static:
[image: fake samples, epoch 24]
The real images, on the other hand, look like this:
[image: real training samples]
Happy to hear any suggestions you may have :) Thank you so much for your help so far! Learning a lot about GANs!
Managed to override the default image loader in torchvision so it properly pulls the images in as grayscale, and changed nc = 1 -- it seems to be running nicely now :) Though the loss functions are still quickly hitting 1 and 0 respectively, as before, so I'm not sure the results will be any better than the last run.
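A minimal sketch of that loader override, for anyone trying the same thing (grayscale_loader is my name for it; ImageFolder accepts a custom loader argument):

```python
from PIL import Image
import torchvision.datasets as dset

def grayscale_loader(path):
    # open with PIL and force single-channel ('L') mode instead of RGB
    return Image.open(path).convert('L')

# transform as in main.py, but with nc = 1 the Normalize step should use
# single-channel stats, i.e. Normalize((0.5,), (0.5,))
dataset = dset.ImageFolder(root=opt.dataroot,
                           transform=transform,
                           loader=grayscale_loader)
```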
No improvement, though I guess it's a little easier to see that it's not pure noise in the fake images. Still looks like static, though.
Yes, the learning is unstable. There are some new interesting suggestion in the dcgan.torch thread: https://github.com/soumith/dcgan.torch/issues/2#issuecomment-283982237
- Set ndf to ngf/4; this changes the relative sizes of the G and D models in order to balance the training
- Add white noise (a trick also mentioned in ganhacks); a minimal sketch follows this list
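The white-noise idea, as a rough sketch (my own illustration; the function name and the sigma schedule are hypothetical, with the noise added to the discriminator's inputs as ganhacks suggests):

```python
import torch

def add_instance_noise(images, sigma):
    # zero-mean Gaussian noise keeps D from separating real and fake
    # batches using pixel-level artifacts alone
    return images + sigma * torch.randn_like(images)

# in the training loop, anneal sigma (e.g. from 0.1 towards 0) and perturb
# both batches before they reach the discriminator:
# real_noisy = add_instance_noise(real_images, sigma)
# fake_noisy = add_instance_noise(fake_images.detach(), sigma)
```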
Any luck with the above ~~tricks~~ heuristics?
Ha! Isn't all of practical ML just tricks :) Haven't had a chance yet -- behind on my teaching. Hoping to get back to this within the next week though, and will absolutely update when I do.
@magsol If you happen to run into difficulties training the 512x512 images, you could always scale the images down first to, say, 64^2 and see if that even gets you the results you'd want, then scale up later? Thanks for checking back! :)
Seems like I have one solution to this problem: after the discriminator, the output size changes with the input size, but when you calculate the loss between the output and the target, the label size has not changed. So you can either let the label size change with your input size, or make the discriminator output a fixed size no matter what size data you input.
FYI
```python
size_feature = self.D_A(x_A).size()
real_tensor.data.resize_(size_feature).fill_(real_label)
fake_tensor.data.resize_(size_feature).fill_(fake_label)
l_d_A_real, l_d_A_fake = bce(self.D_A(x_A), real_tensor), bce(self.D_A(x_BA), fake_tensor)
```
For example, if your x_A has size [batch_size, 3, 64, 64], then after D the size_feature will be [batch_size, 1] and real_label has size [batch_size]. But when your input x_A changes to [batch_size, 3, 128, 128], the output of D becomes [batch_size, 25], and calculating the loss between [batch_size, 25] and a label of size [batch_size] raises the error.
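In more recent PyTorch the same idea can be written without resize_, e.g. (a sketch, assuming the names netD, input, real_label, and criterion from the example's main.py):

```python
output = netD(input)
# build the targets from the discriminator's actual output shape, so the
# BCE loss always sees matching element counts for any input resolution
real_tensor = torch.full_like(output, real_label)
errD_real = criterion(output, real_tensor)
```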
Is there any way to implement a DCGAN that generates rectangular images (e.g. 128x32) in PyTorch? Nearly every example I've seen works with square images.
Yes, you can do that @xjdeng: one way is to ensure that the output of your first (dense) layer has the aspect ratio you want for the final output. So in your case, the dense layer output should be (batch_size, channels, 4, 1) or some multiple of that. If your network then consists of transposed convolutions that double the size at each layer, you would need 5 transposed conv layers to get images of size (batch_size, channels, 128, 32); see the sketch below.
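A minimal sketch of that idea (my own illustration, untested; it uses a transposed convolution for the (4 x 1) seed instead of a dense layer, in the style of the DCGAN example):

```python
import torch
import torch.nn as nn

nz, ngf, nc = 100, 64, 3  # noise dim, feature maps, channels, as in the example

# Hypothetical generator for 128x32 outputs: the first transposed convolution
# maps the nz x 1 x 1 noise to a (4 x 1) feature map, and five stride-2 layers
# then double both dimensions: 4x1 -> 8x2 -> 16x4 -> 32x8 -> 64x16 -> 128x32.
netG_rect = nn.Sequential(
    nn.ConvTranspose2d(nz, ngf * 8, (4, 1), 1, 0, bias=False),
    nn.BatchNorm2d(ngf * 8), nn.ReLU(True),   # (ngf*8) x 4 x 1
    nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),
    nn.BatchNorm2d(ngf * 4), nn.ReLU(True),   # (ngf*4) x 8 x 2
    nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),
    nn.BatchNorm2d(ngf * 2), nn.ReLU(True),   # (ngf*2) x 16 x 4
    nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),
    nn.BatchNorm2d(ngf), nn.ReLU(True),       # (ngf) x 32 x 8
    nn.ConvTranspose2d(ngf, ngf, 4, 2, 1, bias=False),
    nn.BatchNorm2d(ngf), nn.ReLU(True),       # (ngf) x 64 x 16
    nn.ConvTranspose2d(ngf, nc, 4, 2, 1, bias=False),
    nn.Tanh(),                                # (nc) x 128 x 32
)

print(netG_rect(torch.randn(1, nz, 1, 1)).size())  # torch.Size([1, 3, 128, 32])
```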
Not sure if this thread is still active, but did anyone try to generate 128x128 images and upscale to 512x512 per @bartolsthoorn 's suggestion?
DCGAN is quite old. Check the latest papers on GANs and you will find many large resolution models/examples. You need a dataset with high resolution images as well (of course).
@bartolsthoorn thank you for the reply. I'm pretty new to GAN training; if I have downloaded art images from WikiArt and they have different sizes, do I have to somehow preprocess all of them to the same size (e.g. 512x512)? What about rectangular images?
I was able to get dcgan operating successfully at 128x128 by adding the convolutional layers described above and then running with ngf 128 and ndf 32. When I attempted to go to 512 I was not able to get a stable result. I'm attempting to add the white noise to the discriminator to see if that helps.
**I ended up abandoning dcgan and am now using bmsggan, which is a variation on progressive GANs. It's handling higher resolutions much better.**
Hi, I am also trying to implement DCGAN for grayscale images using PyTorch, but I got an error saying 'RuntimeError: Given groups=1, weight of size 64 1 4 4, expected input[128, 3, 64, 64] to have 1 channels, but got 3 channels instead'. I already set the number of channels to 1 but still get the error. Do you happen to know where I can fix the problem?
@nalinzie If you share the code here then it's maybe possible to help.
I got the code from https://pytorch.org/tutorials/beginner/dcgan_faces_tutorial.html. The example code from that site works for RGB images, but I am working with my own grayscale images, so I changed the number of channels nc to 1 and kept everything else the same. However, when training the model I get: RuntimeError: Given groups=1, weight of size 64 1 4 4, expected input[128, 3, 64, 64] to have 1 channels, but got 3 channels instead. I don't know which part I should change so that my input has 1 channel.
@nalinzie Make sure to check what your dataloader is outputting is also a single-channel image. https://github.com/pytorch/examples/blob/b9f3b2ebb9464959bdbf0c3ac77124a704954828/dcgan/main.py#L60
You can also do a print(X.size()) right before you put anything into your model (and right after it comes out) to check what dimensions your tensors actually have.
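For example, one way to force single-channel loading with the tutorial's pipeline is a Grayscale transform (a sketch; image_size as defined in the tutorial):

```python
import torchvision.transforms as transforms

transform = transforms.Compose([
    transforms.Grayscale(num_output_channels=1),  # force 1 channel
    transforms.Resize(image_size),
    transforms.CenterCrop(image_size),
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,)),         # single-channel stats
])
```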
Hello, I am also working with the example code and trying to get it to work with smaller 16x16 images, but it doesn't work at those dimensions. How do I need to change the generator and discriminator code for DCGAN to work with 16x16 images?
When I use the D and G code given above for 128 I am getting the following error:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-279-6b5de1c111f4> in <module>
23 label = torch.full((b_size,), real_label, dtype=torch.float, device=device)
24 # Forward pass real batch through D
---> 25 output = netD(real_cpu).view(-1)
26 # Calculate loss on all-real batch
27 errD_real = criterion(output, label)
~/metal-band-logo-generator/.ai/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),
<ipython-input-276-1702f960857a> in forward(self, input)
30
31 def forward(self, input):
---> 32 return self.main(input)
~/metal-band-logo-generator/.ai/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),
~/metal-band-logo-generator/.ai/lib/python3.7/site-packages/torch/nn/modules/container.py in forward(self, input)
115 def forward(self, input):
116 for module in self:
--> 117 input = module(input)
118 return input
119
~/metal-band-logo-generator/.ai/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),
~/metal-band-logo-generator/.ai/lib/python3.7/site-packages/torch/nn/modules/conv.py in forward(self, input)
421
422 def forward(self, input: Tensor) -> Tensor:
--> 423 return self._conv_forward(input, self.weight)
424
425 class Conv3d(_ConvNd):
~/metal-band-logo-generator/.ai/lib/python3.7/site-packages/torch/nn/modules/conv.py in _conv_forward(self, input, weight)
418 _pair(0), self.dilation, self.groups)
419 return F.conv2d(input, weight, self.bias, self.stride,
--> 420 self.padding, self.dilation, self.groups)
421
422 def forward(self, input: Tensor) -> Tensor:
RuntimeError: Calculated padded input size per channel: (2 x 2). Kernel size: (4 x 4). Kernel size can't be greater than actual input size
Changing the kernel size to 2 and the stride to 4 in the last Conv2D of the Discriminator seems to fix that error, but just want to make sure I'm not crazy here.
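For what it's worth, the arithmetic is consistent with feeding 64x64 images into the 128x128 discriminator above: five stride-2 convolutions map 64 -> 32 -> 16 -> 8 -> 4 -> 2, so the final kernel-4 convolution meets a 2x2 feature map, which is exactly the reported error. Shrinking that last kernel makes the shapes legal, but training at --imageSize 128 (or using the 64x64 architecture for 64x64 images) addresses the actual mismatch.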
I ended up abandoning dcgan and am now using bmsggan, which is a variation on progressive GANs. It's handling higher resolutions much better.
Is it working now, EvanZ?
DCGAN is quite old. Check the latest papers on GANs and you will find many large resolution models/examples. You need a dataset with high resolution images as well (of course).
Could you name some GANs that could be made to work easily with any input size?
Hello, have you found it? Can you share it?
bmsggan
No @chang7ing, I couldn't find it. If you find it, please share it here.