DCGAN example doesn't work with different image sizes
I'm trying to use this code as a starting point for building GANs from my own image data -- 512x512 grayscale images. If I change any of the default arguments (e.g. --imageSize 512) I get the following error:
Traceback (most recent call last):
File "main.py", line 209, in <module>
errD_real = criterion(output, label)
File "/opt/python/lib/python3.6/site-packages/torch/nn/modules/module.py", line 210, in __call__
result = self.forward(*input, **kwargs)
File "/opt/python/lib/python3.6/site-packages/torch/nn/modules/loss.py", line 36, in forward
return backend_fn(self.size_average, weight=self.weight)(input, target)
File "/opt/python/lib/python3.6/site-packages/torch/nn/_functions/thnn/loss.py", line 22, in forward
assert input.nelement() == target.nelement()
AssertionError
Still learning my way around PyTorch, so the network architectures that are spit out before the above message don't yet give me much intuition. I appreciate any pointers you can give!
The error tells you that the number of inputs to the loss function is different from the number of given targets. It happens at line 209. The problem is that the generator and discriminator architectures are apparently fixed to the default image size (see the annotations in the model). Adding a pooling layer at the end of the discriminator that squeezes every batch element into a 1x1x1 output would help; I think that appending nn.MaxPool2d(opt.imageSize // 64) after the Sigmoid would fix that.
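For illustration, that suggestion could be applied like this (a sketch, assuming netD and opt from the example's main.py; untested):

```python
# Append the pooling layer after the Sigmoid so the spatial output is
# squeezed towards 1x1 for larger image sizes; opt.imageSize // 64 == 1
# leaves the default 64x64 configuration unchanged.
netD.main.add_module('pool', nn.MaxPool2d(opt.imageSize // 64))
```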
As @apaszke mentions, the G and D networks are generated with the 64x64 limitations hardcoded. The implementation of the DCGAN here is very similar to the dcgan.torch implementation, and someone else asked about this limitation and got this answer: https://github.com/soumith/dcgan.torch/issues/2#issuecomment-164862299
By following the changes suggested in that comment, you can expand the network for 128x128. So for the generator:
```python
class _netG(nn.Module):
    def __init__(self, ngpu):
        super(_netG, self).__init__()
        self.ngpu = ngpu
        self.main = nn.Sequential(
            # input is Z, going into a convolution
            nn.ConvTranspose2d(nz, ngf * 16, 4, 1, 0, bias=False),
            nn.BatchNorm2d(ngf * 16),
            nn.ReLU(True),
            # state size. (ngf*16) x 4 x 4
            nn.ConvTranspose2d(ngf * 16, ngf * 8, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 8),
            nn.ReLU(True),
            # state size. (ngf*8) x 8 x 8
            nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 4),
            nn.ReLU(True),
            # state size. (ngf*4) x 16 x 16
            nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 2),
            nn.ReLU(True),
            # state size. (ngf*2) x 32 x 32
            nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf),
            nn.ReLU(True),
            # state size. (ngf) x 64 x 64
            nn.ConvTranspose2d(ngf, nc, 4, 2, 1, bias=False),
            nn.Tanh()
            # state size. (nc) x 128 x 128
        )

    def forward(self, input):
        # unchanged from the example's main.py
        if input.is_cuda and self.ngpu > 1:
            output = nn.parallel.data_parallel(self.main, input, range(self.ngpu))
        else:
            output = self.main(input)
        return output
```
And for the discriminator:
```python
class _netD(nn.Module):
    def __init__(self, ngpu):
        super(_netD, self).__init__()
        self.ngpu = ngpu
        self.main = nn.Sequential(
            # input is (nc) x 128 x 128
            nn.Conv2d(nc, ndf, 4, stride=2, padding=1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf) x 64 x 64
            nn.Conv2d(ndf, ndf * 2, 4, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(ndf * 2),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf*2) x 32 x 32
            nn.Conv2d(ndf * 2, ndf * 4, 4, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(ndf * 4),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf*4) x 16 x 16
            nn.Conv2d(ndf * 4, ndf * 8, 4, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(ndf * 8),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf*8) x 8 x 8
            nn.Conv2d(ndf * 8, ndf * 16, 4, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(ndf * 16),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf*16) x 4 x 4
            nn.Conv2d(ndf * 16, 1, 4, stride=1, padding=0, bias=False),
            nn.Sigmoid()
            # state size. 1
        )

    def forward(self, input):
        # unchanged from the example's main.py
        if input.is_cuda and self.ngpu > 1:
            output = nn.parallel.data_parallel(self.main, input, range(self.ngpu))
        else:
            output = self.main(input)
        return output.view(-1, 1).squeeze(1)
```
However, as you can also see in that thread, it is harder to get a stable game between the generator and discriminator for this larger problem. To avoid this, I think you'll have to take a look at the improvements used here https://github.com/openai/improved-gan (paper: https://arxiv.org/abs/1606.03498). This repository includes a model for 128x128 imagenet generation.
Ahh, thank you for the extra information; that helps immensely, in addition to the intuition for possibly less stable training processes given the larger images.
So @bartolsthoorn for the images I'm using--512x512--I probably should look into the improved GAN paper and associated OpenAI implementation?
@magsol I would suggest first trying your dataset at the standard 64x64. Next, run it at 128x128 with either the extra convolution layers or the pooling layer listed above. After that you can try 512x512; I am no expert, but I have not seen pictures that large generated by a DCGAN. You could also consider generating 128x128 images and then using a separate super-resolution network to reach 512x512.
64x64 and 128x128 are easy to try (the model includes the preprocessing, i.e. the rescaling of the images) and should be easier to generate. Did you already get good results with your data on the 64x64 scale? Please share your experience so far. :smile:
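(For reference, that preprocessing is just the transform pipeline in the example's main.py; roughly the following, assuming a recent torchvision where Scale is named Resize:)

```python
import torchvision.datasets as dset
import torchvision.transforms as transforms

# every image is resized and center-cropped to opt.imageSize on load,
# so differently sized source images are handled automatically
dataset = dset.ImageFolder(root=opt.dataroot,
                           transform=transforms.Compose([
                               transforms.Resize(opt.imageSize),
                               transforms.CenterCrop(opt.imageSize),
                               transforms.ToTensor(),
                               transforms.Normalize((0.5, 0.5, 0.5),
                                                    (0.5, 0.5, 0.5)),
                           ]))
```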
@bartolsthoorn I ran dcgan with the following arguments:
python main.py --cuda --dataset folder --dataroot /images --outf /output
I tried changing the nc = 3 value to nc = 1 since the images are all grayscale, but kept getting CUDNN_STATUS_BAD_PARAM errors, so I left the default value unchanged.
Unfortunately, after very few training iterations it looks like the mode collapsed:
[image: generated samples after mode collapse]
The images from the 24th epoch look like pure static:
[image: fake samples, epoch 24]
The real images, on the other hand, look like this:
[image: real training samples]
Happy to hear any suggestions you may have :) Thank you so much for your help so far! Learning a lot about GANs!
Managed to override the default image loader in torchvision so it properly pulls the images in as grayscale, and changed nc = 1 -- it seems to be running nicely now :) Though the loss functions are still quickly hitting 1 and 0 respectively, as before, so I'm not sure the results will be any better than the last run.
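A minimal sketch of that loader override, for anyone trying the same thing (grayscale_loader is my name for it; ImageFolder accepts a custom loader argument):

```python
from PIL import Image
import torchvision.datasets as dset

def grayscale_loader(path):
    # open with PIL and force single-channel ('L') mode instead of RGB
    return Image.open(path).convert('L')

# transform as in main.py, but with nc = 1 the Normalize step should use
# single-channel stats, i.e. Normalize((0.5,), (0.5,))
dataset = dset.ImageFolder(root=opt.dataroot,
                           transform=transform,
                           loader=grayscale_loader)
```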
No improvement, though I guess it's a little easier to see that it's not pure noise in the fake images. Still looks like static, though.
Yes, the learning is unstable. There are some new interesting suggestion in the dcgan.torch thread: https://github.com/soumith/dcgan.torch/issues/2#issuecomment-283982237
- Set ndf to ngf/4; this changes the relative sizes of the G and D models in order to balance the training
- Add white noise (a trick also mentioned in ganhacks); a minimal sketch follows this list
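The white-noise idea, as a rough sketch (my own illustration; the function name and the sigma schedule are hypothetical, with the noise added to the discriminator's inputs as ganhacks suggests):

```python
import torch

def add_instance_noise(images, sigma):
    # zero-mean Gaussian noise keeps D from separating real and fake
    # batches using pixel-level artifacts alone
    return images + sigma * torch.randn_like(images)

# in the training loop, anneal sigma (e.g. from 0.1 towards 0) and perturb
# both batches before they reach the discriminator:
# real_noisy = add_instance_noise(real_images, sigma)
# fake_noisy = add_instance_noise(fake_images.detach(), sigma)
```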
Any luck with the above ~~tricks~~ heuristics?
Ha! Isn't all of practical ML just tricks :) Haven't had a chance yet -- behind on my teaching. Hoping to get back to this within the next week though, and will absolutely update when I do.
@magsol If you happen to run into difficulties training the 512x512 images, you could always scale the images down first to, say, 64^2 and see if that even gets you the results you'd want, then scale up later? Thanks for checking back! :)
Seems like I have one solution to this problem: after the discriminator, the output size changes with the input size, but when you calculate the loss between the output and the target, the label size has not changed. So you can either let the label size change with your input size, or make the discriminator output a fixed size no matter what size data you input.
FYI
```python
size_feature = self.D_A(x_A).size()
real_tensor.data.resize_(size_feature).fill_(real_label)
fake_tensor.data.resize_(size_feature).fill_(fake_label)
l_d_A_real, l_d_A_fake = bce(self.D_A(x_A), real_tensor), bce(self.D_A(x_BA), fake_tensor)
```
For example, if your x_A has size [batch_size, 3, 64, 64], then after D the size_feature will be [batch_size, 1] and real_label has size [batch_size]. But when your input x_A changes to [batch_size, 3, 128, 128], the output of D becomes [batch_size, 25], and calculating the loss between [batch_size, 25] and a label of size [batch_size] raises the error.
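In more recent PyTorch the same idea can be written without resize_, e.g. (a sketch, assuming the names netD, input, real_label, and criterion from the example's main.py):

```python
output = netD(input)
# build the targets from the discriminator's actual output shape, so the
# BCE loss always sees matching element counts for any input resolution
real_tensor = torch.full_like(output, real_label)
errD_real = criterion(output, real_tensor)
```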
Is there any way to implement a DCGAN that generates rectangular images (e.g. 128x32) in PyTorch? Nearly every example I've seen works with square images.
Yes, you can do that @xjdeng: one way is to ensure that the output of your first (dense) layer has the aspect ratio you want for the final output. So in your case, the dense layer output should be (batch_size, channels, 4, 1) or some multiple of that. If your network then consists of transposed convolutions that double the size at each layer, you would need 5 transposed conv layers to get images of size (batch_size, channels, 128, 32); see the sketch below.
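A minimal sketch of that idea (my own illustration, untested; it uses a transposed convolution for the (4 x 1) seed instead of a dense layer, in the style of the DCGAN example):

```python
import torch
import torch.nn as nn

nz, ngf, nc = 100, 64, 3  # noise dim, feature maps, channels, as in the example

# Hypothetical generator for 128x32 outputs: the first transposed convolution
# maps the nz x 1 x 1 noise to a (4 x 1) feature map, and five stride-2 layers
# then double both dimensions: 4x1 -> 8x2 -> 16x4 -> 32x8 -> 64x16 -> 128x32.
netG_rect = nn.Sequential(
    nn.ConvTranspose2d(nz, ngf * 8, (4, 1), 1, 0, bias=False),
    nn.BatchNorm2d(ngf * 8), nn.ReLU(True),   # (ngf*8) x 4 x 1
    nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),
    nn.BatchNorm2d(ngf * 4), nn.ReLU(True),   # (ngf*4) x 8 x 2
    nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),
    nn.BatchNorm2d(ngf * 2), nn.ReLU(True),   # (ngf*2) x 16 x 4
    nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),
    nn.BatchNorm2d(ngf), nn.ReLU(True),       # (ngf) x 32 x 8
    nn.ConvTranspose2d(ngf, ngf, 4, 2, 1, bias=False),
    nn.BatchNorm2d(ngf), nn.ReLU(True),       # (ngf) x 64 x 16
    nn.ConvTranspose2d(ngf, nc, 4, 2, 1, bias=False),
    nn.Tanh(),                                # (nc) x 128 x 32
)

print(netG_rect(torch.randn(1, nz, 1, 1)).size())  # torch.Size([1, 3, 128, 32])
```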
Not sure if this thread is still active, but did anyone try to generate 128x128 images and upscale to 512x512 per @bartolsthoorn 's suggestion?
DCGAN is quite old. Check the latest papers on GANs and you will find many large resolution models/examples. You need a dataset with high resolution images as well (of course).
@bartolsthoorn thank you for the reply. I'm pretty new to GAN training; if I have downloaded art images from WikiArt and they have different sizes, do I have to somehow preprocess all of them to the same size (e.g. 512x512)? What about rectangular images?
I was able to get dcgan operating successfully at 128x128 by adding the convolutional layers described above and then running with ngf 128 and ndf 32. When I attempted to go to 512 I was not able to get a stable result. I'm attempting to add the white noise to the discriminator to see if that helps.
**I ended up abandoning dcgan and am now using bmsggan, which is a variation on progressive GANs. It's handling higher resolutions much better.**
Hi, I am also trying to implement DCGAN for grayscale images using PyTorch, but I got an error saying 'RuntimeError: Given groups=1, weight of size 64 1 4 4, expected input[128, 3, 64, 64] to have 1 channels, but got 3 channels instead'. I already set the number of channels to 1 but still get the error. Do you happen to know where I can fix the problem?
@nalinzie If you share the code here then it's maybe possible to help.
I got the code from https://pytorch.org/tutorials/beginner/dcgan_faces_tutorial.html. The example code from that site works for RGB images, but I am working with my own grayscale images, so I changed the number of channels nc to 1 and kept everything else the same. However, when training the model I get: RuntimeError: Given groups=1, weight of size 64 1 4 4, expected input[128, 3, 64, 64] to have 1 channels, but got 3 channels instead. I don't know which part I should change so that my input has 1 channel.
@nalinzie Make sure to check what your dataloader is outputting is also a single-channel image. https://github.com/pytorch/examples/blob/b9f3b2ebb9464959bdbf0c3ac77124a704954828/dcgan/main.py#L60
You can also do a print(X.size()) right before you put anything into your model (and right after it comes out) to check what dimensions your tensors actually have.
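For example, one way to force single-channel loading with the tutorial's pipeline is a Grayscale transform (a sketch; image_size as defined in the tutorial):

```python
import torchvision.transforms as transforms

transform = transforms.Compose([
    transforms.Grayscale(num_output_channels=1),  # force 1 channel
    transforms.Resize(image_size),
    transforms.CenterCrop(image_size),
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,)),         # single-channel stats
])
```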
Hello, I am also working with the example code and trying to get it to work with smaller 16x16 images, but it doesn't work at those dimensions. How do I need to change the generator and discriminator code for DCGAN to work with 16x16 images?
When I use the D and G code given above for 128 I am getting the following error:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-279-6b5de1c111f4> in <module>
23 label = torch.full((b_size,), real_label, dtype=torch.float, device=device)
24 # Forward pass real batch through D
---> 25 output = netD(real_cpu).view(-1)
26 # Calculate loss on all-real batch
27 errD_real = criterion(output, label)
~/metal-band-logo-generator/.ai/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),
<ipython-input-276-1702f960857a> in forward(self, input)
30
31 def forward(self, input):
---> 32 return self.main(input)
~/metal-band-logo-generator/.ai/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),
~/metal-band-logo-generator/.ai/lib/python3.7/site-packages/torch/nn/modules/container.py in forward(self, input)
115 def forward(self, input):
116 for module in self:
--> 117 input = module(input)
118 return input
119
~/metal-band-logo-generator/.ai/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),
~/metal-band-logo-generator/.ai/lib/python3.7/site-packages/torch/nn/modules/conv.py in forward(self, input)
421
422 def forward(self, input: Tensor) -> Tensor:
--> 423 return self._conv_forward(input, self.weight)
424
425 class Conv3d(_ConvNd):
~/metal-band-logo-generator/.ai/lib/python3.7/site-packages/torch/nn/modules/conv.py in _conv_forward(self, input, weight)
418 _pair(0), self.dilation, self.groups)
419 return F.conv2d(input, weight, self.bias, self.stride,
--> 420 self.padding, self.dilation, self.groups)
421
422 def forward(self, input: Tensor) -> Tensor:
RuntimeError: Calculated padded input size per channel: (2 x 2). Kernel size: (4 x 4). Kernel size can't be greater than actual input size
Changing the kernel size to 2 and the stride to 4 in the last Conv2D of the Discriminator seems to fix that error, but just want to make sure I'm not crazy here.
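For what it's worth, the arithmetic is consistent with feeding 64x64 images into the 128x128 discriminator above: five stride-2 convolutions map 64 -> 32 -> 16 -> 8 -> 4 -> 2, so the final kernel-4 convolution meets a 2x2 feature map, which is exactly the reported error. Shrinking that last kernel makes the shapes legal, but training at --imageSize 128 (or using the 64x64 architecture for 64x64 images) addresses the actual mismatch.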
I ended up abandoning dcgan and am now using bmsggan, which is a variation on progressive GANs. It's handling higher resolutions much better.
Is it working now, EvanZ?
DCGAN is quite old. Check the latest papers on GANs and you will find many large resolution models/examples. You need a dataset with high resolution images as well (of course).
Could you name some GANs that could be made to work easily with any input size?
Hello, have you found it? Can you share it?
bmsggan
No @chang7ing, I couldn't find it. If you find it, please share it here.