
AttGAN RuntimeError: Sizes of tensors must match except in dimension 1

Open GuanlinLee opened this issue 4 years ago • 5 comments

Hi, when I use PyTorch 1.7 to train AttGAN from scratch, a RuntimeError appears. Here are the details:

Traceback (most recent call last):
  File "train.py", line 161, in
    errD = attgan.trainD(img_a, att_a, att_a_, att_b, att_b_)
  File "/home/AttGAN/attgan.py", line 198, in trainD
    img_fake = self.G(img_a, att_b_).detach()
  File "/home/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/AttGAN/attgan.py", line 78, in forward
    return self.decode(self.encode(x), a)
  File "/home/AttGAN/attgan.py", line 68, in decode
    z = torch.cat([z, zs[len(self.dec_layers) - 2 - i]], dim=1)
RuntimeError: Sizes of tensors must match except in dimension 1. Got 8 and 9 in dimension 2 (The offending index is 1)

Could you help me solve this? THX.

GuanlinLee avatar Nov 30 '20 12:11 GuanlinLee

Could you tell me what command you used to train it?

elvisyjlin avatar Dec 04 '20 03:12 elvisyjlin

CUDA_VISIBLE_DEVICES=0 \
python train.py \
  --data CelebA-HQ \
  --img_size 128

GuanlinLee avatar Dec 04 '20 09:12 GuanlinLee

That is a dimension mismatch error in the AttGAN generator. When decoding the latent vector, the conditional attributes could not be concatenated to it because the two tensors had different sizes in dimension 2 (the spatial dimension). The original code should work on CelebA-HQ. This error can occur in several situations, for example when the encoder and decoder have different numbers of layers, or when the image size used by the encoder differs from the one used by the decoder.
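To illustrate what the traceback is complaining about: the failing torch.cat joins tensors along the channel dimension (dim=1), so every other dimension (batch and spatial) must already agree. A minimal standalone sketch with purely illustrative shapes (this is not the code from attgan.py):

```python
# Minimal sketch (not the repository's code) of why the cat in decode() can fail:
# torch.cat along dim=1 (channels) requires every other dimension, including the
# spatial ones, to match exactly.
import torch

z = torch.randn(4, 1024, 8, 8)        # upsampled decoder feature map
skip_ok = torch.randn(4, 512, 8, 8)   # feature map with a matching 8x8 spatial size
skip_bad = torch.randn(4, 512, 9, 9)  # spatial size off by one, as in the traceback (8 vs. 9)

print(torch.cat([z, skip_ok], dim=1).shape)  # works: only the channel dimension differs
torch.cat([z, skip_bad], dim=1)              # RuntimeError: Sizes of tensors must match except in dimension 1
```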

Did you follow the instructions at https://github.com/willylulu/celeba-hq-modified to prepare the CelebA-HQ dataset? I ask because you specified --data CelebA-HQ in the training command. If you only downloaded the images and attributes of CelebA, please train with --data CelebA instead.
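For example, reusing only the flags already shown above (and assuming the rest of the setup follows the README), the plain CelebA run would simply be `python train.py --data CelebA --img_size 128`.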

Did you change any code before training? The unmodified code should work for both CelebA and CelebA-HQ. If you change the default arguments of any function, errors can occur. Please set arguments by specifying them on the command line instead of editing the code.

When training starts, the script first prints the architectures of the generator and the discriminator. An AttGAN generator for 128x128 images is supposed to look like this:

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1            [4, 64, 64, 64]           3,072
       BatchNorm2d-2            [4, 64, 64, 64]             128
         LeakyReLU-3            [4, 64, 64, 64]               0
       Conv2dBlock-4            [4, 64, 64, 64]               0
            Conv2d-5           [4, 128, 32, 32]         131,072
       BatchNorm2d-6           [4, 128, 32, 32]             256
         LeakyReLU-7           [4, 128, 32, 32]               0
       Conv2dBlock-8           [4, 128, 32, 32]               0
            Conv2d-9           [4, 256, 16, 16]         524,288
      BatchNorm2d-10           [4, 256, 16, 16]             512
        LeakyReLU-11           [4, 256, 16, 16]               0
      Conv2dBlock-12           [4, 256, 16, 16]               0
           Conv2d-13             [4, 512, 8, 8]       2,097,152
      BatchNorm2d-14             [4, 512, 8, 8]           1,024
        LeakyReLU-15             [4, 512, 8, 8]               0
      Conv2dBlock-16             [4, 512, 8, 8]               0
           Conv2d-17            [4, 1024, 4, 4]       8,388,608
      BatchNorm2d-18            [4, 1024, 4, 4]           2,048
        LeakyReLU-19            [4, 1024, 4, 4]               0
      Conv2dBlock-20            [4, 1024, 4, 4]               0
  ConvTranspose2d-21            [4, 1024, 8, 8]      16,990,208
      BatchNorm2d-22            [4, 1024, 8, 8]           2,048
             ReLU-23            [4, 1024, 8, 8]               0
ConvTranspose2dBlock-24            [4, 1024, 8, 8]               0
  ConvTranspose2d-25           [4, 512, 16, 16]      12,582,912
      BatchNorm2d-26           [4, 512, 16, 16]           1,024
             ReLU-27           [4, 512, 16, 16]               0
ConvTranspose2dBlock-28           [4, 512, 16, 16]               0
  ConvTranspose2d-29           [4, 256, 32, 32]       2,097,152
      BatchNorm2d-30           [4, 256, 32, 32]             512
             ReLU-31           [4, 256, 32, 32]               0
ConvTranspose2dBlock-32           [4, 256, 32, 32]               0
  ConvTranspose2d-33           [4, 128, 64, 64]         524,288
      BatchNorm2d-34           [4, 128, 64, 64]             256
             ReLU-35           [4, 128, 64, 64]               0
ConvTranspose2dBlock-36           [4, 128, 64, 64]               0
  ConvTranspose2d-37           [4, 3, 128, 128]           6,147
             Tanh-38           [4, 3, 128, 128]               0
ConvTranspose2dBlock-39           [4, 3, 128, 128]               0
================================================================
Total params: 43,352,707
Trainable params: 43,352,707
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 9.75
Forward/backward pass size (MB): 186.50
Params size (MB): 165.38
Estimated Total Size (MB): 361.63
----------------------------------------------------------------
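If you want to reproduce such a summary yourself, here is a rough sketch. I am assuming the table above comes from the torchsummary package; the Generator constructor call and the 13-dimensional attribute input are illustrative guesses, not values copied from attgan.py:

```python
# Hedged sketch: assumes the summary was produced with the `torchsummary` package
# and that Generator() with default arguments matches the 128x128 configuration.
from torchsummary import summary
from attgan import Generator  # the generator class defined in this repository

G = Generator()  # constructor arguments assumed to default to the 128x128 setup
# Two inputs: a 3x128x128 image and an attribute vector
# (13 is the usual number of edited CelebA attributes; adjust if yours differs).
summary(G, [(3, 128, 128), (13,)], batch_size=4, device='cpu')
```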

elvisyjlin avatar Dec 04 '20 11:12 elvisyjlin

I did not change any of your code, and I can train STGAN and StarGAN on this dataset without any error. It seems that a new layer row named Generator-40 appears in the summary under PyTorch 1.7.

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1            [4, 64, 64, 64]           3,072
       BatchNorm2d-2            [4, 64, 64, 64]             128
         LeakyReLU-3            [4, 64, 64, 64]               0
       Conv2dBlock-4            [4, 64, 64, 64]               0
            Conv2d-5           [4, 128, 32, 32]         131,072
       BatchNorm2d-6           [4, 128, 32, 32]             256
         LeakyReLU-7           [4, 128, 32, 32]               0
       Conv2dBlock-8           [4, 128, 32, 32]               0
            Conv2d-9           [4, 256, 16, 16]         524,288
      BatchNorm2d-10           [4, 256, 16, 16]             512
        LeakyReLU-11           [4, 256, 16, 16]               0
      Conv2dBlock-12           [4, 256, 16, 16]               0
           Conv2d-13             [4, 512, 8, 8]       2,097,152
      BatchNorm2d-14             [4, 512, 8, 8]           1,024
        LeakyReLU-15             [4, 512, 8, 8]               0
      Conv2dBlock-16             [4, 512, 8, 8]               0
           Conv2d-17            [4, 1024, 4, 4]       8,388,608
      BatchNorm2d-18            [4, 1024, 4, 4]           2,048
        LeakyReLU-19            [4, 1024, 4, 4]               0
      Conv2dBlock-20            [4, 1024, 4, 4]               0
  ConvTranspose2d-21            [4, 1024, 8, 8]      16,990,208
      BatchNorm2d-22            [4, 1024, 8, 8]           2,048
             ReLU-23            [4, 1024, 8, 8]               0
ConvTranspose2dBlock-24            [4, 1024, 8, 8]               0
  ConvTranspose2d-25           [4, 512, 16, 16]      12,689,408
      BatchNorm2d-26           [4, 512, 16, 16]           1,024
             ReLU-27           [4, 512, 16, 16]               0
ConvTranspose2dBlock-28           [4, 512, 16, 16]               0
  ConvTranspose2d-29           [4, 256, 32, 32]       2,097,152
      BatchNorm2d-30           [4, 256, 32, 32]             512
             ReLU-31           [4, 256, 32, 32]               0
ConvTranspose2dBlock-32           [4, 256, 32, 32]               0
  ConvTranspose2d-33           [4, 128, 64, 64]         524,288
      BatchNorm2d-34           [4, 128, 64, 64]             256
             ReLU-35           [4, 128, 64, 64]               0
ConvTranspose2dBlock-36           [4, 128, 64, 64]               0
  ConvTranspose2d-37           [4, 3, 128, 128]           6,147
             Tanh-38           [4, 3, 128, 128]               0
ConvTranspose2dBlock-39           [4, 3, 128, 128]               0
        Generator-40           [4, 3, 128, 128]               0
================================================================
Total params: 43,459,203
Trainable params: 43,459,203
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 9.75
Forward/backward pass size (MB): 188.00
Params size (MB): 165.78
Estimated Total Size (MB): 363.53
----------------------------------------------------------------

GuanlinLee avatar Dec 07 '20 08:12 GuanlinLee

That's interesting... Would you mind trying another PyTorch version, such as 1.5.0? That's the version I used when I tested AttGAN on the CelebA dataset a few weeks ago.
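If it helps, one way to set up such an environment (assuming the usual torch/torchvision version pairing) is `pip install torch==1.5.0 torchvision==0.6.0`.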

elvisyjlin avatar Dec 08 '20 09:12 elvisyjlin