
I hit this error when running train.py ...

bemoregt opened this issue 6 years ago • 20 comments

Hi, @owang @sridharmahadevan @akanimax @huangzh13

I hit this error when running train.py ... What am I doing wrong?

```
oem@sgi:~/BMSG-GAN/sourcecode$ python3 train.py --depth=7 --latent_size=128 \
    --images_dir='../data/celebJapan/train' --sample_dir=samples/exp_2 --model_dir=models/exp_2
Total number of images in the dataset: 6604
```

Error message:

```
Starting the training process ...

Epoch: 1
Elapsed [0:00:04.581270] batch: 1 d_loss: 4.346926 g_loss: 6.674685
Traceback (most recent call last):
  File "train.py", line 254, in <module>
    main(parse_arguments())
  File "train.py", line 248, in main
    start=args.start
  File "/home/oem/BMSG-GAN/sourcecode/MSG_GAN/GAN.py", line 482, in train
    gen_img_files)
  File "/home/oem/BMSG-GAN/sourcecode/MSG_GAN/GAN.py", line 345, in create_grid
    samples = [Generator.adjust_dynamic_range(sample) for sample in samples]
  File "/home/oem/BMSG-GAN/sourcecode/MSG_GAN/GAN.py", line 345, in <listcomp>
    samples = [Generator.adjust_dynamic_range(sample) for sample in samples]
  File "/home/oem/BMSG-GAN/sourcecode/MSG_GAN/GAN.py", line 96, in adjust_dynamic_range
    data = data * scale + bias
TypeError: mul() received an invalid combination of arguments - got (numpy.float32), but expected one of:
 * (Tensor other)
      didn't match because some of the arguments have invalid types: (numpy.float32)
 * (Number other)
      didn't match because some of the arguments have invalid types: (numpy.float32)
```
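For context, the failure comes from multiplying a tensor by a `numpy.float32` scalar, which `Tensor.mul()` in PyTorch 0.4.x rejects (PyTorch 1.0+ accepts it). A minimal sketch of the pattern and a workaround, casting the scalar to a Python float; the function below only mirrors the idea of `adjust_dynamic_range`, it is not the repo's exact code:

```python
import numpy as np

def adjust_dynamic_range(data, drange_in=(-1.0, 1.0), drange_out=(0.0, 1.0)):
    """Linearly remap values from drange_in to drange_out."""
    # With numpy inputs these intermediates are numpy.float32 scalars,
    # exactly the type that old PyTorch's Tensor.mul() rejects.
    scale = (np.float32(drange_out[1]) - np.float32(drange_out[0])) / (
        np.float32(drange_in[1]) - np.float32(drange_in[0]))
    bias = np.float32(drange_out[0]) - np.float32(drange_in[0]) * scale
    # Casting to plain Python floats works on both old and new PyTorch.
    return data * float(scale) + float(bias)
```

Called on data in [-1, 1], this maps it into [0, 1], which is what the sample grid code expects.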

Thanks in advance ~

bemoregt avatar Apr 23 '19 08:04 bemoregt

Could you please share the versions of your Python, torch, and numpy? Please try updating torch and numpy to the latest versions. The code is tested with Python 3.6.5. Please let me know if you still face this issue.

akanimax avatar Apr 23 '19 08:04 akanimax

Hi, @owang @sridharmahadevan @akanimax @huangzh13

My Environment:

Ubuntu 17.x x64, Python 3.6.7, CUDA 10.1, Pytorch 0.4.1, numpy 1.15.4

Thanks.

bemoregt avatar Apr 23 '19 08:04 bemoregt

Could you please try again with python 3.6? The error comes after the first training log itself.

akanimax avatar Apr 23 '19 08:04 akanimax

Hi, @owang @sridharmahadevan @akanimax @huangzh13

It's the same with Python 3.6 ...

What am I doing wrong?

Thanks at any rate ...

bemoregt avatar Apr 23 '19 09:04 bemoregt

Could you try updating pytorch to 1.0.0? I hope this solves the problem.

akanimax avatar Apr 23 '19 09:04 akanimax

OK, I'll try that...

bemoregt avatar Apr 23 '19 09:04 bemoregt

It works, thanks a lot.

from @bemoregt

bemoregt avatar Apr 23 '19 12:04 bemoregt

@bemoregt,

I am glad that it is working now. Just wanted to point out that since you are synthesizing Japanese celebs at 256 x 256 resolution, latent_size=128 might not make the generator expressive enough. Please try latent_size=512.

Also, if you are able to get good results, please feel free to share these with us, I'll be happy to include them on the readme like @huangzh13's cartoons :smile:.

Hope this helps.

:+1: Best regards, @akanimax

akanimax avatar Apr 23 '19 12:04 akanimax

But, ...

```
Elapsed [0:04:07.511359] batch: 108 d_loss: 0.040370 g_loss: 18.472263
Elapsed [0:04:15.999767] batch: 112 d_loss: 0.000000 g_loss: 12.169998
Elapsed [0:04:24.425038] batch: 116 d_loss: 0.053961 g_loss: 16.491339
Elapsed [0:04:32.862795] batch: 120 d_loss: 0.000000 g_loss: 11.238050
Traceback (most recent call last):
  File "train.py", line 254, in <module>
    main(parse_arguments())
  File "train.py", line 248, in main
    start=args.start
  File "/home/oem/BMSG-GAN/sourcecode/MSG_GAN/GAN.py", line 417, in train
    for (i, batch) in enumerate(data, 1):
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 637, in __next__
    return self._process_next_batch(batch)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 658, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
RuntimeError: Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 138, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 209, in default_collate
    return torch.stack(batch, 0, out=out)
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 3 and 1 in dimension 1 at /pytorch/aten/src/TH/generic/THTensorMoreMath.cpp:1307
```

Another error happens ...

bemoregt avatar Apr 23 '19 12:04 bemoregt

@bemoregt,

I see. There is no handling of the grayscale image case. I'll fix this by tomorrow when I get access to my code (I am currently travelling). For now, could you please remove all the grayscale (black and white) images from your dataset?
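Until a fix lands, one quick way to find the offending files is to check each image's mode with Pillow. `split_by_mode` below is a hypothetical helper, not part of the repo:

```python
from pathlib import Path
from PIL import Image

def split_by_mode(image_dir):
    """Partition image files into RGB and non-RGB (e.g. grayscale 'L') lists,
    so the non-RGB ones can be removed or converted before training."""
    rgb, other = [], []
    for path in sorted(Path(image_dir).iterdir()):
        try:
            with Image.open(path) as im:
                (rgb if im.mode == "RGB" else other).append(path)
        except OSError:
            other.append(path)  # unreadable or non-image files are set aside too
    return rgb, other
```

Running it over the dataset directory and deleting (or converting) everything in the second list should avoid the collate-time size mismatch.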

Thanks. @akanimax

akanimax avatar Apr 23 '19 13:04 akanimax

Hi, @akanimax

OK, I see.

I understand my data's problem now.

My images include some rotated and zero-padded ones.

Those images are probably the cause ...

Many Thanks ~

bemoregt avatar Apr 23 '19 13:04 bemoregt

@akanimax

Current samples at epoch 227:

https://3.bp.blogspot.com/-KY44bqw_nd8/XMD7JNG-6SI/AAAAAAABAkY/lI9VEv8nhWw4xbMFh4RI8tb8nhkjZuImACLcBGAs/s1600/epoch227.png


bemoregt avatar Apr 25 '19 00:04 bemoregt

Hi, @akanimax

celebJapan, epoch 230, Titan XP + 1080 Ti

[image: epoch227.png]

Thanks ..

from @bemoregt.

On Tue, Apr 23, 2019 at 6:06 PM, Animesh Karnewar wrote:

> Could you try updating pytorch to 1.0.0? I hope this solves the problem.


bemoregt avatar Apr 25 '19 00:04 bemoregt

Hi, @bemoregt Could you tell me something about your celebJapan dataset?

Best regards.

huangzh13 avatar Apr 25 '19 02:04 huangzh13

Hi, @huangzh13 @akanimax

OK, here is my celebJapan dataset's information:

  • 3,591 images in total, including horizontal-flip augmentation
  • 100x100 color images
  • about 20 KB per file
  • JPG + PNG formats

Is this dataset too small for MSG-GAN?

Thanks.

bemoregt avatar Apr 25 '19 03:04 bemoregt

@bemoregt, The results seem good to me given the size of your dataset. BTW, could you share a full-size sheet of the generated images? The one you shared seems to be a screenshot of the image viewer. I think you should let it train for longer, and one more thing you could try is to calculate the FID of the models for an objective evaluation. The data size is OK for the resolution. Also try increasing the latent size. Hope this helps.

Best regards, @akanimax

akanimax avatar Apr 25 '19 04:04 akanimax

Hi, @akanimax @huangzh13 @owang @sridharmahadevan

It seems that rotated faces are very hard for MSG-GAN to generate.

Which image augmentation techniques are suitable for face-generating GANs?

Thanks .

from @bemoregt

bemoregt avatar Apr 25 '19 09:04 bemoregt

> @bemoregt,
>
> I see. There is no handling of Grayscale image case. I'll fix this by tomorrow when I get access to my code (I am currently travelling). For now, could you please remove all the grayscale (black and white) images from your dataset?
>
> Thanks. @akanimax

Hi, @akanimax

I'd be happy to test MSG-GAN on radiology data.

Is there a way to allow grayscale output images in your next update?

Thanks!

Pascal900 avatar Apr 30 '19 13:04 Pascal900

@Pascal900,

Great to hear that you would like to use MSG-GAN for radiology data. Earlier when I said that I'd handle the grayscale case, I meant just ignoring the grayscale images in the dataset. But in your case, it seems that all the images in the dataset would be grayscale. I will create a new branch for this development, since it is a new addition to the network. Until then, one thing you could try is to make RGB images from your grayscale ones; the network will just learn to output the same values for the R, G, and B channels. I have tried it before on MNIST data, and it worked pretty well.
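The channel replication described above is a one-liner in numpy. This sketch (with a hypothetical helper name) shows the idea:

```python
import numpy as np

def gray_to_rgb(gray):
    """Replicate a single-channel image (H, W) or (H, W, 1) into an
    (H, W, 3) array with identical R, G, and B channels."""
    gray = np.asarray(gray)
    if gray.ndim == 2:
        gray = gray[..., None]  # add a trailing channel axis
    return np.repeat(gray, 3, axis=-1)
```

Since all three channels carry the same values, a generator trained on such data effectively learns a grayscale output.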

Please feel free to ask if you have any more queries.

Best regards, @akanimax

akanimax avatar May 02 '19 04:05 akanimax

Since I am also working on grayscale radiology data and needed support for that immediately, I've implemented this in #14. @Pascal900, maybe you can try my branch if this use case is still relevant to you. I'd be happy to hear feedback.

mdraw avatar May 31 '19 22:05 mdraw

Hi @mdraw @akanimax, thank you for the great work. @mdraw, I tried your branch on grayscale radiology data and it worked well. However, I got an error while generating synthetic samples from saved weights with image channels = 1; it seems to be an image dimension problem. It works fine with RGB images (channels = 3). Could you please guide me on where to make changes in the generate_samples.py code? The error log follows:

```
Traceback (most recent call last):
  File "generate_samples.py", line 138, in <module>
    main(parse_arguments())
  File "generate_samples.py", line 107, in main
    th.load(args.generator_file)
  File "/home/r00206978/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1498, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for DataParallel:
    size mismatch for module.rgb_converters.0.weight: copying a param with shape torch.Size([1, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([3, 512, 1, 1]).
    size mismatch for module.rgb_converters.0.bias: copying a param with shape torch.Size([1]) from checkpoint, the shape in current model is torch.Size([3]).
    size mismatch for module.rgb_converters.1.weight: copying a param with shape torch.Size([1, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([3, 512, 1, 1]).
    size mismatch for module.rgb_converters.1.bias: copying a param with shape torch.Size([1]) from checkpoint, the shape in current model is torch.Size([3]).
    size mismatch for module.rgb_converters.2.weight: copying a param with shape torch.Size([1, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([3, 512, 1, 1]).
    size mismatch for module.rgb_converters.2.bias: copying a param with shape torch.Size([1]) from checkpoint, the shape in current model is torch.Size([3]).
    size mismatch for module.rgb_converters.3.weight: copying a param with shape torch.Size([1, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([3, 512, 1, 1]).
    size mismatch for module.rgb_converters.3.bias: copying a param with shape torch.Size([1]) from checkpoint, the shape in current model is torch.Size([3]).
    size mismatch for module.rgb_converters.4.weight: copying a param with shape torch.Size([1, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([3, 256, 1, 1]).
    size mismatch for module.rgb_converters.4.bias: copying a param with shape torch.Size([1]) from checkpoint, the shape in current model is torch.Size([3]).
    size mismatch for module.rgb_converters.5.weight: copying a param with shape torch.Size([1, 128, 1, 1]) from checkpoint, the shape in current model is torch.Size([3, 128, 1, 1]).
    size mismatch for module.rgb_converters.5.bias: copying a param with shape torch.Size([1]) from checkpoint, the shape in current model is torch.Size([3]).
    size mismatch for module.rgb_converters.6.weight: copying a param with shape torch.Size([1, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([3, 64, 1, 1]).
    size mismatch for module.rgb_converters.6.bias: copying a param with shape torch.Size([1]) from checkpoint, the shape in current model is torch.Size([3]).
```
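The error above means the generator instantiated at sampling time has 3 output channels while the checkpoint was trained with 1. The general pattern (shown here with a plain `Conv2d` stand-in for the converter layers, not the repo's actual classes) is that the model passed to `load_state_dict` must be built with the same channel count as at training time:

```python
import torch
from torch import nn

# Stand-in for one of the model's 1x1 "to-image" converter layers.
def make_converter(out_channels):
    return nn.Conv2d(512, out_channels, kernel_size=1)

ckpt = make_converter(1).state_dict()    # weights saved from a grayscale run
make_converter(1).load_state_dict(ckpt)  # rebuilt with matching channels: loads fine
try:
    make_converter(3).load_state_dict(ckpt)  # default 3-channel model: size mismatch
except RuntimeError as err:
    print("reproduced:", type(err).__name__)
```

So the fix is presumably to construct the generator in generate_samples.py with channels = 1 (however that branch exposes the option) before loading the grayscale checkpoint.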

mmuneebsaad avatar Aug 31 '22 00:08 mmuneebsaad