
Issue while training GAN

Open RamyaRaghuraman opened this issue 5 years ago • 3 comments

Hi everyone,

I face the following issue when running this command for the tiny imagenet dataset:

python src/train_gan.py --epochs 10

Error:

    Traceback (most recent call last):
      File "C:/Users/RAR7ABT/pj-val-ml/pjval_ml/OSR/counterfactual/src/train_gan.py", line 32, in <module>
        train_gan(networks, optimizers, dataloader, epoch=epoch, **options)
      File "C:\Users\RAR7ABT\pj-val-ml\pjval_ml\OSR\counterfactual\src\training.py", line 67, in train_gan
        logits = netD(images)[:,0]
      File "C:\Users\RAR7ABT\AppData\Local\conda\conda\envs\pjval\lib\site-packages\torch\nn\modules\module.py", line 493, in __call__
        result = self.forward(*input, **kwargs)
      File "C:\Users\RAR7ABT\pj-val-ml\pjval_ml\OSR\counterfactual\src\network_definitions.py", line 275, in forward
        x = self.fc1(x)
      File "C:\Users\RAR7ABT\AppData\Local\conda\conda\envs\pjval\lib\site-packages\torch\nn\modules\module.py", line 493, in __call__
        result = self.forward(*input, **kwargs)
      File "C:\Users\RAR7ABT\AppData\Local\conda\conda\envs\pjval\lib\site-packages\torch\nn\modules\linear.py", line 92, in forward
        return F.linear(input, self.weight, self.bias)
      File "C:\Users\RAR7ABT\AppData\Local\conda\conda\envs\pjval\lib\site-packages\torch\nn\functional.py", line 1406, in linear
        ret = torch.addmm(bias, input, weight.t())
    RuntimeError: size mismatch, m1: [64 x 16384], m2: [4096 x 20] at C:/w/1/s/tmp_conda_3.6_041836/conda/conda-bld/pytorch_1556684464974/work/aten/src\THC/generic/THCTensorMathBlas.cu:268

Process finished with exit code 1

Any help would be really appreciated. Thanks in advance!

RamyaRaghuraman avatar Aug 02 '19 08:08 RamyaRaghuraman

@lwneal the error seems to come from x = self.fc1(x) in class multiclassDiscriminator32(nn.Module). The size of m1 should be [64 x 4096], but I somehow end up with m1: [64 x 16384].

Please do take a look at the discriminator updates in training.py @lwneal @mattolson93

RamyaRaghuraman avatar Aug 09 '19 07:08 RamyaRaghuraman

@RamyaRaghuraman, I ran into a similar issue when training only the baseline classifier; for me, the error was because I had not resized the input images from 64x64 to 32x32. If this resizing doesn't happen, the flattened output of the conv layers ends up 4x bigger, which would explain why your output size is 16384 instead of the expected 4096, since 16384 = 4 * 4096.
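The arithmetic can be checked directly: the conv stack downsamples the spatial dimensions by a fixed factor before flattening, so doubling the input side length quadruples the feature count fed to fc1. A minimal sketch in plain Python (the downsampling factor of 8 and the 256 output channels are assumptions chosen to match the 32x32 -> 4096 case, not read from network_definitions.py):

```python
def flattened_size(input_side, downsample_factor=8, out_channels=256):
    # The conv stack halves the spatial dims log2(downsample_factor) times,
    # then the final feature map is flattened before fc1.
    side = input_side // downsample_factor
    return side * side * out_channels

# 32x32 input -> the 4096 features that fc1 (4096 x 20) expects
print(flattened_size(32))  # 4096
# 64x64 input -> 16384 = 4 * 4096, matching the reported mismatch
print(flattened_size(64))  # 16384
```

So a [64 x 16384] m1 with batch size 64 is exactly what an un-resized 64x64 Tiny ImageNet batch would produce.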

KevLuo avatar Sep 12 '20 00:09 KevLuo

@KevLuo Hello sir, I encountered the same problem. I see the ImageConverter is documented as resizing images to 32x32:

# Crops, resizes, normalizes, performs any desired augmentations
# Outputs images as eg. 32x32x3 np.array or eg. 3x32x32 torch.FloatTensor

but it looks like it didn't. So, do we need to rewrite the converter to add a resize transform?
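If the converter really isn't resizing, one workaround is to shrink each image to 32x32 before it reaches the discriminator. Here is a dependency-free nearest-neighbour sketch (the function name and the assumption that an image arrives as an H x W x C nested list are mine, purely for illustration; in a real pipeline you would more likely insert a resize step into the converter or the dataloader transforms):

```python
def resize_nearest(img, out_h=32, out_w=32):
    """Nearest-neighbour resize of an H x W x C nested-list image."""
    in_h, in_w = len(img), len(img[0])
    return [
        [img[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]

# A dummy 64x64 RGB image, downsized to the 32x32 the network expects.
img64 = [[[0, 0, 0] for _ in range(64)] for _ in range(64)]
img32 = resize_nearest(img64)
print(len(img32), len(img32[0]))  # 32 32
```

With the batch resized this way, the flattened feature size going into fc1 drops back from 16384 to the 4096 that the layer's weights expect.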

77flyy avatar Nov 22 '23 13:11 77flyy