GenerativeModels
AutoencoderKL output tensor dimension mismatch with input
I am trying to train an AutoencoderKL model on RGB images of shape (3, 1225, 966). Here is the code I use (similar to what's in tutorials/generative/2d_ldm/2d_ldm_tutorial.ipynb):

```python
autoencoderkl = AutoencoderKL(
    spatial_dims=2,
    in_channels=3,
    out_channels=3,
    num_channels=(128, 256, 384),
    latent_channels=8,
    num_res_blocks=1,
    attention_levels=(False, False, False),
    with_encoder_nonlocal_attn=False,
    with_decoder_nonlocal_attn=False,
)
autoencoderkl = autoencoderkl.to(device)
```
The error is reported at line 27 (Train Model, as in the tutorial notebook):

```
recons_loss = F.l1_loss(reconstruction.float(), images.float())
RuntimeError: The size of tensor a (964) must match the size of tensor b (966) at non-singleton dimension 3
```
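My understanding of the cause: each of the two stride-2 Downsample layers floors an odd spatial size, while each Upsample simply doubles it, so any dimension not divisible by 2^(number of downsamples) = 4 comes back smaller than it went in. A minimal sketch (the helper name is mine, not from the library):

```python
def autoencoder_output_size(size: int, num_downsamples: int = 2) -> int:
    """Round-trip a spatial dimension through the encoder/decoder.

    Each stride-2 downsampling floors the size; each upsampling
    doubles it, so odd intermediate sizes lose pixels on the way back.
    """
    for _ in range(num_downsamples):
        size = size // 2  # encoder: stride-2 conv floors odd sizes
    return size * (2 ** num_downsamples)  # decoder: each Upsample doubles

print(autoencoder_output_size(1225))  # 1224 -- matches dim 2 of the error
print(autoencoder_output_size(966))   # 964  -- matches dim 3 of the error
```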
Using the torchinfo package, I was able to print the model summary and can see the discrepancy in the upsampling layers:
```
===================================================================================================
Layer (type:depth-idx)                   Input Shape          Output Shape         Param #
===================================================================================================
AutoencoderKL                            [1, 3, 1225, 966]    [1, 3, 1224, 964]    --
├─Encoder: 1-1                           [1, 3, 1225, 966]    [1, 8, 306, 241]     --
│  └─ModuleList: 2-1                     --                   --                   --
│  │  └─Convolution: 3-1                 [1, 3, 1225, 966]    [1, 128, 1225, 966]  3,584
│  │  └─ResBlock: 3-2                    [1, 128, 1225, 966]  [1, 128, 1225, 966]  295,680
│  │  └─Downsample: 3-3                  [1, 128, 1225, 966]  [1, 128, 612, 483]   147,584
│  │  └─ResBlock: 3-4                    [1, 128, 612, 483]   [1, 256, 612, 483]   919,040
│  │  └─Downsample: 3-5                  [1, 256, 612, 483]   [1, 256, 306, 241]   590,080
│  │  └─ResBlock: 3-6                    [1, 256, 306, 241]   [1, 384, 306, 241]   2,312,576
│  │  └─GroupNorm: 3-7                   [1, 384, 306, 241]   [1, 384, 306, 241]   768
│  │  └─Convolution: 3-8                 [1, 384, 306, 241]   [1, 8, 306, 241]     27,656
├─Convolution: 1-2                       [1, 8, 306, 241]     [1, 8, 306, 241]     --
│  └─Conv2d: 2-2                         [1, 8, 306, 241]     [1, 8, 306, 241]     72
├─Convolution: 1-3                       [1, 8, 306, 241]     [1, 8, 306, 241]     --
│  └─Conv2d: 2-3                         [1, 8, 306, 241]     [1, 8, 306, 241]     72
├─Convolution: 1-4                       [1, 8, 306, 241]     [1, 8, 306, 241]     --
│  └─Conv2d: 2-4                         [1, 8, 306, 241]     [1, 8, 306, 241]     72
├─Decoder: 1-5                           [1, 8, 306, 241]     [1, 3, 1224, 964]    --
│  └─ModuleList: 2-5                     --                   --                   --
│  │  └─Convolution: 3-9                 [1, 8, 306, 241]     [1, 384, 306, 241]   28,032
│  │  └─ResBlock: 3-10                   [1, 384, 306, 241]   [1, 384, 306, 241]   2,656,512
│  │  └─Upsample: 3-11                   [1, 384, 306, 241]   [1, 384, 612, 482]   1,327,488
│  │  └─ResBlock: 3-12                   [1, 384, 612, 482]   [1, 256, 612, 482]   1,574,912
│  │  └─Upsample: 3-13                   [1, 256, 612, 482]   [1, 256, 1224, 964]  590,080
│  │  └─ResBlock: 3-14                   [1, 256, 1224, 964]  [1, 128, 1224, 964]  476,288
│  │  └─GroupNorm: 3-15                  [1, 128, 1224, 964]  [1, 128, 1224, 964]  256
│  │  └─Convolution: 3-16                [1, 128, 1224, 964]  [1, 3, 1224, 964]    3,459
===================================================================================================
Total params: 10,954,211
Trainable params: 10,954,211
Non-trainable params: 0
Total mult-adds (Units.TERABYTES): 3.20
===================================================================================================
Input size (MB): 14.20
Forward/backward pass size (MB): 26803.57
Params size (MB): 43.82
Estimated Total Size (MB): 26861.59
===================================================================================================
```
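One workaround, assuming the cause is the divisibility issue: pad the images so H and W are multiples of 2^(number of downsamples) = 4 before training (and crop the reconstruction back if needed). A sketch using plain `torch.nn.functional.pad`; the helper name is my own, and MONAI's `DivisiblePad` transform should achieve the same thing as part of a preprocessing pipeline:

```python
import torch
import torch.nn.functional as F

def pad_to_divisible(x: torch.Tensor, k: int = 4) -> torch.Tensor:
    """Zero-pad the last two (spatial) dims up to the next multiple of k.

    With 3 channel levels there are 2 downsamples, so k = 2**2 = 4
    guarantees the decoder output matches the padded input size.
    """
    h, w = x.shape[-2], x.shape[-1]
    pad_h = (-h) % k  # amount needed to reach the next multiple of k
    pad_w = (-w) % k
    # F.pad pads the last dim first: (left, right, top, bottom)
    return F.pad(x, (0, pad_w, 0, pad_h))

x = torch.zeros(1, 3, 1225, 966)
print(pad_to_divisible(x).shape)  # torch.Size([1, 3, 1228, 968])
```

Since 1228 and 968 are both divisible by 4, the autoencoder round-trips them exactly and the `l1_loss` shapes line up.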