
ViT MAE reconstruction size mismatch

Open RhinigtasSalvex opened this issue 3 years ago • 2 comments

I'm trying to train a ViT with masked-autoencoder pretraining, but I'm getting an error when running MAE.forward(): the tensor of predicted pixel values is off by a factor of 4 compared to the masked_patches tensor in the MSE loss call.

RuntimeError: The size of tensor a (1024) must match the size of tensor b (4096) at non-singleton dimension 2

I've tried different settings, but the factor-4 size mismatch stays.
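The error itself can be reproduced outside of vit-pytorch with nothing but the mismatched loss call. This is a minimal sketch: the 1024/4096 sizes are taken from the traceback, while the batch size and number of masked patches are made up:

```python
import torch
import torch.nn.functional as F

batch, num_masked = 2, 8

# Predicted pixel values from to_pixels with patch_size=32, channels=1:
# 32 * 32 * 1 = 1024 values per patch
pred = torch.randn(batch, num_masked, 1024)

# Target masked_patches, four times larger, as in the traceback
target = torch.randn(batch, num_masked, 4096)

try:
    F.mse_loss(pred, target)
except RuntimeError as e:
    # The size of tensor a (1024) must match the size of tensor b (4096)
    # at non-singleton dimension 2
    print(e)
```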

I've also tried a hack to fix the size of the predicted pixel values by multiplying the neuron count of the to_pixels output layer by 4. This fixes the MSE loss call but introduces a new problem: the gradients no longer match up in the backward call.

RuntimeError: Function MmBackward returned an invalid gradient at index 1 - got [4096, 1024] but expected shape compatible with [1024, 1024]

But now I don't know how to debug further.

My last settings were:

```python
'model': {
    'encoder_depth': 5,
    'decoder_depth': 5,
    'patch_size': 32,
    'num_classes': 1000,
    'channels': 1,
    'dim': 1024,
    'heads': 8,
    'mlp_dim': 2048,
    'masking_ratio': 0.75,
    'decoder_dim': 512,
},
```
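For reference, the per-patch dimensions implied by these settings: both the masked patch targets and the to_pixels output are flattened patches of length patch_size² · channels, so a factor-4 gap is exactly what a channel mismatch would produce. A small arithmetic sketch (the 4-channel input, e.g. RGBA images, is only an assumption, not something confirmed by the script):

```python
patch_size = 32

# Configured model: channels=1 -> 32 * 32 * 1 values per patch
predicted_dim = patch_size ** 2 * 1
print(predicted_dim)  # 1024, the size of tensor a in the error

# Hypothetical: input images actually carrying 4 channels (e.g. RGBA)
target_dim = patch_size ** 2 * 4
print(target_dim)     # 4096, the size of tensor b in the error
```

If that is what is happening, converting the inputs to single-channel before patching would be the fix, rather than widening to_pixels.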

RhinigtasSalvex avatar Dec 30 '21 21:12 RhinigtasSalvex

Hi Rhinigtas! Could you show what your full training script looks like? Perhaps I can spot the error more easily that way.

lucidrains avatar Dec 31 '21 19:12 lucidrains

Hi Lucidrains, I've uploaded a stripped-down version of my training script.

vit_train_tmp.txt

RhinigtasSalvex avatar Jan 04 '22 17:01 RhinigtasSalvex