
Question about data augmentation in your ResNet image classifier example

Open jossgillet opened this issue 4 years ago • 2 comments

Hello Ben - how can we remove the data augmentation step in your ResNet example? I need to pass the entire image in the training (no cropping).

I tried to modify the variables train_transforms and test_transforms to remove the rotation, horizontal flip and cropping, thus keeping only .Resize(), .ToTensor() and Normalize() in these variables. So the only thing I've modified in your script is:

train_transforms = transforms.Compose([
                           transforms.Resize(pretrained_size),
                           transforms.ToTensor(),
                           transforms.Normalize(mean = pretrained_means, 
                                                std = pretrained_stds)
                       ])

test_transforms = transforms.Compose([
                           transforms.Resize(pretrained_size),
                           transforms.ToTensor(),
                           transforms.Normalize(mean = pretrained_means, 
                                                std = pretrained_stds)
                       ])

But then when triggering the training loop, I get this error message:

invalid argument 0: Sizes of tensors must match except in dimension 0. Got 630 and 513 in dimension 3 at /opt/conda/conda-bld/pytorch_1579022060824/work/aten/src/TH/generic/THTensor.cpp:612

Any idea how to fix that?

Many thanks

jossgillet avatar Jan 27 '21 17:01 jossgillet

This happens because, when the argument to transforms.Resize is a single integer, only the shorter edge of the image is rescaled to pretrained_size. The longer edge is scaled by the same factor to preserve the aspect ratio, so images with different aspect ratios end up with different longer-edge lengths and can't be stacked together into a batch.
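To make the size mismatch concrete, here is a minimal pure-Python sketch of the size arithmetic that an integer Resize performs. The helper name resized_size, the value 224 for size, and the example image dimensions are assumptions for illustration only; the function approximates the behaviour described above rather than reproducing torchvision's implementation.

```python
def resized_size(width, height, size):
    """Approximate (width, height) after Resize(size) with an integer size.

    The shorter edge becomes `size`; the longer edge is scaled by the
    same factor and truncated to an integer.
    """
    short, long = (width, height) if width <= height else (height, width)
    new_short, new_long = size, int(size * long / short)
    # Put the two edges back in (width, height) order.
    return (new_short, new_long) if width <= height else (new_long, new_short)


size = 224  # assumed value, for illustration only

landscape = resized_size(640, 480, size)  # 4:3 image
square = resized_size(500, 500, size)     # 1:1 image

print(landscape)  # (298, 224)
print(square)     # (224, 224)
```

The two results share a height but not a width, which is the kind of mismatch the error message reports when the default collate step tries to stack the tensors.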

If you really don't want to augment or crop, the fix is to change the resize to transforms.Resize((pretrained_size, pretrained_size)), which forces every image to the same height and width. The downside is that non-square images will have their longer dimension squashed, and the resulting distortion may make the images hard for your model to classify effectively.
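One possible way to keep the whole image without squashing it, which is not part of the original example, is to pad each image to a square before resizing. The sketch below only computes the border widths; the helper name square_padding and the sample dimensions are hypothetical.

```python
def square_padding(width, height):
    """Return (left, top, right, bottom) borders that make an image square.

    The shorter dimension is padded as evenly as possible on both sides;
    any odd leftover pixel goes to the right or bottom border.
    """
    side = max(width, height)
    pad_w = side - width
    pad_h = side - height
    left, top = pad_w // 2, pad_h // 2
    return (left, top, pad_w - left, pad_h - top)


print(square_padding(640, 480))  # (0, 80, 0, 80)
print(square_padding(513, 630))  # (58, 0, 59, 0)
```

transforms.Pad accepts a 4-element padding in this left, top, right, bottom order, so these values could in principle be applied per image before transforms.Resize((pretrained_size, pretrained_size)). Whether the padded borders help or hurt accuracy would need to be checked on your own data.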

Why do you not want to crop your images?

bentrevett avatar Jan 31 '21 22:01 bentrevett

Many thanks Ben for your reply - it is clear now. I'm trying to avoid the cropping step in the augmentation since I need the model to see the full image. Some of the differences between classes are often located at the edges of the images, so if the images are center-cropped, the model might not learn what distinguishes each class. I'd like to ensure it learns from the entire image and not just a cropped zone.

Hope that makes sense.

jossgillet avatar Feb 01 '21 08:02 jossgillet