Upsample size mismatch in segmentation models

Open davidtvs opened this issue 2 years ago • 3 comments

Describe the bug

Depending on the input image size, feature maps upsampled with nn.Upsample don't always match the size of the skip connection. This is a known issue; some reference links:

  • https://github.com/pytorch/pytorch/issues/71877
  • https://github.com/pytorch/pytorch/issues/7732

Replacing nn.Upsample with torch.nn.functional.interpolate seems to be the recommended solution.
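
To illustrate, here's a minimal self-contained sketch (the tensor shapes are made up to mirror the 65-vs-66 mismatch in the traceback below; this isn't super-gradients code) showing why a fixed scale_factor breaks while an explicit target size does not:

import torch
import torch.nn.functional as F

skip = torch.randn(1, 64, 65, 65)    # stride-16 feature map
deep = torch.randn(1, 128, 33, 33)   # stride-32 feature map

# nn.Upsample with a fixed scale factor produces 66x66, which cannot be
# concatenated with the 65x65 skip connection.
up_fixed = torch.nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
print(up_fixed(deep).shape)          # torch.Size([1, 128, 66, 66])

# F.interpolate with an explicit target size always matches the skip connection.
up_exact = F.interpolate(deep, size=skip.shape[-2:], mode="bilinear", align_corners=False)
print(up_exact.shape)                # torch.Size([1, 128, 65, 65])
merged = torch.cat([up_exact, skip], dim=1)   # works for any input size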

To Reproduce

Here's a snippet using PP-LiteSeg. The dataset is Cityscapes, but that's not important; the image size is the key factor. I imagine the issue affects all models that use nn.Upsample and concatenate with skip connections:

from super_gradients.training import models, dataloaders, Trainer
from super_gradients.common.object_names import Models
from super_gradients.training.metrics import IoU


trainer = Trainer(experiment_name="eval-pp-liteseg-b75")
val_loader = dataloaders.cityscapes_stdc_seg75_val(
    dataset_params={
        # Rescale so the long side becomes 1025 px (not divisible by 32)
        "transforms": [
            {"SegRescale": {"long_size": 1025}}
        ]
    },
    dataloader_params={"batch_size": 1},
)
model = models.get(
    Models.PP_LITE_B_SEG75,
    pretrained_weights="cityscapes",
)
metric = IoU(num_classes=20, ignore_index=19)
miou = trainer.test(
    model=model,
    test_loader=val_loader,
    test_metrics_list=[metric],
    metrics_progress_verbose=False
)[0].cpu().item()
print(f"mIoU: {miou}")

Results in an error:

  File ".../src/super_gradients/training/models/segmentation_models/ppliteseg.py", line 52, in forward
    atten = torch.cat([*self._avg_max_spatial_reduce(x, use_concat=False), *self._avg_max_spatial_reduce(skip, use_concat=False)], dim=1)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 66 but got size 65 for tensor number 2 in the list.

Expected behavior

Fully convolutional segmentation models should work for all input image sizes.

Environment:

  • Ubuntu
  • super-gradients v3.0.7
  • PyTorch 1.11

davidtvs avatar Mar 28 '23 08:03 davidtvs

Hi! Thanks for raising this issue.

TL;DR: you cannot feed an arbitrarily sized image to the model.

I believe the root cause of the problem is that the input image size is not evenly divisible by the maximum stride of the backbone (32). In that case the downsampled feature maps no longer halve exactly from one stage to the next, so upsampling by a fixed factor of 2 overshoots the skip connection.
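
For the 1025-pixel long side used in the snippet above (and assuming the usual kernel-3 / stride-2 / padding-1 downsampling convolutions), the size bookkeeping looks roughly like this:

# Rough size bookkeeping: with an input side of 1025 the feature maps
# no longer halve exactly from stage to stage.
size = 1025
for stride in (2, 4, 8, 16, 32):
    size = (size - 1) // 2 + 1   # conv with kernel 3, stride 2, padding 1
    print(stride, size)          # 2 513, 4 257, 8 129, 16 65, 32 33
# Upsampling the 33-pixel map by a factor of 2 gives 66, while the
# stride-16 skip connection is 65 -- exactly the mismatch in the traceback.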

Indeed, explicitly specifying the output size for the upsample operations could patch this. However, that only works for interpolation-based upsampling, not for nn.PixelShuffle or nn.ConvTranspose2d upsampling.

We will definitely look into it, but for now I suggest preprocessing input images so that their height and width are divisible by 32.
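
For example, something along these lines (a rough sketch, not part of super-gradients; pad_to_multiple is just an illustrative helper) pads the bottom/right edges so both sides become divisible by 32:

import torch
import torch.nn.functional as F

def pad_to_multiple(image: torch.Tensor, multiple: int = 32) -> torch.Tensor:
    """Right/bottom-pad an NCHW batch so that H and W are divisible by `multiple`."""
    h, w = image.shape[-2:]
    pad_h = (multiple - h % multiple) % multiple
    pad_w = (multiple - w % multiple) % multiple
    # F.pad takes (left, right, top, bottom) for the last two dimensions
    return F.pad(image, (0, pad_w, 0, pad_h))

x = torch.randn(1, 3, 513, 1025)
print(pad_to_multiple(x).shape)   # torch.Size([1, 3, 544, 1056])

The predictions would then have to be cropped back to the original height and width before computing metrics.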

BloodAxe avatar Apr 03 '23 13:04 BloodAxe

At least for nn.ConvTranspose2d there's the output_padding argument to address this issue; see: https://pytorch.org/docs/stable/generated/torch.nn.ConvTranspose2d.html#convtranspose2d
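
For reference, a quick sketch (made-up tensor shapes, not model code): output_padding fixes the choice once at construction time, and nn.ConvTranspose2d also accepts an output_size argument in the forward call that resolves the ambiguity per input:

import torch
import torch.nn as nn

skip = torch.randn(1, 64, 65, 65)   # stride-16 feature map
deep = torch.randn(1, 128, 33, 33)  # stride-32 feature map

# With kernel 3 / stride 2 / padding 1 the output can legally be 65 or 66 px.
up_fixed = nn.ConvTranspose2d(128, 64, kernel_size=3, stride=2, padding=1, output_padding=1)
print(up_fixed(deep).shape)                          # torch.Size([1, 64, 66, 66]) -> mismatch

up = nn.ConvTranspose2d(128, 64, kernel_size=3, stride=2, padding=1)
print(up(deep, output_size=skip.shape[-2:]).shape)   # torch.Size([1, 64, 65, 65]) -> matches skip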

Looks like there's no way around it for nn.PixelShuffle though; maybe that would be a good feature request for PyTorch.

davidtvs avatar Apr 07 '23 10:04 davidtvs