Upsample size mismatch in segmentation models
Describe the bug
Depending on the input image size, feature maps upsampled with nn.Upsample don't always match the size of the skip connection they are concatenated with. This is a known issue; some reference links:
- https://github.com/pytorch/pytorch/issues/71877
- https://github.com/pytorch/pytorch/issues/7732
Replacing nn.Upsample with torch.nn.functional.interpolate seems to be the recommended solution.
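For illustration, something along these lines (a rough sketch with a made-up SkipFusion module, not the actual PP-LiteSeg decoder): interpolate to the skip tensor's spatial size instead of using a fixed scale factor.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SkipFusion(nn.Module):
    """Hypothetical decoder block: upsample x and concatenate it with a skip connection."""

    def __init__(self, in_channels: int, skip_channels: int, out_channels: int):
        super().__init__()
        self.conv = nn.Conv2d(in_channels + skip_channels, out_channels, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor, skip: torch.Tensor) -> torch.Tensor:
        # Instead of nn.Upsample(scale_factor=2), resize to the exact spatial
        # size of the skip connection so the concatenation always matches.
        x = F.interpolate(x, size=skip.shape[-2:], mode="bilinear", align_corners=False)
        return self.conv(torch.cat([x, skip], dim=1))

# Shapes taken from the failing case below: 33 px deep map vs. 65 px skip.
deep = torch.randn(1, 128, 33, 33)
skip = torch.randn(1, 64, 65, 65)
print(SkipFusion(128, 64, 64)(deep, skip).shape)  # torch.Size([1, 64, 65, 65])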
To Reproduce
Here's a snippet using PP-LiteSeg. The dataset is Cityscapes, but that isn't important; the image size is the deciding factor. I imagine the issue affects all models that use nn.Upsample and concatenate with skip connections:
from super_gradients.training import models, dataloaders, Trainer
from super_gradients.common.object_names import Models
from super_gradients.training.metrics import IoU

trainer = Trainer(experiment_name="eval-pp-liteseg-b75")
val_loader = dataloaders.cityscapes_stdc_seg75_val(
    dataset_params={
        "transforms": [
            {
                "SegRescale": {
                    "long_size": 1025
                }
            }
        ]
    },
    dataloader_params={"batch_size": 1},
)
model = models.get(
    Models.PP_LITE_B_SEG75,
    pretrained_weights="cityscapes",
)
metric = IoU(num_classes=20, ignore_index=19)
miou = trainer.test(
    model=model,
    test_loader=val_loader,
    test_metrics_list=[metric],
    metrics_progress_verbose=False
)[0].cpu().item()
print(f"mIoU: {miou}")
Results in an error:
File ".../src/super_gradients/training/models/segmentation_models/ppliteseg.py", line 52, in forward
atten = torch.cat([*self._avg_max_spatial_reduce(x, use_concat=False), *self._avg_max_spatial_reduce(skip, use_concat=False)], dim=1)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 66 but got size 65 for tensor number 2 in the list.
Expected behavior
Fully convolutional segmentation models should work for all input image sizes.
Environment:
- Ubuntu
- super-gradients v3.0.7
- PyTorch 1.11
Hi! Thanks for raising this issue.
TL;DR: One cannot feed an arbitrarily sized image to the model.
I believe the root cause of the problem is that the input image size is not evenly divisible by the maximum stride of the backbone (32). In that case the stride-2 stages round the feature map sizes, so consecutive feature maps are no longer related by an exact factor of two and a fixed x2 upsample cannot match the skip connection.
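To make that concrete with the numbers from the repro above (assuming the usual kernel-3, stride-2, padding-1 downsampling convolutions):

import torch
import torch.nn as nn

# A 1025 px long side shrinks through the stride-2 stages as
# 1025 -> 513 -> 257 -> 129 -> 65 -> 33, i.e. the sizes get rounded.
skip = torch.randn(1, 64, 65, 65)    # stride-16 feature map
deep = torch.randn(1, 128, 33, 33)   # stride-32 feature map

up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
print(up(deep).shape[-2:])           # torch.Size([66, 66]), one pixel larger than the 65 px skip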
Indeed, explicitly specifying the output size for the upsample operations could patch this. However, this would only work for interpolation-based upsampling and not for nn.PixelShuffle or nn.ConvTranspose2d upsampling.
We will definitely look into it, but for now I suggest preprocessing the input images so that their size is divisible by 32.
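A minimal sketch of that workaround (the pad_to_multiple helper below is just an illustration, not part of super-gradients): pad the image up to the next multiple of 32, run the model, and crop the prediction back to the original size.

import torch
import torch.nn.functional as F

def pad_to_multiple(x: torch.Tensor, multiple: int = 32):
    """Zero-pad an NCHW tensor on the right/bottom so H and W are divisible by multiple."""
    h, w = x.shape[-2:]
    pad_h = (multiple - h % multiple) % multiple
    pad_w = (multiple - w % multiple) % multiple
    return F.pad(x, (0, pad_w, 0, pad_h)), (h, w)

image = torch.randn(1, 3, 1025, 2049)
padded, (h, w) = pad_to_multiple(image)
print(padded.shape)              # torch.Size([1, 3, 1056, 2080])
# logits = model(padded)
# logits = logits[..., :h, :w]   # crop the prediction back to 1025 x 2049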
At least for nn.ConvTranspose2d there's output_padding to address this issue, see: https://pytorch.org/docs/stable/generated/torch.nn.ConvTranspose2d.html#convtranspose2d
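For reference, a tiny sketch with generic layers (not super-gradients code) showing how output_padding selects the target size:

import torch
import torch.nn as nn

deep = torch.randn(1, 128, 33, 33)   # stride-32 map; the matching skip is 65 px

up_odd = nn.ConvTranspose2d(128, 64, kernel_size=3, stride=2, padding=1, output_padding=0)
up_even = nn.ConvTranspose2d(128, 64, kernel_size=3, stride=2, padding=1, output_padding=1)

print(up_odd(deep).shape[-2:])   # torch.Size([65, 65]), matches the skip connection
print(up_even(deep).shape[-2:])  # torch.Size([66, 66]), plain x2 upsampling

Note that output_padding is fixed when the layer is constructed, so it only helps when the parity of the target size is known in advance.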
Looks like there's no way around it for nn.PixelShuffle though; maybe that would be a good feature request for PyTorch.