resize with pad transformation

Open oekosheri opened this issue 2 years ago • 16 comments

🚀 The feature

In TensorFlow, tf.image has a method, tf.image.resize_with_pad, that resizes and pads when the aspect ratios of the input and output images differ, to avoid distortion. I couldn't find an equivalent among the torch transformations and had to write it myself. I think it would be a useful feature to have.

Motivation, pitch

When moving to PyTorch from TensorFlow, one does not want to lose handy features!

Alternatives

No response

Additional context

No response

cc @vfdev-5 @datumbox

oekosheri avatar Jul 05 '22 06:07 oekosheri

Hi @oekosheri, you can check the following function; it does bottom-right padding:

https://github.com/pytorch/vision/blob/d6e39ff76c82c7510f68a7aa637f015e7a86f217/torchvision/models/detection/transform.py#L25-L71

And I wrote a similar letterboxing mode here:

https://github.com/zhiqwang/yolov5-rt-stack/blob/main/yolort/models/transform.py#L65-L109

zhiqwang avatar Jul 05 '22 07:07 zhiqwang

Hi @zhiqwang, thanks! You mean I can use `torch.nn.functional.interpolate`? I tried it on an image tensor just now and it consistently raises a value error about the input/output sizes not matching. Also, this is pretty hidden. Why not add a simple wrapper to resize that does padding when the aspect ratio can't be preserved?
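For reference, one common source of such size errors is that `torch.nn.functional.interpolate` only accepts batched input; a minimal sketch, with purely illustrative tensor sizes:

```python
import torch
import torch.nn.functional as F

# interpolate expects a batched NCHW tensor, so a 3D CHW image tensor
# needs a batch dimension added first (sizes here are just illustrative)
img = torch.rand(3, 600, 800)
resized = F.interpolate(img.unsqueeze(0), size=(768, 1024),
                        mode="bilinear", align_corners=False)
print(tuple(resized.shape))  # (1, 3, 768, 1024)
```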

oekosheri avatar Jul 05 '22 08:07 oekosheri

> Also, this is pretty hidden. Why not add a simple wrapper to resize that does padding when aspect ratio can't be preserved?

Let's invite @datumbox to this discussion and hear his viewpoint on this problem.

zhiqwang avatar Jul 05 '22 08:07 zhiqwang

Just FYI, a previous issue #3286 also has some relevance to the discussion here.

zhiqwang avatar Jul 06 '22 05:07 zhiqwang

@oekosheri Thanks for the proposal.

I would like to understand more about the use-case. Why can't we just use resize in combination with pad? It should be two relatively straightforward calls. Maintaining TorchVision is a balancing act between providing the necessary primitives for people to build upon and avoiding bloating the library. A good reason to add functionality is that it's very popular, or that there are specific tricky corner-cases that need to be handled carefully. Is that the case here?

@zhiqwang I wouldn't recommend using the method from detection, as it's private and might change in the near future. Though you are right that the detection transforms file does what @oekosheri wants (resize, then batch + pad), the code does too many things and is tightly coupled to the detection logic. We've started moving some of this logic to the references, and in the near future we plan to start porting it into main TorchVision. @vfdev-5 is currently working on the prototype transforms to finalize the API.

datumbox avatar Jul 06 '22 08:07 datumbox

Hi @datumbox, imagine you have input images from different sources with different sizes and aspect ratios, and you want to transform them all to one final size without distortion. If you separate out pad and resize, you need to manually apply different transforms to different images. With a single transform applied to all inputs, however, you can check inside it whether and how to pad. Example code would look something like this:

import torchvision.transforms.functional as F


class Resize_with_pad:
    def __init__(self, w=1024, h=768):
        self.w = w
        self.h = h

    def __call__(self, image):
        w_1, h_1 = image.size  # PIL images report (width, height)
        ratio_f = self.w / self.h
        ratio_1 = w_1 / h_1

        # pad only if the original and final aspect ratios differ beyond a margin
        if round(ratio_1, 2) != round(ratio_f, 2):
            if ratio_1 < ratio_f:
                # image is too narrow for the target ratio: pad left and right
                wp = int(ratio_f * h_1 - w_1) // 2
                image = F.pad(image, (wp, 0, wp, 0), 0, "constant")
            else:
                # image is too wide for the target ratio: pad top and bottom
                hp = int(w_1 / ratio_f - h_1) // 2
                image = F.pad(image, (0, hp, 0, hp), 0, "constant")

        return F.resize(image, [self.h, self.w])

oekosheri avatar Jul 06 '22 09:07 oekosheri

@oekosheri I understand this is strongly motivated by the detection use-case, where things need to be resized proportionally to a maximum size and then padded to ensure we can produce batches, right?

datumbox avatar Jul 06 '22 09:07 datumbox

@datumbox They are padded to ensure that images whose original aspect ratio differs from the final one don't get distorted. Distorted images may not work well with CNNs. I fixed a mistake in the code above. As it is now, it produces exactly the output that tf.image.resize_with_pad does.

oekosheri avatar Jul 06 '22 12:07 oekosheri

@oekosheri Thanks for the references and context. I'll sync with @vfdev-5 offline to see if and how we can add this to the new API. I'll leave the issue open to ensure it stays on our radar.

datumbox avatar Jul 06 '22 16:07 datumbox

Yes, I also think that such a transformation would be very useful. I had cases with images of different resolutions and aspect ratios where cropping could lose pieces important for classification (this was defect classification, and defects could be at the edge of the image), so I wanted to maintain the aspect ratio to avoid strong distortion. I had to use a combination of LongestMaxSize and PadIfNeeded from the Albumentations library. I would like something similar, implemented as a single transformation as suggested here.

Inkorak avatar Jul 09 '22 16:07 Inkorak

I strongly second this feature. It is a very important transformation to have. I almost always have to use hacks to work around this when working with images.

H-Sorkatti avatar Aug 07 '22 07:08 H-Sorkatti

Ditto, it'd be really handy to have one. Our team also uses such a feature. We currently have a custom implementation with Albumentations that only works on numpy arrays (not torch tensors), and we are looking for an alternative that works on torch tensors and can be converted/embedded into an ONNX graph via torch.onnx.export.
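For what it's worth, a tensor-only version can be written with plain torch ops, which keeps it traceable for a fixed input size (a sketch; the module name and the 768×1024 target are hypothetical):

```python
import torch
import torch.nn.functional as F


class ResizeWithPad(torch.nn.Module):
    """Resize to fit inside (h, w) preserving aspect ratio, then zero-pad.

    Uses only torch ops, so torch.onnx.export can trace it for a fixed
    input size (fully dynamic shapes would need symbolic-friendly ops).
    """

    def __init__(self, w: int = 1024, h: int = 768):
        super().__init__()
        self.w, self.h = w, h

    def forward(self, img: torch.Tensor) -> torch.Tensor:  # img: NCHW
        h_1, w_1 = img.shape[-2], img.shape[-1]
        scale = min(self.w / w_1, self.h / h_1)
        new_h, new_w = int(h_1 * scale), int(w_1 * scale)
        img = F.interpolate(img, size=(new_h, new_w),
                            mode="bilinear", align_corners=False)
        # F.pad pads the last dims in (left, right, top, bottom) order
        return F.pad(img, (0, self.w - new_w, 0, self.h - new_h), value=0.0)


out = ResizeWithPad()(torch.rand(1, 3, 600, 600))
print(tuple(out.shape))  # (1, 3, 768, 1024)
```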

Curious, do you think the torchvision team could implement it soon? @datumbox @zhiqwang

AsiaCao avatar Dec 05 '22 03:12 AsiaCao

@AsiaCao Thanks for the input. Right now we are focusing on finalizing the Transforms V2 API. Once we complete that work, we can review this request and see the best way forward.

datumbox avatar Dec 05 '22 09:12 datumbox

thanks @datumbox

AsiaCao avatar Dec 06 '22 03:12 AsiaCao

Any plans for this to be implemented now? This would be convenient to have. Thanks!

swap-10 avatar Jun 30 '23 11:06 swap-10

Any update on this?

amanikiruga avatar Sep 24 '23 07:09 amanikiruga