doctr
[interpolation] Investigate difference in resizing/rotation between cv2, TF & PyTorch
The library doesn't clearly document the consequences of image transformations across the different framework backends. Several points need to be investigated:
- appearance of artefacts during interpolation with some methods
- differences in interpolation results between the 3 frameworks
TF vs CV2
As mentioned here, cv2 uses bilinear interpolation with half-pixel corrections by default, while TF's bilinear resizing doesn't apply this half-pixel correction. The fix is to pass half_pixel_centers=True to the tf.image.resize_bilinear function.
However, this TF function is deprecated in TF 2, and the new resizing function does not provide this argument.
This PyTorch issue also mentions this resizing argument, and it seems we cannot use it in PyTorch for the moment.
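To make the half-pixel question concrete, here is a minimal pure-Python sketch of the two coordinate mappings (this is just the sampling-grid math, not either library's actual kernel):

```python
# Where does output pixel `dst` sample from in the source image?
# half-pixel: src = (dst + 0.5) * scale - 0.5  (cv2 default, TF1 with half_pixel_centers=True)
# asymmetric: src = dst * scale                (legacy TF bilinear behaviour)

def src_coord_half_pixel(dst, scale):
    return (dst + 0.5) * scale - 0.5

def src_coord_asymmetric(dst, scale):
    return dst * scale

# Downscaling 8 px -> 4 px, i.e. scale = 8 / 4 = 2.0
scale = 2.0
half = [src_coord_half_pixel(i, scale) for i in range(4)]   # [0.5, 2.5, 4.5, 6.5]
asym = [src_coord_asymmetric(i, scale) for i in range(4)]   # [0.0, 2.0, 4.0, 6.0]
```

With half-pixel centers, the sampling grid stays centered in the source image; the asymmetric mapping drifts toward the top-left corner, which is exactly the per-pixel offset between the two backends.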
@fg-mindee
PIL vs TORCH
This issue also tackles the question of anti-aliasing between torch and PIL. As mentioned in this article, the antialias arg is now available in torchvision:
Torchvision.transforms.Resize: "The output image might be different depending on its type: when downsampling, the interpolation of PIL images and tensors is slightly different, because PIL applies antialiasing. This may lead to significant differences in the performance of a network. Therefore, it is preferable to train and serve a model with the same input types. See also below the antialias parameter, which can help making the output of PIL images and tensors closer."
This is quite a good summary. As found here, it seems that using antialias=True in TF and antialias=True in torchvision leads to ALMOST identical results.
Alright, we almost got everything to close this issue:
- [x] Check the expected differences in interpolation between cv2, TF & PyTorch
- [x] Identify how to bring the difference down to zero (or something that can be ignored)
- [x] Assess which models/tasks were trained with interpolation schemes that prevent zero difference with other frameworks.
- [ ] Assess the performance impact of direct porting (without retraining)
For the moment, we have used tf.image.resize
and torchvision.transforms.functional.resize
in all our preprocessors, and thus in all trainings (both reco & detection). We didn't use anti-aliasing in TF or in torch because it is disabled by default. We should enable anti-aliasing in all future trainings so that our models are trained on exactly the same pictures.
cv2.resize
is used in read_img_as_numpy
(which never performs resizing by default, so it is never called), in rotate_image
(to crop straight boxes on rotated images; we haven't used it so far, but we will soon since we will have rotated segmentation maps), and in the app, for visualization only.
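A sketch of what harmonized resizing could look like in the two preprocessors (assuming TF >= 2.4 and torchvision >= 0.9, where both expose an antialias argument; the helper names are hypothetical):

```python
def resize_tf(image, size):
    """Bilinear resize with anti-aliasing in the TF backend (hypothetical helper)."""
    import tensorflow as tf
    # tf.image.resize grew an `antialias` kwarg in TF 2; it is False by default
    return tf.image.resize(image, size, method="bilinear", antialias=True)

def resize_torch(image, size):
    """Bilinear resize with anti-aliasing in the torch backend (hypothetical helper)."""
    from torchvision.transforms import InterpolationMode
    from torchvision.transforms import functional as F
    # torchvision's `antialias` flag only applies to tensor inputs (PIL always anti-aliases)
    return F.resize(image, size, interpolation=InterpolationMode.BILINEAR, antialias=True)
```

Passing the same flag on both sides is what should bring the two training pipelines onto (almost) identical input pictures.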
@fg-mindee
Thanks! So, to even everything out, we would only need to enforce anti-aliasing on TF resize for now?
If so, we could:
- start doing this in future trainings
- investigate whether this severely impacts previously trained models
Yes, and also in the torch resizing function as mentioned here:
antialias (bool, optional) –
antialias flag. If img is PIL Image, the flag is ignored and anti-alias is always used. If img is Tensor, the flag is False by default and can be set True for InterpolationMode.BILINEAR only mode
Let's harmonize this for the next training cycle along with the rotation in 0.5.0 :+1:
We won't have time to harmonize & retrain all models with this before the next release, so I'm staging it for 0.6.0 :+1: