doctr
[interpolation] Investigate difference in resizing/rotation between cv2, TF & PyTorch
The library doesn't clearly document the consequences of image transformations across the different framework backends. Several points need to be investigated:
- appearance of artefacts during interpolation with some methods
- differences in interpolation results between the 3 frameworks
TF vs CV2
As mentioned here, cv2 uses bilinear interpolation with half-pixel corrections by default, while TF's bilinear resizing doesn't apply this half-pixel correction. The fix is to pass half_pixel_centers=True to the tf.image.resize_bilinear function.
However, this TF function is deprecated in TF 2, and the new resizing function does not provide this argument.
This PyTorch issue also mentions this resizing argument, and it seems we cannot use it in PyTorch for the moment.
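To make the half-pixel question concrete, here is a minimal pure-Python sketch of the two coordinate mappings (this is just the sampling-grid math, not either library's actual kernel):

```python
# Where does output pixel `dst` sample from in the source image?
# half-pixel: src = (dst + 0.5) * scale - 0.5  (cv2 default, TF1 with half_pixel_centers=True)
# asymmetric: src = dst * scale                (legacy TF bilinear behaviour)

def src_coord_half_pixel(dst, scale):
    return (dst + 0.5) * scale - 0.5

def src_coord_asymmetric(dst, scale):
    return dst * scale

# Downscaling 8 px -> 4 px, i.e. scale = 8 / 4 = 2.0
scale = 2.0
half = [src_coord_half_pixel(i, scale) for i in range(4)]   # [0.5, 2.5, 4.5, 6.5]
asym = [src_coord_asymmetric(i, scale) for i in range(4)]   # [0.0, 2.0, 4.0, 6.0]
```

With half-pixel centers, the sampling grid stays centered in the source image; the asymmetric mapping drifts toward the top-left corner, which is exactly the per-pixel offset between the two backends.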
@fg-mindee
PIL vs TORCH
This issue also tackles the question of anti-aliasing between torch and PIL. As mentioned in this article, the antialias arg is now available in torchvision:
Torchvision.transforms.Resize: "The output image might be different depending on its type: when downsampling, the interpolation of PIL images and tensors is slightly different, because PIL applies antialiasing. This may lead to significant differences in the performance of a network. Therefore, it is preferable to train and serve a model with the same input types. See also below the antialias parameter, which can help making the output of PIL images and tensors closer."
This is quite a good summary. As found here, it seems that using antialias=True in TF and antialias=True in torchvision leads to ALMOST identical results.
Alright, we almost got everything to close this issue:
- [x] Check the expected differences in interpolation between cv2, TF & PyTorch
- [x] Identify how to bring the difference down to zero (or something that can be ignored)
- [x] Assess which models/tasks were trained with interpolation schemes that prevent zero difference with other frameworks.
- [ ] Assess the performance impact of direct porting (without retraining)
For the moment, we have used tf.image.resize
and torchvision.transforms.functional.resize
in all our preprocessors, and thus in all trainings (both reco & detection). We didn't use anti-aliasing in TF or in torch because it is disabled by default. We should enable anti-aliasing in all future trainings so that our models are trained on exactly the same pictures.
cv2.resize
is used in read_img_as_numpy
(which never performs resizing by default, so it is never called), in rotate_image
(to crop straight boxes on rotated images; we haven't used it so far, but we will soon since we will have rotated segmentation maps), and in the app, for visualization only.
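A sketch of what harmonized resizing could look like in the two preprocessors (assuming TF >= 2.4 and torchvision >= 0.9, where both expose an antialias argument; the helper names are hypothetical):

```python
def resize_tf(image, size):
    """Bilinear resize with anti-aliasing in the TF backend (hypothetical helper)."""
    import tensorflow as tf
    # tf.image.resize grew an `antialias` kwarg in TF 2; it is False by default
    return tf.image.resize(image, size, method="bilinear", antialias=True)

def resize_torch(image, size):
    """Bilinear resize with anti-aliasing in the torch backend (hypothetical helper)."""
    from torchvision.transforms import InterpolationMode
    from torchvision.transforms import functional as F
    # torchvision's `antialias` flag only applies to tensor inputs (PIL always anti-aliases)
    return F.resize(image, size, interpolation=InterpolationMode.BILINEAR, antialias=True)
```

Passing the same flag on both sides is what should bring the two training pipelines onto (almost) identical input pictures.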
@fg-mindee
Thanks! So, to even everything out, we would only need to enforce anti-aliasing on TF resize for now?
If so, we could:
- start doing this in future trainings
- investigate whether this severely impacts previously trained models
Yes, and also in the torch resizing function as mentioned here:
antialias (bool, optional) –
antialias flag. If img is PIL Image, the flag is ignored and anti-alias is always used. If img is Tensor, the flag is False by default and can be set True for InterpolationMode.BILINEAR only mode
Let's harmonize this for the next training cycle along with the rotation in 0.5.0 :+1:
We won't have time to harmonize & retrain all models with this before the next release, so I'm staging it for 0.6.0 :+1: