[DRAFT] feat: Added RandomPerspective augmentation

Open SiddhantBahuguna opened this issue 3 years ago • 14 comments

This PR introduces the following changes to the existing repo:

It adds a RandomPerspective augmentation that transforms the bounding boxes along with the image.

    import numpy as np
    import torch
    from torchvision.transforms.functional import InterpolationMode
    from doctr.transforms import RandomPerspective

    # Nearest-neighbour interpolation, as in the original snippet
    transform = RandomPerspective(0.2, 1, interpolation=InterpolationMode.NEAREST)
    target = {"boxes": np.array([[0.1, 0.1, 0.4, 0.5]], dtype=np.float32), "labels": np.ones(1, dtype=np.int64)}
    image, target = transform(torch.rand((3, 224, 224)), target)

[Image: 2x3_artefact]

It adds a unit test for the transform. The entire page is used as a single bounding box, so the test checks the transformation of both the image and the target bounding boxes.

General outline

  1. A zero-filled mask is created with the same shape as the original image.
  2. The bounding boxes are drawn on the mask, each filled with the color assigned to its label (this lets us identify the labels after the transform).
  3. Both the mask and the original image are transformed.
  4. The mask of each label is recovered by filtering on its color range.
  5. The contours of each mask are computed, and their extreme points give the new box coordinates.
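
The outline above can be sketched with NumPy alone. This is a simplified illustration under stated assumptions, not the code in this PR: each box gets an integer id instead of a color, the per-box extent is taken directly from the minimum and maximum pixel coordinates rather than from cv2 contours, and `warp` is a hypothetical stand-in for the perspective transform (any function that applies the same geometric change to the image and to the mask). The function names are made up for the example.

```python
import numpy as np


def draw_box_mask(boxes, height, width):
    """Rasterise relative (xmin, ymin, xmax, ymax) boxes into an integer mask.

    Box i is filled with the value i + 1 and 0 is background. Where boxes
    overlap, the later box overwrites the earlier one.
    """
    mask = np.zeros((height, width), dtype=np.int32)
    for idx, (xmin, ymin, xmax, ymax) in enumerate(boxes, start=1):
        x0, x1 = int(round(xmin * width)), int(round(xmax * width))
        y0, y1 = int(round(ymin * height)), int(round(ymax * height))
        mask[y0:y1, x0:x1] = idx
    return mask


def boxes_from_mask(mask, num_boxes):
    """Recover relative boxes from the extreme pixels carrying each id."""
    height, width = mask.shape
    boxes = np.zeros((num_boxes, 4), dtype=np.float32)
    for idx in range(1, num_boxes + 1):
        ys, xs = np.nonzero(mask == idx)
        if ys.size == 0:
            continue  # the box left the frame; its row stays all zeros
        boxes[idx - 1] = (
            xs.min() / width,
            ys.min() / height,
            (xs.max() + 1) / width,
            (ys.max() + 1) / height,
        )
    return boxes


def transform_with_boxes(image, boxes, warp):
    """Apply `warp` to a (C, H, W) image and to the box mask, then re-read the boxes."""
    height, width = image.shape[-2:]
    mask = draw_box_mask(boxes, height, width)
    return warp(image), boxes_from_mask(warp(mask), len(boxes))


# A horizontal flip stands in for the perspective warp in this example.
image = np.random.rand(3, 10, 10).astype(np.float32)
new_image, new_boxes = transform_with_boxes(
    image, [[0.1, 0.1, 0.4, 0.5]], lambda arr: np.flip(arr, axis=-1)
)
```

The mask has to be warped without blending neighbouring values, otherwise the ids (or colors) no longer identify a box. That is presumably why the usage snippet above passes nearest-neighbour interpolation.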

Any feedback is welcome!

SiddhantBahuguna avatar Jan 10 '22 22:01 SiddhantBahuguna

Codecov Report

Merging #799 (80f9b54) into main (d1de8bf) will decrease coverage by 0.04%. The diff coverage is 92.00%.

@@            Coverage Diff             @@
##             main     #799      +/-   ##
==========================================
- Coverage   96.03%   95.99%   -0.05%     
==========================================
  Files         131      131              
  Lines        4942     4991      +49     
==========================================
+ Hits         4746     4791      +45     
- Misses        196      200       +4     
Flag        Coverage Δ
unittests   95.99% <92.00%> (-0.05%)

Impacted Files                         Coverage Δ
doctr/transforms/modules/pytorch.py    95.37% <92.00%> (-2.94%)
doctr/transforms/modules/base.py       94.59% <0.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update d1de8bf...80f9b54.

codecov[bot] avatar Jan 10 '22 22:01 codecov[bot]

Unit test (issues and errors):

AttributeError: module 'torchvision.transforms.functional' has no attribute '_get_image_num_channels'. There seems to be a mismatch between torchvision versions: apparently _get_image_num_channels is no longer an attribute, and I think it has been renamed to get_image_num_channels. Strangely, the test ran perfectly fine locally.
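
One way to tolerate this kind of rename is to look the helper up under several candidate names instead of hard-coding one. The sketch below is only an illustration: `resolve_first` is a hypothetical helper, the two namespaces are stand-ins rather than real torchvision modules, and the new name is the guess made in this comment, not something confirmed against torchvision's changelog.

```python
from types import SimpleNamespace


def resolve_first(module, candidate_names):
    """Return the first attribute in `candidate_names` that exists on `module`."""
    for name in candidate_names:
        if hasattr(module, name):
            return getattr(module, name)
    raise AttributeError(f"none of {candidate_names!r} found on {module!r}")


# Stand-ins for two library versions (hypothetical, not real torchvision modules).
old_api = SimpleNamespace(_get_image_num_channels=lambda img: len(img))
new_api = SimpleNamespace(get_image_num_channels=lambda img: len(img))

# Candidate names in order of preference; the public name is tried first.
names = ("get_image_num_channels", "_get_image_num_channels")
get_channels_old = resolve_first(old_api, names)
get_channels_new = resolve_first(new_api, names)
```

With the real library, the first argument would be `torchvision.transforms.functional`. Pinning the torchvision version in CI is the other obvious option when local and CI runs disagree.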

Also, I need to think of a better way to calculate the transformed bounding box coordinates.
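
One commonly used alternative, offered here only as an assumption about what a better way could look like and not as what this PR implements, is to project the four corners of each box through the perspective transform and take the enclosing axis-aligned box. This needs neither a mask nor cv2. The sketch fits a 3x3 homography from four point correspondences; `fit_homography` and `warp_boxes` are hypothetical names.

```python
import numpy as np


def fit_homography(src_pts, dst_pts):
    """Solve for the 3x3 matrix H (with H[2, 2] = 1) that maps each src point to its dst point."""
    src = np.asarray(src_pts, dtype=np.float64)
    dst = np.asarray(dst_pts, dtype=np.float64)
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        # Two linear equations per correspondence, eight unknowns in total.
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.extend([u, v])
    h = np.linalg.solve(np.array(rows), np.array(rhs))
    return np.append(h, 1.0).reshape(3, 3)


def warp_boxes(boxes, homography):
    """Project the corners of (xmin, ymin, xmax, ymax) boxes and return the enclosing boxes."""
    boxes = np.asarray(boxes, dtype=np.float64)
    xmin, ymin, xmax, ymax = boxes.T
    # Corner array of shape (N, 4, 2), listed clockwise from the top-left corner.
    corners = np.stack(
        [
            np.stack([xmin, ymin], axis=1),
            np.stack([xmax, ymin], axis=1),
            np.stack([xmax, ymax], axis=1),
            np.stack([xmin, ymax], axis=1),
        ],
        axis=1,
    )
    ones = np.ones(corners.shape[:2] + (1,))
    projected = np.concatenate([corners, ones], axis=2) @ homography.T
    projected = projected[..., :2] / projected[..., 2:3]  # back from homogeneous coordinates
    return np.concatenate([projected.min(axis=1), projected.max(axis=1)], axis=1)


# Example: the unit square is mapped onto its top-left quarter.
unit_square = [(0, 0), (1, 0), (1, 1), (0, 1)]
quarter = [(0, 0), (0.5, 0), (0.5, 0.5), (0, 0.5)]
H = fit_homography(unit_square, quarter)
new_boxes = warp_boxes([[0.1, 0.1, 0.4, 0.5]], H)
```

The homography has to be expressed in the same units as the boxes (relative or pixel coordinates) and in the same direction as the image warp, so both are worth checking before comparing the result against the transformed image.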

SiddhantBahuguna avatar Jan 12 '22 00:01 SiddhantBahuguna

I overrode the methods, which fixed the issue. Since we don't test with PIL images in the unit test, Codecov couldn't cover all of the code.

SiddhantBahuguna avatar Jan 12 '22 10:01 SiddhantBahuguna

Sorry for the late reply! I'll have to review this carefully. By any chance, have you checked https://pytorch.org/vision/stable/transforms.html#torchvision.transforms.RandomPerspective?

I think it would be a good start to try to inherit as much as possible from there (which will hopefully avoid the cv2 dependency).

fg-mindee avatar Jan 18 '22 16:01 fg-mindee

Hi, I have actually inherited as much as possible from it. The mask is first converted to a torch.Tensor and then back to a NumPy array to find the contours. I just realized that torchvision has some operations for drawing bounding boxes, so I will make some changes to use them, although we will still have some cv2 operations. I will first update the current PR with some initial improvements. Thanks!

SiddhantBahuguna avatar Jan 18 '22 17:01 SiddhantBahuguna

Any update @SiddhantBahuguna ? :)

fg-mindee avatar Feb 18 '22 17:02 fg-mindee

@SiddhantBahuguna is this PR still good? If yes, can you review FG's comments and update the PR accordingly so that we can potentially merge it, please?

fharper avatar Apr 08 '22 18:04 fharper

@SiddhantBahuguna any updates ? :)

felixdittrich92 avatar Apr 28 '22 21:04 felixdittrich92

Hi Fred, this PR makes one assumption that might be specific to our generated dataset. That is why I have to modify it and optimize it further. We can either close this PR and I will open a new one soon, or I can modify the existing one. :)

SiddhantBahuguna avatar Apr 29 '22 08:04 SiddhantBahuguna

Hi Felix, thanks for your message :) I will redo a few elements of it. It should be done before next week. Thanks.

SiddhantBahuguna avatar Apr 29 '22 08:04 SiddhantBahuguna

@SiddhantBahuguna: tell me which you prefer, but for now I'll move this PR to a draft state :)

fharper avatar Apr 29 '22 13:04 fharper

@SiddhantBahuguna same here any update ? 😅

felixdittrich92 avatar May 24 '22 18:05 felixdittrich92

To put this into context, this transformation is one of the most natural ones found in pictures taken of documents such as recipes. So if we manage to integrate it into our training, I'm sure it will help a lot with robustness (and also with the artefact detection part, actually).

frgfm avatar May 25 '22 19:05 frgfm

Hi @SiddhantBahuguna, I'd like to ask again whether there are any updates? :)

felixdittrich92 avatar Sep 10 '22 18:09 felixdittrich92