[DRAFT] feat: Added RandomPerspective augmentation
This PR introduces the following changes to the existing repo:
It adds a RandomPerspective augmentation that transforms the bounding boxes along with the image.
```python
import numpy as np
import torch
import torchvision

from doctr.transforms import RandomPerspective

transformed = RandomPerspective(
    0.2, 1, interpolation=torchvision.transforms.functional.InterpolationMode("nearest")
)
target = {
    "boxes": np.array([[0.1, 0.1, 0.4, 0.5]], dtype=np.float32),
    "labels": np.ones(1, dtype=np.int64),
}
image, target = transformed(torch.rand((3, 224, 224)), target)
```
It adds a unittest for the same. The entire page is taken as a bbox to test the transformation of the image as well as the target bounding boxes.
General outline
- A mask (of zeros) is computed with the same shape as the original image
- Bboxes are drawn over the mask and filled with a distinct color per label (this helps with label identification after the transform)
- Both the mask and the original image are transformed
- The mask of each label is recovered by filtering the color ranges
- Contours of each mask are computed, and the new coordinates are obtained from their extreme points
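The outline above can be sketched in plain numpy. This is an illustration, not the PR's actual code: the helper names are hypothetical, a fixed translation homography stands in for the random perspective warp, and min/max of the warped label pixels stands in for the cv2 contour step.

```python
import numpy as np

def warp_mask_nearest(mask, H):
    """Warp a label mask with homography H via nearest-neighbour inverse mapping."""
    h, w = mask.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.stack([xs, ys, np.ones_like(xs)]).reshape(3, -1).astype(np.float64)
    src = np.linalg.inv(H) @ pts                     # output pixel -> source pixel
    sx = np.round(src[0] / src[2]).astype(int)
    sy = np.round(src[1] / src[2]).astype(int)
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.zeros_like(mask).reshape(-1)
    out[valid] = mask[sy[valid], sx[valid]]
    return out.reshape(h, w)

def boxes_from_mask(mask, n_labels):
    """Recover axis-aligned boxes from the extreme points of each label's pixels."""
    boxes = []
    for lbl in range(1, n_labels + 1):
        ys, xs = np.nonzero(mask == lbl)
        boxes.append([xs.min(), ys.min(), xs.max(), ys.max()])
    return np.array(boxes)

# draw one box (label 1) on a zeros mask, warp, then read the new coordinates
mask = np.zeros((100, 100), dtype=np.uint8)
mask[10:51, 10:41] = 1                               # box (10, 10, 40, 50) in xyxy
H = np.array([[1.0, 0.0, 5.0], [0.0, 1.0, 3.0], [0.0, 0.0, 1.0]])  # shift by (5, 3)
new_boxes = boxes_from_mask(warp_mask_nearest(mask, H), 1)
```

With the (5, 3) shift the recovered box is `[15, 13, 45, 53]`, i.e. the original extreme points translated by the warp.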
Any feedback is welcome!
Codecov Report
Merging #799 (80f9b54) into main (d1de8bf) will decrease coverage by 0.04%. The diff coverage is 92.00%.
```diff
@@            Coverage Diff             @@
##             main     #799      +/-   ##
==========================================
- Coverage   96.03%   95.99%   -0.05%
==========================================
  Files         131      131
  Lines        4942     4991      +49
==========================================
+ Hits         4746     4791      +45
- Misses        196      200       +4
```
| Flag | Coverage Δ | |
|---|---|---|
| unittests | 95.99% <92.00%> (-0.05%) | :arrow_down: |

Flags with carried forward coverage won't be shown.
| Impacted Files | Coverage Δ | |
|---|---|---|
| doctr/transforms/modules/pytorch.py | 95.37% <92.00%> (-2.94%) | :arrow_down: |
| doctr/transforms/modules/base.py | 94.59% <0.00%> (ø) | |
Continue to review full report at Codecov.
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Unittest (issues & errors):
AttributeError: module 'torchvision.transforms.functional' has no attribute '_get_image_num_channels'
There is some mismatch between torchvision versions: apparently `_get_image_num_channels` is no longer an attribute and seems to have been renamed to `get_image_num_channels`. Strangely, the test ran perfectly fine locally.
Also, I need to think of a better way to calculate the transformed bounding box coordinates.
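One defensive way to paper over that rename is to resolve whichever attribute the installed torchvision exposes. The helper below is hypothetical (not part of doctr or torchvision); the `SimpleNamespace` objects stand in for the two torchvision versions so the resolution logic can be checked without either installed.

```python
from types import SimpleNamespace

def resolve_channel_helper(functional_module):
    """Pick the channel-count helper regardless of the torchvision version:
    newer releases expose get_image_num_channels, older ones only the
    private _get_image_num_channels."""
    fn = getattr(functional_module, "get_image_num_channels", None)
    if fn is None:
        fn = getattr(functional_module, "_get_image_num_channels", None)
    if fn is None:
        # last-resort fallback, assuming a CHW tensor layout
        def fn(img):
            return img.shape[-3] if img.ndim >= 3 else 1
    return fn

# stand-ins for a new and an old torchvision.transforms.functional
new_api = SimpleNamespace(get_image_num_channels=lambda img: 3)
old_api = SimpleNamespace(_get_image_num_channels=lambda img: 3)
```

In the transform this would be called once as `resolve_channel_helper(torchvision.transforms.functional)` instead of touching the private attribute directly.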
I overrode the methods and that fixed the issue. Since we haven't tested with a PIL image in the unittest, Codecov couldn't cover that part of the code.
Sorry for the late reply! I'll have to review this carefully. By any chance, have you checked https://pytorch.org/vision/stable/transforms.html#torchvision.transforms.RandomPerspective ?
I think it would be a good start to inherit as much as possible from there (which will hopefully avoid the cv2 dependency).
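One way to make that work for the boxes too is to push each box's corners through the same start/end point correspondence that torchvision's `RandomPerspective.get_params` samples for the image. A numpy sketch, with hypothetical helper names; note the result is the axis-aligned hull of the warped corners, and a pure shift is used here so the outcome is checkable by hand:

```python
import numpy as np

def fit_homography(src, dst):
    """Solve the 8-unknown DLT system mapping four src points onto dst."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.lstsq(np.array(A, dtype=np.float64),
                        np.array(b, dtype=np.float64), rcond=None)[0]
    return np.append(h, 1.0).reshape(3, 3)

def warp_boxes(boxes, H):
    """Map each box's four corners through H and keep the extreme points."""
    out = []
    for x0, y0, x1, y1 in boxes:
        c = np.array([[x0, x1, x1, x0], [y0, y0, y1, y1], [1, 1, 1, 1]], dtype=np.float64)
        p = H @ c
        xs, ys = p[0] / p[2], p[1] / p[2]
        out.append([xs.min(), ys.min(), xs.max(), ys.max()])
    return np.array(out)

# startpoints/endpoints in the shape RandomPerspective.get_params returns them;
# here a pure (2, 3) shift so the warped box is easy to verify
start = [(0, 0), (10, 0), (10, 10), (0, 10)]
end = [(2, 3), (12, 3), (12, 13), (2, 13)]
H = fit_homography(start, end)
warped = warp_boxes(np.array([[1.0, 1.0, 5.0, 5.0]]), H)   # -> [[3, 4, 7, 8]]
```

This keeps the whole box pipeline analytic, with no mask rendering or contour extraction at all.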
Hi, I have actually inherited as much as possible from it. Only the mask is first converted to a torch.Tensor and then back to a numpy array for the contour exploration. I just realized that torchvision has operations for drawing bounding boxes, so I will switch some of the code over to those, although some cv2 operations will remain. I will update the current PR first with some first-hand improvements. Thanks!
Any update @SiddhantBahuguna ? :)
@SiddhantBahuguna is this PR still good? If yes, can you review FG's comments and modify it accordingly so we can potentially merge it, please?
@SiddhantBahuguna any updates ? :)
Hi Fred, this PR makes one assumption that might be specific to our generated dataset, which is why I have to make modifications and optimize it further. We can either take this PR down and I will open a new one soon, or modify the existing one. :)
@SiddhantBahuguna any updates ? :)
Hi Felix, thanks for your message :) I will redo a few elements of it. It should be done before next week. Thanks.
@SiddhantBahuguna: you tell me what you prefer, but for now, I'll move this PR in a draft state :)
@SiddhantBahuguna same here any update ? 😅
To put this into context, this transformation is one of the most natural ones found in pictures taken of documents such as receipts. So if we manage to integrate it into our trainings, I'm sure it will help a lot with robustness (also on the artefact detection part, actually).
Hi @SiddhantBahuguna would ask again if there are updates ? :)