albumentations

Bbox results after transformations are off

Open mikel-brostrom opened this issue 2 years ago • 11 comments

🐛 Bug

To Reproduce

I load the same image with my training and validation augmentation stacks.

My val augmentation stack:

self.val_transforms = A.Compose(
    [
        # LETTERBOX WITH ALBUMENTATIONS OPERATIONS
        # https://albumentations.ai/docs/api_reference/augmentations/geometric/resize/#albumentations.augmentations.geometric.resize.LongestMaxSize
        A.geometric.resize.LongestMaxSize(self.imgsz),
        # https://albumentations.ai/docs/api_reference/augmentations/geometric/transforms/#albumentations.augmentations.geometric.transforms.PadIfNeeded
        A.geometric.transforms.PadIfNeeded(self.imgsz, self.imgsz, border_mode=0, value=(114, 114, 114)),
        # https://albumentations.ai/docs/api_reference/pytorch/transforms/#albumentations.pytorch.transforms.ToTensorV2
        # The numpy HWC image is converted to a pytorch CHW tensor.
        ToTensorV2()
    ],
    bbox_params=A.BboxParams(format='coco', label_fields=['category_ids']),  # COCO: source format
)

My train augmentation stack:

self.train_transforms = A.Compose(
    [
        # COLOR TRANSFORMATIONS --------------
        A.augmentations.transforms.ColorJitter(p=0.1),
        A.augmentations.transforms.Sharpen(p=0.1),
        A.augmentations.transforms.ToGray(p=0.1),
        # GEOMETRICAL TRANSFORMATIONS --------------
        A.augmentations.geometric.transforms.HorizontalFlip(p=0.5),  # flip image on its vertical axis
        A.augmentations.geometric.transforms.Affine(
            translate_percent=0.1,
            rotate=4,
            shear=2,
            scale=(0.9, 1.1),
            mode=0,
            cval=(114, 114, 114),
            p=0.9
        ),
        A.augmentations.transforms.ImageCompression(quality_lower=50, p=0.1),
        # LETTERBOX WITH ALBUMENTATIONS OPERATIONS
        # https://albumentations.ai/docs/api_reference/augmentations/geometric/resize/#albumentations.augmentations.geometric.resize.LongestMaxSize
        A.geometric.resize.LongestMaxSize(self.imgsz),
        # https://albumentations.ai/docs/api_reference/augmentations/geometric/transforms/#albumentations.augmentations.geometric.transforms.PadIfNeeded
        A.geometric.transforms.PadIfNeeded(self.imgsz, self.imgsz, border_mode=0, value=(114, 114, 114)),
        # https://albumentations.ai/docs/api_reference/pytorch/transforms/#albumentations.pytorch.transforms.ToTensorV2
        # The numpy HWC image is converted to a pytorch CHW tensor when using this augmentation
        ToTensorV2()
    ],
    bbox_params=A.BboxParams(format='coco', label_fields=['category_ids']),  # COCO: source format
)

Image result after going through the validation dataloader (only letterbox'ed):

(image: proc_img_w_gt_bboxes)

Image result after going through the training dataloader (color + geometrical + letterbox operations):

(image: aug_ex_3)

Expected behavior

This is clearly wrong. The bboxes should be much tighter. Affine seems to be causing this. Is this a bug or am I missing something? This issue is similar to: https://github.com/albumentations-team/albumentations/issues/1373

Environment

  • Albumentations version (e.g., 0.1.8): 1.3.0
  • Python version (e.g., 3.8): 3.8
  • OS (e.g., Linux): Linux
  • How you installed albumentations (conda, pip, source): pip
  • Any other relevant information:

Additional context

Jan 27 '23 10:01 mikel-brostrom

Is this on your roadmap? @onurtore, @Dipet

Jan 27 '23 11:01 mikel-brostrom

Nope, not mine

Jan 27 '23 11:01 onurtore

I think this is not a bug. As pointed out in #746, to fit the rotated bounding box on the targets, information about the shape of the target is needed. Setting rotate_method="ellipse" might mitigate your issue, but if you have rectangle targets, it might make too small a bounding box because the corners will be cut off. See also #1203 or https://openaccess.thecvf.com/content/ICCV2021/papers/Kalra_Towards_Rotation_Invariance_in_Object_Detection_ICCV_2021_paper.pdf
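The difference between the two bbox rotation strategies can be sketched in plain Python. This is only an illustration of the geometry, not the albumentations implementation: the helper names are hypothetical, boxes are given as corner coordinates (x_min, y_min, x_max, y_max) rather than COCO format, and the ellipse is approximated by sampled points.

```python
import math


def rotate_point(x, y, cx, cy, angle_deg):
    """Rotate (x, y) around (cx, cy) by angle_deg degrees."""
    a = math.radians(angle_deg)
    dx, dy = x - cx, y - cy
    return (cx + dx * math.cos(a) - dy * math.sin(a),
            cy + dx * math.sin(a) + dy * math.cos(a))


def enclosing_box(points):
    """Axis-aligned box (x_min, y_min, x_max, y_max) around a set of points."""
    xs, ys = zip(*points)
    return min(xs), min(ys), max(xs), max(ys)


def rotate_bbox_largest_box(bbox, center, angle_deg):
    """Rotate the four corners of the box and enclose them."""
    x_min, y_min, x_max, y_max = bbox
    corners = [(x_min, y_min), (x_max, y_min), (x_max, y_max), (x_min, y_max)]
    return enclosing_box([rotate_point(x, y, *center, angle_deg) for x, y in corners])


def rotate_bbox_ellipse(bbox, center, angle_deg, n=360):
    """Rotate points sampled on the ellipse inscribed in the box and enclose them."""
    x_min, y_min, x_max, y_max = bbox
    bx, by = (x_min + x_max) / 2, (y_min + y_max) / 2
    rx, ry = (x_max - x_min) / 2, (y_max - y_min) / 2
    samples = [(bx + rx * math.cos(2 * math.pi * k / n),
                by + ry * math.sin(2 * math.pi * k / n)) for k in range(n)]
    return enclosing_box([rotate_point(x, y, *center, angle_deg) for x, y in samples])


def area(bbox):
    return (bbox[2] - bbox[0]) * (bbox[3] - bbox[1])


bbox = (40.0, 40.0, 60.0, 60.0)   # 20x20 box, illustrative values
center = (50.0, 50.0)             # rotate around the box center
largest = rotate_bbox_largest_box(bbox, center, 45)
ellipse = rotate_bbox_ellipse(bbox, center, 45)
print(area(largest), area(ellipse))
```

For a round target the corner-based box doubles in area at 45 degrees while the ellipse-based box stays tight. For a target that really fills the square, the ellipse-based box is too small, which is the corner cut-off mentioned above.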

Jan 28 '23 04:01 i-aki-y

> I think this is not a bug. As pointed out in #746, to fit the rotated bounding box on the targets, information about the shape of the target is needed. Setting rotate_method="ellipse" might mitigate your issue, but if you have rectangle targets, it might make too small a bounding box because the corners will be cut off. See also #1203 or https://openaccess.thecvf.com/content/ICCV2021/papers/Kalra_Towards_Rotation_Invariance_in_Object_Detection_ICCV_2021_paper.pdf

Very helpful comment. Really appreciate it, @i-aki-y. So your suggestion is not to rotate with Affine, which doesn't have a rotate_method option, and to use Rotate (which has this option) instead? Can shear also lead to this type of behavior, or is it only rotate?

Jan 28 '23 20:01 mikel-brostrom

> So your suggestion is not to rotate with Affine, which doesn't have a rotate_method option, and to use Rotate (which has this option) instead?

Yes

> Can shear also lead to this type of behavior, or is it only rotate?

See below:

(image: bbox_shear)

Jan 29 '23 07:01 i-aki-y

I see, thanks again. So, Affine is comprised of:

- Translation ("move" image on the x-/y-axis)
- Rotation
- Scaling ("zoom" in/out)
- Shear (move one side of the image, turning a square into a parallelogram)

Using ShiftScaleRotate would cover three of these operations, and it has the rotate_method option. I couldn't find any shear operation with rotate_method, though. Do you have a better suggestion, @i-aki-y?

Jan 29 '23 07:01 mikel-brostrom

On a different note: after this conversation, it seems to me that rotate_method='ellipse' is superior to 'largest_box' for most applications. Maybe it should be the default in albumentations for handling bbox transformations?

Jan 29 '23 13:01 mikel-brostrom

Another similar issue: https://github.com/albumentations-team/albumentations/issues/182

Jan 29 '23 18:01 mikel-brostrom

@mikel-brostrom I think there is no easy workaround for using a shear transform with ellipse rotation. As I showed above, a shear operation introduces extra space between the target and the bbox, so if you include shear in a random affine transform, some extra space will appear. I think we need to generalize the ellipse rotation to account for the shear effect.
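One way to picture such a generalization is to push the inscribed ellipse, rather than the box corners, through the linear part of the affine map. The sketch below is an assumption about how this could be done, not the code from any PR: the function names are hypothetical, the box is centered at the origin, and the shear sign convention may differ from the one albumentations uses. For a map [[a, b], [c, d]] and semi-axes (rx, ry), the transformed ellipse has half-width sqrt((a*rx)^2 + (b*ry)^2) and half-height sqrt((c*rx)^2 + (d*ry)^2).

```python
import math


def shear_x_matrix(shear_deg):
    """2x2 linear map for a shear along x: x' = x + tan(shear) * y, y' = y."""
    return ((1.0, math.tan(math.radians(shear_deg))), (0.0, 1.0))


def corner_envelope_half_size(rx, ry, m):
    """Half-size of the box enclosing the transformed corners (+-rx, +-ry)."""
    (a, b), (c, d) = m
    return abs(a) * rx + abs(b) * ry, abs(c) * rx + abs(d) * ry


def ellipse_envelope_half_size(rx, ry, m):
    """Half-size of the box enclosing the transformed inscribed ellipse.

    A point on the ellipse maps to (a*rx*cos t + b*ry*sin t, c*rx*cos t + d*ry*sin t),
    and the maximum of A*cos t + B*sin t over t is sqrt(A^2 + B^2).
    """
    (a, b), (c, d) = m
    return math.hypot(a * rx, b * ry), math.hypot(c * rx, d * ry)


rx, ry = 10.0, 10.0               # 20x20 box centered at the origin, illustrative values
m = shear_x_matrix(20)            # 20 degree shear, illustrative value
corner_half = corner_envelope_half_size(rx, ry, m)
ellipse_half = ellipse_envelope_half_size(rx, ry, m)
print(corner_half, ellipse_half)
```

With these numbers the corner-based box grows in width by tan(20°) * 10 on each side, while the ellipse-based box grows noticeably less; the height is unchanged in both cases. The same formula also covers rotation and scaling, since those are linear maps too.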

Feb 02 '23 08:02 i-aki-y

@mikel-brostrom I made a PR.

Feb 05 '23 15:02 i-aki-y

This PR might fix this issue #1394. Thank you so much @i-aki-y

Feb 06 '23 11:02 Dipet

Looks like everything works.

Jun 19 '24 03:06 ternaus