albumentations
Add CutAndPaste
About PR
This PR implements the Cut-and-Paste augmentation from "Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation". It also includes multiple image blending methods from "Poisson Image Editing".
See also: #1225
Blending Demo
The A.paste function supports the following four image blending methods.
- GAUSSIAN: Used in "Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation".
- NORMAL_CLONE, MIXED_CLONE, MONOCHROME_TRANSFER: Different types of seamless cloning are explained in "Poisson Image Editing".
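For reference, the GAUSSIAN method amounts to alpha blending with a smoothed object mask. The sketch below is a minimal NumPy illustration of that idea, not the PR's actual implementation; the function names and kernel size are assumptions. (The three Poisson-based methods correspond to `cv2.seamlessClone` with the `cv2.NORMAL_CLONE`, `cv2.MIXED_CLONE`, and `cv2.MONOCHROME_TRANSFER` flags.)

```python
import numpy as np

def gaussian_kernel1d(sigma: float, radius: int) -> np.ndarray:
    """Normalized 1D Gaussian kernel."""
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return k / k.sum()

def smooth_mask(mask: np.ndarray, sigma: float = 2.0) -> np.ndarray:
    """Separable Gaussian blur of a float mask with values in [0, 1]."""
    k = gaussian_kernel1d(sigma, radius=int(3 * sigma))
    blurred = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, mask)
    blurred = np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, blurred)
    return blurred

def gaussian_paste(base: np.ndarray, obj: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Alpha-blend `obj` onto `base`, feathering the edge with the blurred mask."""
    alpha = smooth_mask(mask.astype(np.float64))[..., None]
    return (alpha * obj + (1.0 - alpha) * base).astype(base.dtype)
```

The blurred alpha mask is what softens the paste boundary; a hard 0/1 mask would produce visible seams around the pasted object.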

Figure: The result of different methods: GAUSSIAN, NORMAL_CLONE, MIXED_CLONE, MONOCHROME_TRANSFER (Left to right)
CutAndPaste Transform Demo
This is an example of the CutAndPaste Transform usage.
```python
transform = A.Compose(
    [
        A.CutAndPaste(
            p=1.0,
            paste_image_dir=object_dir,
            get_label_from_path=get_label_from_path,
            num_object_limit=(1, 3),
            blend_method="GAUSSIAN",
        )
    ],
    bbox_params=A.BboxParams(format='coco', label_fields=["labels"]),
)
```
It requires the paste_image_dir parameter, a directory containing the objects to be pasted. The transform randomly selects multiple files (objects) from the paste_image_dir directory, so in this version the user needs to prepare images such that each image contains a single object.
The transform also requires a function parameter get_label_from_path, which returns label information for a given object file path. Since the transform cannot know the labels of randomly selected objects, get_label_from_path is used to extract the label from the object's file path. This means the user should encode label information in the object path and provide a function that extracts it.
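To make this contract concrete, here is a minimal, hypothetical sketch of how an object file could be sampled from paste_image_dir and paired with its label. The helper names are illustrative, not the PR's internals; the file-naming scheme `{object_id}_{label_id}.png` is assumed.

```python
import random
from pathlib import Path

def get_label_from_path(image_path: Path) -> int:
    """Assumes file names like '{object_id}_{label_id}.png'."""
    return int(image_path.stem.split("_")[-1])

def sample_objects(paste_image_dir: str, num_objects: int, rng: random.Random) -> list:
    """Randomly pick object files and pair each with its extracted label."""
    paths = sorted(Path(paste_image_dir).glob("*.png"))
    chosen = rng.sample(paths, k=min(num_objects, len(paths)))
    return [(p, get_label_from_path(p)) for p in chosen]
```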
The following are the results of different trials with a fixed base image.


Sample Notebook
You can reproduce the same result in this notebook on Colab: https://colab.research.google.com/drive/1sFCAhS8FTyp7dLIdJgUJwbB5JnGBYoq3
Note and Limitation
1. The user needs to prepare object files as RGBA PNG images.
As described above, the user needs to prepare object files (images to be pasted) in advance.
2. The object file path should include label information
Since the transform identifies the label information from the object path, the user needs to include label information in the object path and provide the function get_label_from_path as a parameter, which extracts the label information from the path.
For example, when the object path is like a path/to/object/{object_id}_{label_id}.png, a get_label_from_path could be:
```python
get_label_from_path = lambda image_path: int(image_path.stem.split("_")[-1])
```
3. Returned masks become binary masks
Even if the input masks are non-binary, the transformed masks are binary (the values are 0 or 1). I think this limitation can be removed with extra work.
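To illustrate the limitation (a hypothetical NumPy sketch, not the PR's code): any non-zero value in an input mask comes out as 1, which is equivalent to thresholding.

```python
import numpy as np

def binarize_mask(mask: np.ndarray) -> np.ndarray:
    """Collapse every non-zero mask value to 1, mirroring the current behavior."""
    return (mask > 0).astype(np.uint8)
```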
About Implementation
1. A new rotation method, rotate_bound, is added.
A new rotation function, rotate_bound, is introduced to rotate object images.
While the standard rotate function causes unwanted crops, rotate_bound expands the output shape according to the rotation angle. See the following examples.

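The output-size computation behind such a bound-preserving rotation can be sketched as follows. The function name is hypothetical; a real implementation would additionally shift the affine matrix's translation to re-center the image and then call cv2.warpAffine.

```python
import math

def rotate_bound_shape(h: int, w: int, angle_deg: float) -> tuple:
    """Expanded (height, width) that fully contains an h x w image
    rotated by angle_deg, so no corner is cropped."""
    theta = math.radians(angle_deg)
    cos, sin = abs(math.cos(theta)), abs(math.sin(theta))
    new_w = int(round(h * sin + w * cos))
    new_h = int(round(h * cos + w * sin))
    return new_h, new_w
```

For a 90-degree rotation the dimensions simply swap, while a 45-degree rotation of a square image grows each side by a factor of roughly sqrt(2).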
2. Augmentation of the masks is actually done inside get_params_dependent_on_targets.
Augmented masks are needed to calculate the bbox augmentation, but I could not find a better place where both the bboxes and the masks are accessible, other than get_params_dependent_on_targets.
So I implemented the mask augmentation in get_params_dependent_on_targets.
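The hook pattern can be illustrated with a simplified, dependency-free stand-in. This is not the actual albumentations base class; it only mimics how get_params_dependent_on_targets sees all requested targets at once, before the per-target apply methods run with the precomputed params.

```python
class MiniDualTransform:
    """Simplified stand-in for a transform with target-dependent params."""

    targets_as_params = ["masks", "bboxes"]

    def __call__(self, **data):
        # Collect the targets that the params computation needs.
        targets = {k: data[k] for k in self.targets_as_params}
        params = self.get_params_dependent_on_targets(targets)
        # Per-target apply methods only receive the precomputed params.
        data["masks"] = params["augmented_masks"]
        data["bboxes"] = self.apply_to_bboxes(data["bboxes"], params)
        return data


class MiniCutAndPaste(MiniDualTransform):
    def get_params_dependent_on_targets(self, targets):
        # Mask augmentation happens here, because the bbox update needs
        # the augmented masks, and this is the only hook that sees
        # masks and bboxes together.
        augmented_masks = targets["masks"] + [[0, 1, 1, 0]]  # toy pasted-object mask
        return {"augmented_masks": augmented_masks}

    def apply_to_bboxes(self, bboxes, params):
        # Derive one toy bbox per mask from the augmented masks.
        return [(0, 0, len(m), 1) for m in params["augmented_masks"]]
```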
- 2022-10-18: Update description and examples.
Most of the work has been completed. @Dipet @ternaus I would appreciate it if you could review this.
Sorry for the additional correction. The current implementation assumes that the total number of objects in an image is less than 255 ("the length of target masks" + "the number of pasted objects" < 255). So I added some validation code and documentation about this limitation. I think 255 is sufficiently large for typical use cases, but it can be increased if needed.
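A hedged sketch of what such a validation could look like; the names and message are illustrative, not the PR's actual code.

```python
MAX_OBJECTS = 255  # mask indices must fit a uint8-like range

def validate_object_count(num_target_masks: int, num_pasted_objects: int) -> None:
    """Raise if the combined object count would overflow the index range."""
    total = num_target_masks + num_pasted_objects
    if total >= MAX_OBJECTS:
        raise ValueError(
            f"Total object count {total} must be less than {MAX_OBJECTS}; "
            "reduce num_object_limit or the number of target masks."
        )
```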