albumentations
Add CutAndPaste
About PR
This PR implements the Cut-and-Paste augmentation from "Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation". It also includes multiple image blending methods from "Poisson Image Editing".
See also: #1225
Blending Demo
The A.paste function supports the following four image blending methods.
- GAUSSIAN: Used in "Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation".
- NORMAL_CLONE, MIXED_CLONE, MONOCHROME_TRANSFER: Different types of seamless cloning are explained in "Poisson Image Editing".
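For reference, the GAUSSIAN method amounts to alpha blending with a smoothed object mask. The sketch below is a minimal NumPy illustration of that idea, not the PR's actual implementation; the function names and kernel size are assumptions. (The three Poisson-based methods correspond to `cv2.seamlessClone` with the `cv2.NORMAL_CLONE`, `cv2.MIXED_CLONE`, and `cv2.MONOCHROME_TRANSFER` flags.)

```python
import numpy as np

def gaussian_kernel1d(sigma: float, radius: int) -> np.ndarray:
    """Normalized 1D Gaussian kernel."""
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return k / k.sum()

def smooth_mask(mask: np.ndarray, sigma: float = 2.0) -> np.ndarray:
    """Separable Gaussian blur of a float mask with values in [0, 1]."""
    k = gaussian_kernel1d(sigma, radius=int(3 * sigma))
    blurred = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, mask)
    blurred = np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, blurred)
    return blurred

def gaussian_paste(base: np.ndarray, obj: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Alpha-blend `obj` onto `base`, feathering the edge with the blurred mask."""
    alpha = smooth_mask(mask.astype(np.float64))[..., None]
    return (alpha * obj + (1.0 - alpha) * base).astype(base.dtype)
```

The blurred alpha mask is what softens the paste boundary; a hard 0/1 mask would produce visible seams around the pasted object.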

Figure: The result of different methods: GAUSSIAN, NORMAL_CLONE, MIXED_CLONE, MONOCHROME_TRANSFER (Left to right)
CutAndPaste Transform Demo
This is an example of the CutAndPaste Transform usage.
```python
transform = A.Compose(
    [
        A.CutAndPaste(
            p=1.0,
            paste_image_dir=object_dir,
            get_label_from_path=get_label_from_path,
            num_object_limit=(1, 3),
            blend_method="GAUSSIAN",
        )
    ],
    bbox_params=A.BboxParams(format='coco', label_fields=["labels"]),
)
```
It requires the paste_image_dir parameter, a directory containing the objects to be pasted. The transform randomly selects multiple files (objects) from the paste_image_dir directory, so in this version the user needs to prepare images such that each image contains a single object.
The transform also requires a function parameter get_label_from_path, which returns label information for a given object file path. Since the transform cannot know the labels of randomly selected objects, get_label_from_path is used to extract the label from the object's file path. This means the user should encode label information in the object path and provide a function that extracts it.
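To make this contract concrete, here is a minimal, hypothetical sketch of how an object file could be sampled from paste_image_dir and paired with its label. The helper names are illustrative, not the PR's internals; the file-naming scheme `{object_id}_{label_id}.png` is assumed.

```python
import random
from pathlib import Path

def get_label_from_path(image_path: Path) -> int:
    """Assumes file names like '{object_id}_{label_id}.png'."""
    return int(image_path.stem.split("_")[-1])

def sample_objects(paste_image_dir: str, num_objects: int, rng: random.Random) -> list:
    """Randomly pick object files and pair each with its extracted label."""
    paths = sorted(Path(paste_image_dir).glob("*.png"))
    chosen = rng.sample(paths, k=min(num_objects, len(paths)))
    return [(p, get_label_from_path(p)) for p in chosen]
```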
The following are the results of different trials with a fixed base image.


Sample Notebook
You can reproduce the same result in this notebook on Colab: https://colab.research.google.com/drive/1sFCAhS8FTyp7dLIdJgUJwbB5JnGBYoq3
Note and Limitation
1. The user needs to prepare object files as RGBA PNG images.
As described above, the user needs to prepare object files (images to be pasted) in advance.
2. The object file path should include label information
Since the transform identifies the label information from the object path, the user needs to include label information in the object path and provide the function get_label_from_path as a parameter, which extracts the label information from the path.
For example, when the object path is like a path/to/object/{object_id}_{label_id}.png, a get_label_from_path could be:
```python
get_label_from_path = lambda image_path: int(image_path.stem.split("_")[-1])
```
3. Returned masks become binary masks
Even if the input masks are non-binary, the transformed masks are binary (the values are 0 or 1). I think this limitation can be removed with extra work.
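To illustrate the limitation (a hypothetical NumPy sketch, not the PR's code): any non-zero value in an input mask comes out as 1, which is equivalent to thresholding.

```python
import numpy as np

def binarize_mask(mask: np.ndarray) -> np.ndarray:
    """Collapse every non-zero mask value to 1, mirroring the current behavior."""
    return (mask > 0).astype(np.uint8)
```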
About Implementation
1. A new rotation method, rotate_bound, is added.
A new rotation function, rotate_bound, is introduced to rotate object images.
While the standard rotate function causes unwanted crops, rotate_bound expands the output shape according to the rotation angle. See the following examples.

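The output-size computation behind such a bound-preserving rotation can be sketched as follows. The function name is hypothetical; a real implementation would additionally shift the affine matrix's translation to re-center the image and then call cv2.warpAffine.

```python
import math

def rotate_bound_shape(h: int, w: int, angle_deg: float) -> tuple:
    """Expanded (height, width) that fully contains an h x w image
    rotated by angle_deg, so no corner is cropped."""
    theta = math.radians(angle_deg)
    cos, sin = abs(math.cos(theta)), abs(math.sin(theta))
    new_w = int(round(h * sin + w * cos))
    new_h = int(round(h * cos + w * sin))
    return new_h, new_w
```

For a 90-degree rotation the dimensions simply swap, while a 45-degree rotation of a square image grows each side by a factor of roughly sqrt(2).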
2. Augmentation of the masks is actually done inside get_params_dependent_on_targets.
Augmented masks are needed to calculate the bbox augmentation, but I could not find a better place where both the bboxes and the masks are accessible, other than get_params_dependent_on_targets.
So I implemented the mask augmentation in get_params_dependent_on_targets.
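The hook pattern can be illustrated with a simplified, dependency-free stand-in. This is not the actual albumentations base class; it only mimics how get_params_dependent_on_targets sees all requested targets at once, before the per-target apply methods run with the precomputed params.

```python
class MiniDualTransform:
    """Simplified stand-in for a transform with target-dependent params."""

    targets_as_params = ["masks", "bboxes"]

    def __call__(self, **data):
        # Collect the targets that the params computation needs.
        targets = {k: data[k] for k in self.targets_as_params}
        params = self.get_params_dependent_on_targets(targets)
        # Per-target apply methods only receive the precomputed params.
        data["masks"] = params["augmented_masks"]
        data["bboxes"] = self.apply_to_bboxes(data["bboxes"], params)
        return data


class MiniCutAndPaste(MiniDualTransform):
    def get_params_dependent_on_targets(self, targets):
        # Mask augmentation happens here, because the bbox update needs
        # the augmented masks, and this is the only hook that sees
        # masks and bboxes together.
        augmented_masks = targets["masks"] + [[0, 1, 1, 0]]  # toy pasted-object mask
        return {"augmented_masks": augmented_masks}

    def apply_to_bboxes(self, bboxes, params):
        # Derive one toy bbox per mask from the augmented masks.
        return [(0, 0, len(m), 1) for m in params["augmented_masks"]]
```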
- 2022-10-18: Update description and examples.
Most of the work has been completed. @Dipet @ternaus I would appreciate it if you could review this.
Sorry for the additional correction. The current implementation assumes that the total number of objects in an image is less than 255 ("the length of target masks" + "the number of pasted objects" < 255). So I added some validation code and documentation about this limitation. I think 255 is sufficiently large for typical use cases, but it can be increased if needed.
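A hedged sketch of what such a validation could look like; the names and message are illustrative, not the PR's actual code.

```python
MAX_OBJECTS = 255  # mask indices must fit a uint8-like range

def validate_object_count(num_target_masks: int, num_pasted_objects: int) -> None:
    """Raise if the combined object count would overflow the index range."""
    total = num_target_masks + num_pasted_objects
    if total >= MAX_OBJECTS:
        raise ValueError(
            f"Total object count {total} must be less than {MAX_OBJECTS}; "
            "reduce num_object_limit or the number of target masks."
        )
```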