albumentations
Feature request: Object-based augmentation via cropping and pasting
Hi!
I propose adding transforms that cut objects out of different images using their segmentation masks and paste them onto a new background. The idea is described here and in some other research papers, and a demo can be found here.
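The core operation can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the proposed implementation: `paste_object` is a hypothetical helper name, the object crop and its boolean mask are assumed to share the same height and width, and the crop is assumed to fit entirely inside the background.

```python
import numpy as np


def paste_object(background, image, mask, top, left):
    """Paste the pixels of `image` selected by `mask` onto a copy of `background`.

    Hypothetical helper for illustration only (not part of Albumentations).
    `image` has shape (h, w, C) or (h, w), `mask` has shape (h, w), and
    (top, left) is where the upper-left corner of the crop lands.
    """
    mask = np.asarray(mask, dtype=bool)
    h, w = mask.shape
    if top < 0 or left < 0 or top + h > background.shape[0] or left + w > background.shape[1]:
        raise ValueError("object crop does not fit inside the background")

    scene = background.copy()                    # leave the input untouched
    region = scene[top:top + h, left:left + w]   # a view into the copy
    region[mask] = image[mask]                   # overwrite only the object pixels
    return scene
```

Pixels outside the mask keep the background values, which is what distinguishes this from pasting a whole rectangle.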
Why is it useful?
It adds extra variability to training images by combining multiple objects in one scene and applying augmentations separately to the objects, the background, and the whole scene.
What are the cases?
- It is extremely useful for few-shot learning problems where little training data is available. In particular, it has proven to work well in the agricultural domain and in remote sensing problems.
- It enables solving problems "in the wild" with only "lab" images by explicitly controlling the number of objects, their overlap, the background, and the noise.
- It allows preparing datasets for object counting and, in some cases, object detection, multiclass classification, instance segmentation, semantic segmentation, multi-task learning, etc., even if the original dataset was intended for instance segmentation only.
- Other cases TBD
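To illustrate how one set of instance masks can feed several tasks, here is a rough sketch that derives an object count, a semantic mask, and bounding boxes from per-instance layers. Everything in it is an assumption made for illustration: the helper name `derive_targets`, the `(N, H, W)` boolean mask layout, the convention that class id 0 means background, and the `(x0, y0, x1, y1)` box format with exclusive `x1`/`y1`.

```python
import numpy as np


def derive_targets(instance_masks, class_ids):
    """Derive counting, semantic, and detection targets from instance masks.

    Hypothetical helper for illustration only.
    instance_masks: boolean array of shape (N, H, W), one layer per instance.
    class_ids: length-N sequence of positive ints (0 is reserved for background).
    """
    instance_masks = np.asarray(instance_masks, dtype=bool)
    _, height, width = instance_masks.shape
    semantic = np.zeros((height, width), dtype=np.int32)
    bboxes = []

    for mask, class_id in zip(instance_masks, class_ids):
        ys, xs = np.nonzero(mask)
        if ys.size == 0:
            continue  # skip empty layers, e.g. fully occluded objects
        semantic[mask] = class_id  # later instances overwrite earlier ones
        bboxes.append((int(xs.min()), int(ys.min()),
                       int(xs.max()) + 1, int(ys.max()) + 1))

    return {"count": len(bboxes), "semantic_mask": semantic, "bboxes": bboxes}
```

Because the scene is generated, all of these targets come for free from the pasted masks; no extra labeling is needed.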
How difficult is it to add it?
The main point is that it doesn't require changes to existing code and can be implemented as a wrapper.
Limitations
This method assumes that we have instance segmentation masks for the objects of interest. If only bounding boxes are provided, we can still copy-paste the whole box, like here.
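The bounding-box fallback could look roughly like the sketch below. It is an assumption-laden illustration rather than the method from the linked work: `paste_box` is a hypothetical name, boxes are taken to be `(x0, y0, x1, y1)` in pixels with exclusive `x1`/`y1`, and the patch must fit inside the background.

```python
import numpy as np


def paste_box(background, image, bbox, top, left):
    """Copy the rectangle `bbox` from `image` onto a copy of `background`.

    Hypothetical helper for illustration only.
    Returns the new scene and the box coordinates in the scene.
    """
    x0, y0, x1, y1 = bbox
    patch = image[y0:y1, x0:x1]
    h, w = patch.shape[:2]
    if top < 0 or left < 0 or top + h > background.shape[0] or left + w > background.shape[1]:
        raise ValueError("box does not fit inside the background")

    scene = background.copy()
    scene[top:top + h, left:left + w] = patch   # the whole rectangle is copied
    new_bbox = (left, top, left + w, top + h)   # same format as the input box
    return scene, new_bbox
```

Unlike mask-based pasting, this also carries over the original background pixels inside the box, so the result is usually less clean.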
Suggested functional interface
from typing import Callable, Dict, Iterable, List, Optional, Tuple, Union

import numpy as np


class ObjectBasedAugmentor:
    '''
    Generates scenes based on objects from multiple images as described in https://arxiv.org/abs/2102.12295.
    Can take input sources either during initialization for more automated work or
    during each call for more controllable behavior. If only bounding boxes are provided,
    applies copy-pasting of the whole box as shown in https://arxiv.org/abs/1906.11172.

    Args:
        images (Union[Iterable[np.ndarray], List[str], None]): iterable of np.ndarray images or
            list of image paths or None. If None, should be specified in object call.
            Original images with objects of interest.
        instance_masks (Union[Iterable[np.ndarray], List[str], None]): iterable of np.ndarray images or
            list of image paths or None. If None, should be specified in object call.
            Instance masks for the corresponding images. One layer per instance.
        backgrounds (Union[Iterable[np.ndarray], List[str], None]):
            iterable of np.ndarray images or list of image paths or None.
            If None, should be specified in object call. Scene backgrounds.
            Must have the same number of channels as image.
        additional_targets (Dict[str, np.ndarray]): dictionary with additional masks to transform.
        unique_color_masks (List[str]): list with names of masks from additional_targets.keys()
            for which unique colors for every original color should be generated. If a mask
            is not in the list, its colors remain original after pasting objects on the new scene.
        bboxes (List[int]): bounding boxes in [x0, y0, x1, y1] format,
            ranging from 0 to W and 0 to H.
        keypoints (np.ndarray): keypoints to transform.
        object_transforms (Callable[[np.ndarray], np.ndarray]): transforms or their composition
            to apply to each object independently.
        background_transforms (Callable[[np.ndarray], np.ndarray]): transforms or their composition
            to apply to the background.
        scene_transforms (Callable[[np.ndarray], np.ndarray]): transforms or their composition
            to apply to the whole scene after pasting all objects.
        preprocess_dataset (bool): if True, dataset statistics will be calculated during init.
            Enables using class_proba.
        return_semantic (bool): if True, return an additional semantic mask.
        add_bboxes (bool): if True, calculate bounding boxes from the segmentation masks.
        objects_per_scene (int): the number of pasted objects in the final scene.
        overlap_ratio (float): the ratio of objects' overlapping in the final scene. [0...].
        packaging_rule (str): the algorithm to place objects on the scene.
            One of ['smallest', 'random', 'grid'].
        result_size (Union[int, Tuple[int, int], str]): how to set the size of the resulting scene.
            Original size if 'as_is'. [N, N] if N. [N, M] if (N, M).
        class_proba (Optional[np.ndarray]): defines the probability to choose an object from each class.
            Must have preprocess_dataset enabled.
        adjust_sizes (bool): if True, normalizes sizes of pasted objects.
    '''

    def __init__(self,
                 images: Union[Iterable[np.ndarray], List[str], None] = None,
                 instance_masks: Union[Iterable[np.ndarray], List[str], None] = None,
                 backgrounds: Union[Iterable[np.ndarray], List[str], None] = None,
                 bboxes: Optional[List[int]] = None,
                 additional_targets: Optional[Dict[str, np.ndarray]] = None,
                 unique_color_masks: Optional[List[str]] = None,
                 keypoints: Optional[np.ndarray] = None,
                 object_transforms: Optional[Callable[[np.ndarray], np.ndarray]] = None,
                 background_transforms: Optional[Callable[[np.ndarray], np.ndarray]] = None,
                 scene_transforms: Optional[Callable[[np.ndarray], np.ndarray]] = None,
                 preprocess_dataset: bool = False,
                 return_semantic: bool = False,
                 add_bboxes: bool = False,
                 objects_per_scene: int = 4,
                 overlap_ratio: float = 0.0,
                 packaging_rule: str = 'smallest',
                 result_size: Union[int, Tuple[int, int], str] = 'as_is',
                 class_proba: Optional[np.ndarray] = None,  # None instead of a mutable [] default
                 adjust_sizes: bool = False):
        pass

    def __call__(self,
                 images: Union[Iterable[np.ndarray], List[str], None] = None,
                 instance_masks: Union[Iterable[np.ndarray], List[str], None] = None,
                 backgrounds: Union[Iterable[np.ndarray], List[str], None] = None,
                 bboxes: Optional[List[int]] = None,
                 additional_targets: Optional[Dict[str, np.ndarray]] = None,
                 unique_color_masks: Optional[List[str]] = None,
                 keypoints: Optional[np.ndarray] = None,
                 object_transforms: Optional[Callable[[np.ndarray], np.ndarray]] = None,
                 background_transforms: Optional[Callable[[np.ndarray], np.ndarray]] = None,
                 scene_transforms: Optional[Callable[[np.ndarray], np.ndarray]] = None,
                 return_semantic: bool = False,
                 add_bboxes: bool = False,
                 objects_per_scene: int = 4,
                 overlap_ratio: float = 0.0,
                 packaging_rule: str = 'smallest',
                 result_size: Union[int, Tuple[int, int], str] = 'as_is',
                 adjust_sizes: bool = False
                 ) -> Dict[str, np.ndarray]:
        '''
        Returns:
            result (Dict[str, np.ndarray]): dictionary with scene, transformed masks,
                bounding boxes, and keypoints.
        '''
        pass
It will also require adding some utils for copy-pasting objects.
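One such utility might cut an object out of its source image as a tight crop, so that object-level transforms can be applied before pasting. The sketch below is a guess at what that could look like; `extract_object` is a hypothetical name, and the mask is assumed to be a single 2D layer aligned with the image.

```python
import numpy as np


def extract_object(image, mask):
    """Return the tight crop of `image` around `mask`, plus the cropped mask.

    Hypothetical helper for illustration only.
    image: array of shape (H, W) or (H, W, C); mask: array of shape (H, W).
    """
    mask = np.asarray(mask, dtype=bool)
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        raise ValueError("mask is empty, nothing to extract")

    y0, y1 = ys.min(), ys.max() + 1   # tight vertical extent of the object
    x0, x1 = xs.min(), xs.max() + 1   # tight horizontal extent of the object
    # Copies keep the crops independent of the source arrays.
    return image[y0:y1, x0:x1].copy(), mask[y0:y1, x0:x1].copy()
```

The returned crop and mask have the same height and width, so they can be transformed together and then pasted at any position on a background.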
Hey @NesterukSergey, thanks. Looks good to me! We can proceed with implementing this feature in Albumentations.
I propose to create a new package augmentors in the albumentations directory and place all the required code into this package.
The idea is in line with the copy-paste augmentation method, which achieves very promising performance improvements. This would be a great addition to Albumentations.