
[WIP] Add Mosaic augmentation

Open i-aki-y opened this issue 2 years ago • 13 comments

About PR

In this PR, I implemented the mosaic augmentation used in YOLO [1, 2].

I appreciate any comments and suggestions.

[1]: "YOLOv4: Optimal Speed and Accuracy of Object Detection", https://arxiv.org/pdf/2004.10934.pdf
[2]: YOLOv5, https://github.com/ultralytics/yolov5

Demo

This is a reproducible example:

import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle
import skimage
import albumentations as A

## define helper funcs
def add_bbox(ax, bbox, encoder):
    label = 0    
    if len(bbox) > 4:
        bbox, label = bbox[:4], bbox[4]
        label = encoder[label]
    bbox_color = plt.get_cmap("tab10").colors[label]
    x_min, y_min, x_max, y_max = bbox
    w, h = x_max - x_min, y_max - y_min
    pat = Rectangle(xy=(x_min, y_min), width=w, height=h, fill=False, lw=3, color=bbox_color)
    ax.add_patch(pat)

def plot_image_and_bboxes(image, bboxes, encoder, ax):
    ax.imshow(image)
    for i in range(len(bboxes)):
        add_bbox(ax, bboxes[i], encoder)


## data setup
encoder = {"face": 0, "rocket": 1, "other": 2}
image_list = [skimage.data.astronaut(), skimage.data.cat(), skimage.data.coffee(), skimage.data.rocket()]
bboxes_list = [
    [[170, 30, 280, 180, "face"], [350, 80, 460, 290, "rocket"], [140, 350, 200, 420, "other"]],
    [[50, 0, 350, 280, "face"]],
    [[160, 15, 420, 210, "other"]],
    [[300, 120, 340, 420, "rocket"]],
]

## define pipeline
bbox_format = 'pascal_voc'
transform = A.Compose([
    A.Mosaic(height=2*512, width=2*512, shift_limit_x=0.0, shift_limit_y=0.0, replace=False, p=1.0, fill_value=114, bboxes_format=bbox_format),
    A.RandomResizedCrop(height=512, width=512, scale=(0.4, 1.0)),
], bbox_params=A.BboxParams(format=bbox_format))


## show input images
fig, axes = plt.subplots(2, 2, figsize=(6, 6))
axes = axes.flatten()

for i in range(len(image_list)):
    ax = axes[i]
    ax.set_title(f"input{i}")
    image = image_list[i]
    bboxes = bboxes_list[i]
    plot_image_and_bboxes(image, bboxes, encoder, ax)

plt.show()   
#plt.savefig("mosaic_input.jpg", bbox_inches='tight')

fig, axes = plt.subplots(2, 2, figsize=(10, 10))
axes = axes.flatten()
for ax in axes:
    data = transform(image=image_list[0], image_cache=image_list[1:], bboxes=bboxes_list[0], bboxes_cache=bboxes_list[1:])
    image = data["image"]
    bboxes = data["bboxes"]    
    plot_image_and_bboxes(image, bboxes, encoder, ax)

plt.show()       
#plt.savefig("mosaic_output.jpg", bbox_inches='tight')

Input

(figure mosaic_input: the four source images with their bounding boxes)

Some Results

(figure mosaic_output: four sampled mosaic outputs with transformed bounding boxes)

Notes

Since the current albumentations API does not support multiple image sources, I introduced helper targets, image_cache and bboxes_cache, as additional data sources. The user sets the additional images and bboxes on these helper targets, so it is up to the user to decide how to prepare and manage the extra image data. For example, users can put all images into image_cache when they have enough memory or the dataset is small; alternatively, they can read a small number of images on each iteration.
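For example, a dataset-style wrapper could sample a few extra images per iteration. This is only a rough sketch; the wrapper class and its fields are hypothetical, and only the image_cache/bboxes_cache keywords come from this PR:

import random

class MosaicDataset:
    """Hypothetical wrapper: draws a few extra samples per item for the mosaic cache."""

    def __init__(self, samples, transform, n_extra=3):
        # samples: list of (image, bboxes) pairs (could also be lazily loaded paths)
        # transform: an A.Compose that contains A.Mosaic from this PR
        self.samples = samples
        self.transform = transform
        self.n_extra = n_extra

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        image, bboxes = self.samples[idx]
        # may occasionally re-draw idx itself; acceptable for a sketch
        extra = random.sample(range(len(self.samples)), self.n_extra)
        data = self.transform(
            image=image,
            bboxes=bboxes,
            image_cache=[self.samples[i][0] for i in extra],
            bboxes_cache=[self.samples[i][1] for i in extra],
        )
        return data["image"], data["bboxes"]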

Note that this PR version does not support a labels_cache target. This means that the user should embed the label information inside the bounding boxes, like [xmin, ymin, xmax, ymax, label] (when bboxes_format='pascal_voc').
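For example, starting from separate boxes and labels:

bboxes = [[170, 30, 280, 180], [350, 80, 460, 290]]
labels = ["face", "rocket"]
# embed each label as the fifth element of its box, as expected in this PR
bboxes = [box + [label] for box, label in zip(bboxes, labels)]
# -> [[170, 30, 280, 180, 'face'], [350, 80, 460, 290, 'rocket']]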

Another limitation is that Mosaic should be placed as the first transform of the pipeline, as in the example above. Transforms placed before the mosaic are applied only to the image set on the image target, while the additional images set on image_cache are ignored, so image and image_cache would end up with different augmentation histories. I think this is not what users expect. For example, with the following pipeline, Normalize and RandomResizedCrop are applied only to image, not to anything in image_cache.

transform = A.Compose([
    A.Normalize(...),
    A.RandomResizedCrop(...),
    A.Mosaic(...),
    ...
])
data = transform(image=image, image_cache=image_cache)

I think this is not a serious limitation, because users can prepare two pipelines and apply them separately if needed.

preprocess = A.Compose([    
    A.Normalize(...),
    A.RandomResizedCrop(...),
])

transform = A.Compose([
    A.Mosaic(...),
])

batch = [preprocess(image=image_batch[i], bboxes=bboxes_batch[i]) for i in range(n)]
image_batch = [data["image"] for data in batch]
bboxes_batch = [data["bboxes"] for data in batch]
data = transform(image=image_batch[0], image_cache=image_batch[1:], bboxes=bboxes_batch[0], bboxes_cache=bboxes_batch[1:])

The same strategy can be applied to other multi-image augmentations such as MixUp. For example, I think an augmentation similar to the one used in YOLOv5 can be built in the following way.

mosaic_aug = A.Compose([
    A.Mosaic(...),
    A.Affine(...),
    A.RandomResizedCrop(...),
])
mixup_aug = A.Compose([
    A.MixUp(...),  # not included in this PR
])

mosaic1 = mosaic_aug(image=image1, image_cache=image_cache, bboxes=bboxes1, bboxes_cache=bboxes_cache)
mosaic2 = mosaic_aug(image=image2, image_cache=image_cache, bboxes=bboxes2, bboxes_cache=bboxes_cache)
mosaic_mixup = mixup_aug(image=mosaic1["image"], bboxes=mosaic1["bboxes"], image_cache=mosaic2["image"], bboxes_cache=mosaic2["bboxes"])

Implementation Notes

The target_dependence property is used

I used the target_dependence property to pass the helper targets to the apply_xxx functions instead of get_params_dependent_on_targets.

This is because the returned values of get_params and get_params_dependent_on_targets become targets of serialization, the mechanism used for "replay". Since I think these helper targets are not appropriate for serialization, I used the target_dependence property mechanism instead.
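For illustration, this is roughly how the base-class mechanism forwards extra targets. It is a toy sketch, not the actual PR code; the real transform registers image_cache as a proper target, and the class and target names here are illustrative:

from typing import Dict, List

from albumentations.core.transforms_interface import DualTransform


class CacheAwareTransform(DualTransform):
    """Toy transform that only demonstrates the target_dependence mechanism."""

    @property
    def target_dependence(self) -> Dict[str, List[str]]:
        # For each primary target, list the extra targets whose raw values should be
        # forwarded as keyword arguments to the corresponding apply_* function.
        return {"image": ["image_cache"]}

    def apply(self, image, image_cache=(), **params):
        # image_cache arrives here directly; it never goes through get_params or
        # get_params_dependent_on_targets, so it stays out of replay serialization.
        return image

    def get_transform_init_args_names(self):
        return ()

Calling the transform directly, e.g. CacheAwareTransform(p=1.0)(image=img, image_cache=[img2, img3]), should then deliver the cached images straight to apply.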

Mosaic center is fixed

The YOLOv5 implementation includes randomization of the mosaic center position:

https://github.com/ultralytics/yolov5/blob/7c6a33564a84a0e78ec19da66ea6016d51c32e0a/utils/datasets.py#L653

I excluded this feature from the PR version because the same effect can be obtained by applying RandomResizedCrop right after Mosaic, as in the demo example above.

TODO

  • implement apply_to_keypoints
  • write tests
  • bboxes_cache preprocessing should be done in Compose if possible.

i-aki-y avatar Mar 18 '22 11:03 i-aki-y

Any updates on this?

mikel-brostrom avatar Jan 22 '23 11:01 mikel-brostrom

I tried this out together with some rotation augmentations and it seems to work, @i-aki-y.

(image: example mosaic output with rotation augmentations)

However, from time to time this error arises:

ValueError: y_max is less than or equal to y_min for bbox

when using COCO. Any idea how to fix this @i-aki-y?

Moreover, the structure of the repo must have changed since the PR was created, as some refactoring was needed.

mikel-brostrom avatar Feb 20 '23 17:02 mikel-brostrom

Since current albumentations do not support multiple image sources, I introduced helper targets, image_cache, bboxes_cache, as additional data sources

Couldn't this be solved by using additional_targets like here? This would allow the loaded images to be augmented in different ways before mosaic :rocket:

mikel-brostrom avatar Feb 20 '23 17:02 mikel-brostrom

Setting the width and height for Mosaic requires some basic knowledge of the dataset you are working with, as most of the image could otherwise be left outside the canvas. Maybe this should be reflected in the docstrings as well. I guess a good set of initial values could be the average width and height of the dataset you work with, @i-aki-y?
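Something like this could serve as a starting point (just a sketch: image_list is the list from the demo above, and the 2x factor mirrors the demo's 2x2 mosaic canvas):

import numpy as np
import albumentations as A

def mean_image_size(images):
    """Average (height, width) over a list of HWC numpy images."""
    heights = [im.shape[0] for im in images]
    widths = [im.shape[1] for im in images]
    return int(np.mean(heights)), int(np.mean(widths))

h, w = mean_image_size(image_list)  # image_list from the demo above
transform = A.Compose([
    # 2x because the mosaic canvas holds a 2x2 grid of tiles, as in the demo
    A.Mosaic(height=2 * h, width=2 * w, fill_value=114, bboxes_format="pascal_voc", p=1.0),
], bbox_params=A.BboxParams(format="pascal_voc"))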

mikel-brostrom avatar Feb 21 '23 16:02 mikel-brostrom

@mikel-brostrom Sorry for the delay, and thank you for your feedback!

ValueError: y_max is less than or equal to y_min for bbox

This means that some bboxes have zero or negative heights. Did you get this error only when you used the mosaic transform?

Moreover, the structure of the repo must have changed since the PR was created as some refactoring was needed.

OK. I will check and update the PR.

Couldn't this be solved by using additional_targets like here? This would allow the loaded images to be augmented in different ways before mosaic 🚀

I missed this feature! I will look into whether I can use it for this transform.

i-aki-y avatar Feb 22 '23 07:02 i-aki-y

I have it working locally, so I could create a pull request with what I have, @i-aki-y. I also have MixUp working as you suggested:

mosaic1 = mosaic_aug(image=image1, image_cache=image_cache, bboxes=bboxes1, bboxes_cache=bboxes_cache)
mosaic2 = mosaic_aug(image=image2, image_cache=image_cache, bboxes=bboxes2, bboxes_cache=bboxes_cache)
mosaic_mixup = mixup_aug(image=mosaic1["image"], bboxes=mosaic1["bboxes"], image_cache=mosaic2["image"], bboxes_cache=mosaic2["bboxes"])

Maybe this should go into a separate PR?

mikel-brostrom avatar Feb 22 '23 07:02 mikel-brostrom

This means that some bboxes have zero or negative heights. Did you get this error only when you used the mosaic transform?

Yes. I had multiple augmentations in my augmentation stack, but it always appeared during Mosaic. When the error emerged, y_max was always equal to y_min. This couldn't be fixed by setting min_area in A.BboxParams, btw...

mikel-brostrom avatar Feb 22 '23 07:02 mikel-brostrom

@mikel-brostrom

Yes. I had multiple augmentations in my augmentation stack, but it always appeared during Mosaic. When the error emerged, y_max was always equal to y_min. This couldn't be fixed by setting min_area in A.BboxParams, btw...

Thanks, I will investigate it.

I have it working locally so I could create pull request with what I have @i-aki-y . I also have MixUP working as you suggested:

Great!

Maybe this should go into a separate PR?

Yes.

i-aki-y avatar Feb 22 '23 08:02 i-aki-y

Feel free to check out my MixUp implementation here @i-aki-y . Any feedback is appreciated. It works nicely with this Mosaic implementation :smile:. I am going for CutMix :rocket:

mikel-brostrom avatar Feb 23 '23 08:02 mikel-brostrom

Btw, I have implemented this in a slightly different manner.

import random
from typing import Any, Dict, List, Tuple

from albumentations.core.transforms_interface import DualTransform

# mosaic4 and bbox_mosaic4 are the functional helpers introduced in this PR


class Mosaic(DualTransform):
    def __init__(
        self,
        height,
        width,
        replace=True,
        fill_value=0,
        bboxes_format="coco",
        always_apply=False,
        p=0.5,
    ):
        super().__init__(always_apply=always_apply, p=p)
        self.height = height
        self.width = width
        self.replace = replace
        self.fill_value = fill_value
        self.bboxes_format = bboxes_format
        self.images = []
        self.bboxes = []

    def get_transform_init_args_names(self) -> Tuple[str, ...]:
        return ("height", "width", "replace", "fill_value", "bboxes_format")

    def apply(self, image, **params):
        return mosaic4(self.images, self.height, self.width, self.fill_value)

    def apply_to_keypoint(self, **params):
        pass  # TODO
    
    def apply_to_bbox(self, bbox, image_shape, position, height, width, **params):
        rows, cols = image_shape[:2]
        return bbox_mosaic4(bbox, rows, cols, position, height, width)
    
    def apply_to_bboxes(self, bboxes, **params):
        new_bboxes = []
        for i, (bbox, im) in enumerate(zip(self.bboxes, self.images)):
            im_shape = im.shape
            h, w, _ = im_shape
            for b in bbox:
                new_bbox = self.apply_to_bbox(b, im_shape, i, self.height, self.width)
                new_bboxes.append(new_bbox)
        return new_bboxes

    def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, Any]:
        self.images = [params['image'], params['image1'], params['image2'], params['image3']]
        self.bboxes = [params['bboxes'], params['bboxes1'], params['bboxes2'], params['bboxes3']]
        images_bboxes = list(zip(self.images, self.bboxes))
        random.shuffle(images_bboxes)
        self.images, self.bboxes = zip(*images_bboxes)
        return {}
        
    @property
    def targets_as_params(self) -> List[str]:
        return [
            "image", "image1", "image2", "image3",
            "bboxes", "bboxes1", "bboxes2", "bboxes3"
        ]

I'm trying to follow the recommended way of working with multiple images and bboxes. However, I see that apply and apply_to_bbox are called as many times as there are targets. Any ideas on how to circumvent this, @i-aki-y?

mikel-brostrom avatar Feb 24 '23 08:02 mikel-brostrom

@mikel-brostrom

ValueError: y_max is less than or equal to y_min for bbox

I found that some COCO annotations have bboxes with height == 0.0. This is the cause of the error.

import json
import pathlib
coco_annot_path = pathlib.Path("coco/annotations/instances_train2017.json")
with open(coco_annot_path) as f:
    coco_annots = json.load(f)
for item in coco_annots["annotations"]:
    x, y, w, h = item["bbox"]
    if w == 0 or h == 0:
        print(item)

> {'segmentation': [[296.65, 388.33, 296.65, 388.33, 297.68, 388.33, 297.68, 388.33]], 'area': 0.0, 'iscrowd': 0, 'image_id': 200365, 'bbox': [296.65, 388.33, 1.03, 0.0], 'category_id': 58, 'id': 918}
> {'segmentation': [[9.98, 188.56, 15.52, 188.56, 15.52, 188.56, 11.09, 188.56]], 'area': 0.0, 'iscrowd': 0, 'image_id': 550395, 'bbox': [9.98, 188.56, 5.54, 0.0], 'category_id': 1, 'id': 2206849}

i-aki-y avatar Feb 27 '23 04:02 i-aki-y

But this should be avoided by bbox_params=A.BboxParams(format='coco', min_area=1), right @i-aki-y? I tried this but it didn't work for me.

mikel-brostrom avatar Feb 27 '23 06:02 mikel-brostrom

@mikel-brostrom No, the filters are applied in post-processing, while the error occurs in the pre-processing validation.

I think they have different purposes. The filters are necessary because some transforms produce bboxes with zero or tiny areas by design, but invalid data in the input suggests something went wrong earlier in the data preparation, and that is what should be fixed.
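If you still want to train on the raw COCO annotations, one option is to drop such degenerate boxes before they enter the pipeline. This is a small sketch, not part of this PR; the surrounding names (image, image_cache, bboxes, bboxes_cache, transform) follow the earlier examples, and the boxes are assumed to be in coco format [x, y, w, h, label]:

def drop_degenerate_bboxes(bboxes, min_size=1.0):
    # keep only coco-format boxes whose width and height are at least min_size pixels
    return [bbox for bbox in bboxes if bbox[2] >= min_size and bbox[3] >= min_size]

bboxes = drop_degenerate_bboxes(bboxes)
bboxes_cache = [drop_degenerate_bboxes(b) for b in bboxes_cache]
data = transform(image=image, image_cache=image_cache, bboxes=bboxes, bboxes_cache=bboxes_cache)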

i-aki-y avatar Feb 27 '23 09:02 i-aki-y