albumentations
[WIP] Add Mosaic augmentation
About PR
In this PR, I implemented the mosaic augmentation used in YOLO [1, 2].
I appreciate any comments and suggestions.
[1]: "YOLOv4: Optimal speed and accuracy of object detection.", https://arxiv.org/pdf/2004.10934.pdf
[2]: YOLOv5, https://github.com/ultralytics/yolov5
Demo
This is a reproducible example:
```python
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle
import skimage
import albumentations as A

## define helper funcs
def add_bbox(ax, bbox, encoder):
    label = 0
    if len(bbox) > 4:
        bbox, label = bbox[:4], bbox[4]
        label = encoder[label]
    bbox_color = plt.get_cmap("tab10").colors[label]
    x_min, y_min, x_max, y_max = bbox
    w, h = x_max - x_min, y_max - y_min
    pat = Rectangle(xy=(x_min, y_min), width=w, height=h, fill=False, lw=3, color=bbox_color)
    ax.add_patch(pat)

def plot_image_and_bboxes(image, bboxes, encoder, ax):
    ax.imshow(image)
    for i in range(len(bboxes)):
        add_bbox(ax, bboxes[i], encoder)

## data setup
encoder = {"face": 0, "rocket": 1, "other": 2}
image_list = [skimage.data.astronaut(), skimage.data.cat(), skimage.data.coffee(), skimage.data.rocket()]
bboxes_list = [
    [[170, 30, 280, 180, "face"], [350, 80, 460, 290, "rocket"], [140, 350, 200, 420, "other"]],
    [[50, 0, 350, 280, "face"]],
    [[160, 15, 420, 210, "other"]],
    [[300, 120, 340, 420, "rocket"]],
]

## define pipeline
bbox_format = 'pascal_voc'
transform = A.Compose([
    A.Mosaic(height=2*512, width=2*512, shift_limit_x=0.0, shift_limit_y=0.0, replace=False, p=1.0, fill_value=114, bboxes_format=bbox_format),
    A.RandomResizedCrop(height=512, width=512, scale=(0.4, 1.0)),
], bbox_params=A.BboxParams(format=bbox_format))

## show input images
fig, axes = plt.subplots(2, 2, figsize=(6, 6))
axes = axes.flatten()
for i in range(len(image_list)):
    ax = axes[i]
    ax.set_title(f"input{i}")
    image = image_list[i]
    bboxes = bboxes_list[i]
    plot_image_and_bboxes(image, bboxes, encoder, ax)
plt.show()
#plt.savefig("mosaic_input.jpg", bbox_inches='tight')

## show mosaic outputs
fig, axes = plt.subplots(2, 2, figsize=(10, 10))
axes = axes.flatten()
for ax in axes:
    data = transform(image=image_list[0], image_cache=image_list[1:], bboxes=bboxes_list[0], bboxes_cache=bboxes_list[1:])
    image = data["image"]
    bboxes = data["bboxes"]
    plot_image_and_bboxes(image, bboxes, encoder, ax)
plt.show()
#plt.savefig("mosaic_output.jpg", bbox_inches='tight')
```
Input
Some Results
Notes
Since the current albumentations does not support multiple image sources, I introduced helper targets, `image_cache` and `bboxes_cache`, as additional data sources. The user needs to set the additional images and bboxes to these helper targets, so it is up to the user to decide how to prepare and manage the multiple image data. For example, users can set all images to `image_cache` when they have sufficient memory or the dataset is small; alternatively, they can read a small number of images at each iteration.
Note that this PR version does not support a `labels_cache` target. This means that the user should embed the label information inside the bounding boxes, like `[xmin, ymin, xmax, ymax, label]` (when `bboxes_format=pascal_voc`).
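Folding a separate label list into the bbox lists can be done up front; a minimal sketch (the variable names here are illustrative, not part of the PR API):

```python
# Embed per-box labels into the bbox tuples, since this PR version has no
# labels_cache target (pascal_voc-style [xmin, ymin, xmax, ymax] assumed).
bboxes = [[170, 30, 280, 180], [350, 80, 460, 290]]
labels = ["face", "rocket"]

bboxes_with_labels = [bbox + [label] for bbox, label in zip(bboxes, labels)]
print(bboxes_with_labels)
# [[170, 30, 280, 180, 'face'], [350, 80, 460, 290, 'rocket']]
```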
Another limitation is that the Mosaic augmentation should be placed as the first transform of the pipeline, as in the example above. Transforms placed before the mosaic would be applied only to the image set to the `image` target, while the additional images set to `image_cache` would be ignored. This means that the images set to `image` and `image_cache` would have different augmentation histories, which is probably not what users expect. For example, with the following pipeline, `Normalize` and `RandomResizedCrop` will be applied only to the image, not to any image in `image_cache`.
```python
transform = A.Compose([
    A.Normalize(...),
    A.RandomResizedCrop(...),
    A.Mosaic(...),
    ...
])
data = transform(image=image, image_cache=image_cache)
```
I think this is not a serious limitation because users can prepare two pipelines and apply them separately if needed.
```python
preprocess = A.Compose([
    A.Normalize(...),
    A.RandomResizedCrop(...),
])
transform = A.Compose([
    A.Mosaic(...),
])
batch = [preprocess(image=image_batch[i], bboxes=bboxes_batch[i]) for i in range(n)]
image_batch = [data["image"] for data in batch]
bboxes_batch = [data["bboxes"] for data in batch]
data = transform(image=image, image_cache=image_cache, ...)
```
The same strategy can be applied to other multi-image augmentations such as MixUp. For example, I think an augmentation similar to the one used in YOLOv5 can be composed in the following way.
```python
mosaic_aug = A.Compose([
    A.Mosaic(...),
    A.Affine(...),
    A.RandomResizedCrop(...),
])
mixup_aug = A.Compose([
    A.MixUp(...),  # not included in this PR
])
mosaic1 = mosaic_aug(image=image1, image_cache=image_cache, bboxes=bboxes1, bboxes_cache=bboxes_cache)
mosaic2 = mosaic_aug(image=image2, image_cache=image_cache, bboxes=bboxes2, bboxes_cache=bboxes_cache)
mosaic_mixup = mixup_aug(image=mosaic1["image"], bboxes=mosaic1["bboxes"], image_cache=mosaic2["image"], bboxes_cache=mosaic2["bboxes"])
```
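For reference, the MixUp blend itself (not part of this PR) boils down to a convex combination of two same-sized images plus the union of their bboxes; a minimal sketch with illustrative names:

```python
import numpy as np

def mixup(image1, image2, bboxes1, bboxes2, alpha=0.5):
    """Blend two images with weight alpha and merge their bbox lists."""
    mixed = (alpha * image1 + (1 - alpha) * image2).astype(image1.dtype)
    return mixed, bboxes1 + bboxes2

a = np.zeros((4, 4, 3), dtype=np.float32)
b = np.ones((4, 4, 3), dtype=np.float32)
mixed, boxes = mixup(a, b, [[0, 0, 2, 2]], [[1, 1, 3, 3]])
print(mixed[0, 0, 0], len(boxes))  # 0.5 2
```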
Implementation Notes
The target_dependence property is used
I used the `target_dependence` property to pass the helper targets to the `apply_xxx` functions instead of `get_params_dependent_on_targets`.
This is because the values returned by `get_params` and `get_params_dependent_on_targets` become targets of serialization, a mechanism used for "replay". Since I think these helper targets are not appropriate for serialization, I used the `target_dependence` mechanism instead.
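A minimal sketch of the idea: dependencies declared via `target_dependence` are routed to the apply functions at call time but are excluded from the serialized init args, so they never enter the replay mechanism. All names below are illustrative, not the actual albumentations internals.

```python
class MosaicSketch:
    """Illustrative stand-in for the Mosaic transform in this PR."""

    @property
    def target_dependence(self):
        # apply(image, ...) additionally receives image_cache;
        # apply_to_bboxes(...) additionally receives bboxes_cache.
        return {"image": ["image_cache"], "bboxes": ["bboxes_cache"]}

    def get_transform_init_args_names(self):
        # Only these static args are serialized; the caches are not.
        return ("height", "width", "fill_value")

t = MosaicSketch()
print(t.target_dependence)  # {'image': ['image_cache'], 'bboxes': ['bboxes_cache']}
```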
Mosaic center is fixed
The YOLOv5 implementation includes randomization of the mosaic center position:
https://github.com/ultralytics/yolov5/blob/7c6a33564a84a0e78ec19da66ea6016d51c32e0a/utils/datasets.py#L653
I excluded this feature from the PR version because the same effect can be obtained by applying `RandomResizedCrop` just after the `Mosaic`, as in the demo example above.
TODO
- implement apply_to_keypoints
- write tests
- bboxes_cache preprocessing should be done in Compose if possible.
Any updates on this?
I tried this out together with some rotation augmentations and it seems to work @i-aki-y .
However, from time to time this error arises:
`ValueError: y_max is less than or equal to y_min for bbox`
when using COCO. Any idea how to fix this @i-aki-y?
Moreover, the structure of the repo must have changed since the PR was created, as some refactoring was needed.
> Since current albumentations do not support multiple image sources, I introduced helper targets, image_cache, bboxes_cache, as additional data sources

Couldn't this be solved by using `additional_targets` like here? This would allow the loaded images to be augmented in different ways before mosaic :rocket:
Setting the `width` and `height` for `Mosaic` requires some basic knowledge of the dataset you are working with, as otherwise most of the image content could be left outside the mosaic. Maybe this should be reflected in the docstrings as well. I guess that a good set of initial values could be the average width and height of the dataset you work with @i-aki-y ?
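Deriving initial values from dataset statistics could look like the sketch below; `image_shapes` stands in for your dataset, and the 2x factor assumes the mosaic canvas holds a 2x2 grid of roughly average-sized images.

```python
# Pick Mosaic height/width from the average image size of the dataset.
image_shapes = [(512, 512), (480, 640), (576, 768)]  # (height, width) per image

avg_h = sum(h for h, _ in image_shapes) // len(image_shapes)
avg_w = sum(w for _, w in image_shapes) // len(image_shapes)

mosaic_height, mosaic_width = 2 * avg_h, 2 * avg_w
print(mosaic_height, mosaic_width)  # 1044 1280
```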
@mikel-brostrom Sorry for the delay, and thank you for your feedback!

> ValueError: y_max is less than or equal to y_min for bbox

This means that some bboxes have zero or negative height. Did you get this error only when you used the mosaic transform?

> Moreover, the structure of the repo must have changed since the PR was created as some refactoring was needed.

OK. I will check and update the PR.

> Couldn't this be solved by using additional_targets like here? This would allow the loaded images to be augmented in different ways before mosaic 🚀

I missed this feature! I will look into whether I can use it for this transform.
I have it working locally, so I could create a pull request with what I have @i-aki-y . I also have MixUp working as you suggested:
```python
mosaic1 = mosaic_aug(image=image1, image_cache=image_cache, bboxes=bboxes1, bboxes_cache=bboxes_cache)
mosaic2 = mosaic_aug(image=image2, image_cache=image_cache, bboxes=bboxes2, bboxes_cache=bboxes_cache)
mosaic_mixup = mixup_aug(image=mosaic1["image"], bboxes=mosaic1["bboxes"], image_cache=mosaic2["image"], bboxes_cache=mosaic2["bboxes"])
```
Maybe this should go into a separate PR?
> This means that some bboxes have zero or negative height. Did you get this error only when you used the mosaic transform?

Yes. I had multiple augmentations in my augmentation stack, but the error always appeared during Mosaic. When the error emerged, `y_max` was always equal to `y_min`. This couldn't be fixed by setting `min_area` in `A.BboxParams` btw...
@mikel-brostrom
> Yes. I had multiple augmentations in my augmentation stack but it always appeared during Mosaic. The case was that when the error emerged y_max was always equal to y_min. This couldn't be fixed by setting min_area in A.BboxParams btw...
Thanks, I will investigate it.
> I have it working locally so I could create a pull request with what I have @i-aki-y . I also have MixUp working as you suggested:

Great!

> Maybe this should go into a separate PR?

Yes.
Feel free to check out my MixUp implementation here @i-aki-y . Any feedback is appreciated. It works nicely with this Mosaic implementation :smile:. I am going for CutMix next :rocket:
Btw, I have implemented this in a slightly different manner.
```python
import random
from typing import Any, Dict, List, Tuple


class Mosaic(DualTransform):
    def __init__(
        self,
        height,
        width,
        replace=True,
        fill_value=0,
        bboxes_format="coco",
        always_apply=False,
        p=0.5,
    ):
        super().__init__(always_apply=always_apply, p=p)
        self.height = height
        self.width = width
        self.replace = replace
        self.fill_value = fill_value
        self.bboxes_format = bboxes_format
        self.images = []
        self.bboxes = []

    def get_transform_init_args_names(self) -> Tuple[str, ...]:
        return ("height", "width", "replace", "fill_value", "bboxes_format")

    def apply(self, image, **params):
        return mosaic4(self.images, self.height, self.width, self.fill_value)

    def apply_to_keypoint(self, **params):
        pass  # TODO

    def apply_to_bbox(self, bbox, image_shape, position, height, width, **params):
        rows, cols = image_shape[:2]
        return bbox_mosaic4(bbox, rows, cols, position, height, width)

    def apply_to_bboxes(self, bboxes, **params):
        new_bboxes = []
        for i, (bbox, im) in enumerate(zip(self.bboxes, self.images)):
            im_shape = im.shape
            h, w, _ = im_shape
            for b in bbox:
                new_bbox = self.apply_to_bbox(b, im_shape, i, self.height, self.width)
                new_bboxes.append(new_bbox)
        return new_bboxes

    def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, Any]:
        self.images = [params["image"], params["image1"], params["image2"], params["image3"]]
        self.bboxes = [params["bboxes"], params["bboxes1"], params["bboxes2"], params["bboxes3"]]
        images_bboxes = list(zip(self.images, self.bboxes))
        random.shuffle(images_bboxes)
        self.images, self.bboxes = zip(*images_bboxes)
        return {}

    @property
    def targets_as_params(self) -> List[str]:
        return [
            "image", "image1", "image2", "image3",
            "bboxes", "bboxes1", "bboxes2", "bboxes3",
        ]
```
Trying to follow the recommended way of working with multiple images and bboxes. However, I see that `apply` and `apply_to_bbox` are called as many times as there are targets. Any ideas on how to circumvent this @i-aki-y ?
@mikel-brostrom
> ValueError: y_max is less than or equal to y_min for bbox

I found that some COCO annotations have bboxes with height == 0.0. This is the cause of the error.
```python
import json
import pathlib

coco_annot_path = pathlib.Path("coco/annotations/instances_train2017.json")
with open(coco_annot_path) as f:
    coco_annots = json.load(f)

for item in coco_annots["annotations"]:
    x, y, w, h = item["bbox"]
    if w == 0 or h == 0:
        print(item)
```
> {'segmentation': [[296.65, 388.33, 296.65, 388.33, 297.68, 388.33, 297.68, 388.33]], 'area': 0.0, 'iscrowd': 0, 'image_id': 200365, 'bbox': [296.65, 388.33, 1.03, 0.0], 'category_id': 58, 'id': 918}
> {'segmentation': [[9.98, 188.56, 15.52, 188.56, 15.52, 188.56, 11.09, 188.56]], 'area': 0.0, 'iscrowd': 0, 'image_id': 550395, 'bbox': [9.98, 188.56, 5.54, 0.0], 'category_id': 1, 'id': 2206849}
But this should be avoided by `bbox_params=A.BboxParams(format='coco', min_area=1)`, right @i-aki-y? I tried this, but it didn't work for me.
@mikel-brostrom No, the filters are applied in post-processing, while the error occurs in the pre-processing validation.
I think they have different purposes. The filters are necessary because some transforms produce bboxes with zero or tiny areas by design. But invalid data in the input suggests that something went wrong in an earlier process, which should be fixed there.
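One way to fix it upstream is to drop the degenerate annotations before they ever reach the pipeline; a minimal sketch (the function name and threshold are illustrative, not part of albumentations):

```python
def drop_degenerate_bboxes(annotations, min_size=1e-3):
    """Keep only COCO-style annotations whose bbox has positive width and height."""
    kept = []
    for ann in annotations:
        x, y, w, h = ann["bbox"]
        if w > min_size and h > min_size:
            kept.append(ann)
    return kept

annots = [
    {"bbox": [296.65, 388.33, 1.03, 0.0], "id": 918},  # zero height -> dropped
    {"bbox": [10.0, 20.0, 30.0, 40.0], "id": 1},       # valid -> kept
]
print([a["id"] for a in drop_degenerate_bboxes(annots)])  # [1]
```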