Added MixUp augmentation

Open mikel-brostrom opened this issue 2 years ago • 14 comments

In this PR, I implemented MixUp (https://arxiv.org/pdf/1710.09412v2.pdf). I appreciate any comments and suggestions.

Usage:

import cv2
import numpy as np
import albumentations as A
from matplotlib import pyplot as plt


imgsz=640

# helper: draw COCO-format boxes ([x_min, y_min, width, height, label]) on img in place
def draw_bboxes_on_img(img, bboxes):
    for bbox in bboxes:
        # top left
        x1 = bbox[0]
        y1 = bbox[1]
        # bottom right
        x2 = bbox[0] + bbox[2]
        y2 = bbox[1] + bbox[3]
        c1 = (int(x1), int(y1))
        c2 = (int(x2), int(y2))
        cv2.rectangle(img, c1, c2, (0, 0, 255), 1)

# images have to be of the same size, hence the resizing
image0 = cv2.imread('./images/train2017/000000000139.jpg')
image1 = cv2.imread('./images/train2017/000000000285.jpg')

# define some bogus bboxes
bboxes0 = [[0, 40, 80, 80, '0'], [0, 80, 160, 160, '1']]
bboxes1 = [[0, 160 , 320, 320, '2']]

# PIPELINE FOR GENERATING EQUALLY SIZED IMAGES
transform1 = A.Compose(
    [
        # https://albumentations.ai/docs/api_reference/augmentations/geometric/resize/#albumentations.augmentations.geometric.resize.LongestMaxSize
        A.geometric.resize.LongestMaxSize(imgsz),
        # https://albumentations.ai/docs/api_reference/augmentations/geometric/transforms/#albumentations.augmentations.geometric.transforms.PadIfNeeded
        A.geometric.transforms.PadIfNeeded(imgsz, imgsz, border_mode=0, value=(114, 114, 114)),
    ],
    bbox_params=A.BboxParams(format='coco', min_area=20),
)

# MIXUP-ONLY AUGMENTATION (MixUp is the transform class added in this PR)
transform2 = A.Compose(
    [
        MixUp(
            alpha=32,
            beta=32
        )
    ],
    bbox_params=A.BboxParams(format='coco', min_area=20),
)

# Get equally sized images without breaking the aspect ratio
transformed = transform1(
    image=image0,
    bboxes=bboxes0,
)

image0_transformed = transformed['image']
bboxe0_transformed = transformed['bboxes']

transformed = transform1(
    image=image1,
    bboxes=bboxes1,
)

image1_transformed = transformed['image']
bboxe1_transformed = transformed['bboxes']

draw_bboxes_on_img(image0_transformed, bboxe0_transformed)
draw_bboxes_on_img(image1_transformed, bboxe1_transformed)
cv2.imwrite('image0_transformed.jpg', image0_transformed)
cv2.imwrite('image1_transformed.jpg', image1_transformed)

# Input the results for the two images into MixUp
transformed = transform2(
    image=image0_transformed,
    image1=image1_transformed,
    bboxes=bboxe0_transformed,
    bboxes1=bboxe1_transformed,
)

image2_transformed = transformed['image']
bboxe2_transformed = transformed['bboxes']

draw_bboxes_on_img(image2_transformed, bboxe2_transformed)
cv2.imwrite('image_transformed.jpg', image2_transformed)

Input images:

Results:

Notes

Note that both images must be the same size for this augmentation to work, which is why the helper pipeline with LongestMaxSize and PadIfNeeded is needed. MixUp works fine together with mosaic. The image size is asserted inside the MixUp augmentation, which raises a TypeError if the images aren't the same size.
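For readers who have not seen MixUp before, here is a minimal NumPy sketch of the blending step described above. It is not the code from this PR: `mixup_blend` is a hypothetical helper, uint8 images are assumed, and keeping the boxes of both images is the usual detection convention rather than something confirmed here.

```python
import numpy as np


def mixup_blend(image0, image1, bboxes0, bboxes1, lam):
    """Blend two equally sized images and merge their box lists.

    Hypothetical helper for illustration; not the code from this PR.
    """
    if image0.shape != image1.shape:
        # The PR raises TypeError on a size mismatch, so the sketch does too.
        raise TypeError(f"shape mismatch: {image0.shape} vs {image1.shape}")
    # Weighted per-pixel average, computed in float to avoid uint8 overflow.
    mixed = lam * image0.astype(np.float32) + (1.0 - lam) * image1.astype(np.float32)
    # Assumes 8-bit images, hence the clip to [0, 255].
    mixed = np.clip(np.rint(mixed), 0, 255).astype(image0.dtype)
    # Both images stay visible in the result, so boxes from both are kept.
    return mixed, list(bboxes0) + list(bboxes1)


rng = np.random.default_rng(0)
lam = rng.beta(32, 32)  # alpha = beta = 32 concentrates lam around 0.5
img_a = np.full((4, 4, 3), 200, dtype=np.uint8)
img_b = np.full((4, 4, 3), 100, dtype=np.uint8)
mixed, boxes = mixup_blend(img_a, img_b, [[0, 0, 2, 2, "0"]], [[1, 1, 2, 2, "2"]], lam)
```

Because the blend is a per-pixel average, it only makes sense when both arrays have the same shape, which is why the resizing and padding pipeline runs first.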

mikel-brostrom avatar Feb 23 '23 08:02 mikel-brostrom

Mixup gives consistent boosts on image classification when using the loss presented in the paper. It also helps on COCO for object detection in the case of large models that tend to overfit. For smaller models this augmentation tends to be detrimental and should be avoided.
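As a rough illustration of the loss mentioned above: the paper mixes the one-hot labels with the same weight as the images, which for cross-entropy is equivalent to a weighted sum of the two per-label losses. The sketch below uses plain NumPy for a single sample; `cross_entropy` and `mixup_loss` are hypothetical names, not part of this PR or of albumentations.

```python
import numpy as np


def cross_entropy(logits, label):
    """Cross-entropy of one sample given raw logits and an integer label."""
    shifted = logits - logits.max()  # subtract the max for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[label]


def mixup_loss(logits, label_a, label_b, lam):
    """Loss for an input mixed as lam * x_a + (1 - lam) * x_b."""
    return lam * cross_entropy(logits, label_a) + (1.0 - lam) * cross_entropy(logits, label_b)


logits = np.array([2.0, 0.5, -1.0])
loss = mixup_loss(logits, label_a=0, label_b=1, lam=0.7)
```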

mikel-brostrom avatar Feb 27 '23 11:02 mikel-brostrom

@mikel-brostrom I advise making the apply_* functions deterministic. Existing transforms put stochastic operations like np.random.* into get_params or get_params_dependent_on_targets. This practice makes results reproducible and makes debugging and testing easier. Also, why not expose alpha (the parameter of the beta distribution) as an argument instead of hardcoding 32?
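A stand-alone sketch of that split might look like the following. `ToyMixUp` is a hypothetical toy class that only mimics the get_params/apply separation; it does not subclass any albumentations base class and is not the PR's implementation.

```python
import numpy as np


class ToyMixUp:
    """Toy transform: randomness lives in get_params, apply is deterministic."""

    def __init__(self, alpha=32.0, beta=32.0, seed=None):
        self.alpha = alpha
        self.beta = beta
        self.rng = np.random.default_rng(seed)

    def get_params(self):
        # The only place where random numbers are drawn.
        return {"lam": float(self.rng.beta(self.alpha, self.beta))}

    def apply(self, image, other, lam):
        # Pure function of its inputs: same arguments, same output.
        mixed = lam * image.astype(np.float32) + (1.0 - lam) * other.astype(np.float32)
        return mixed.astype(image.dtype)

    def __call__(self, image, other):
        params = self.get_params()
        return self.apply(image, other, **params), params


t = ToyMixUp(seed=0)
a = np.full((2, 2), 10, dtype=np.uint8)
b = np.full((2, 2), 250, dtype=np.uint8)
out, params = t(a, b)
# Replaying apply with the recorded params reproduces the result exactly.
replayed = t.apply(a, b, **params)
```

With this layout a test can call apply with a fixed lam and compare against a known array, without touching the random state.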

i-aki-y avatar Mar 02 '23 05:03 i-aki-y

Thanks for the feedback @i-aki-y! Branch updated based on your comments:

  • stochastic operations moved to get_params
  • apply is now deterministic
  • alpha and beta defining the distribution are now input arguments

Any suggestions on how to avoid calling apply twice when having multiple targets @i-aki-y?

mikel-brostrom avatar Mar 02 '23 06:03 mikel-brostrom

Albumentations keeps a mapping from each argument key to its associated function in the targets variable:

targets = {
    "image": apply_image,
    "bboxes": apply_bboxes,
    ...
}

The functions specified in targets are executed one by one when you call transform(image=image, bboxes=bboxes).

You have added new entries to the targets variable by using additional_targets:

targets = {
    "image": apply_image,
    "bboxes": apply_bboxes,
    ...
    "image1": apply_image,
    ...
}

This means apply_image will be called twice; the first is for the "image", and the second is for the "image1". I think this is the expected behavior when the additional_targets is used.
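The dispatch described above can be mimicked in a few lines of plain Python. This is only a toy model of the idea: `run`, `calls`, and the two stub functions are hypothetical and do not reproduce the library's internals.

```python
calls = []


def apply_image(value):
    calls.append("apply_image")
    return value


def apply_bboxes(value):
    calls.append("apply_bboxes")
    return value


def run(targets, **data):
    """Call the function registered for each keyword argument that has one."""
    return {key: targets[key](value) if key in targets else value for key, value in data.items()}


targets = {"image": apply_image, "bboxes": apply_bboxes}
# Registering an extra key, as additional_targets does, reuses the same function.
targets["image1"] = apply_image

run(targets, image="img0", image1="img1", bboxes=[])
image_calls = calls.count("apply_image")
```

In this toy model apply_image runs once per registered image key, which matches the double call observed in the PR.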

What will happen if you remove the "additional_targets"?

i-aki-y avatar Mar 02 '23 11:03 i-aki-y

What will happen if you remove the "additional_targets"?

Thanks @i-aki-y! That simple change solved it! Updated the usage example

mikel-brostrom avatar Mar 02 '23 12:03 mikel-brostrom

I consider this ready for review.

Don't want to steal the spotlight here @i-aki-y but should I put a PR up for Mosaic as well? I implemented it using the same approach as in this. Will you update yours? :smile:

mikel-brostrom avatar Mar 02 '23 12:03 mikel-brostrom

I consider this ready for review.

I think you need to consider the case when the input is grayscale len(image.shape) == 2.
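One common way grayscale input breaks a transform is unpacking three values from image.shape. Whether that is what happens in this PR is an assumption; the hypothetical helper below only shows a size check that works for both layouts.

```python
import numpy as np


def image_size(image):
    """Return (height, width) for both HxW and HxWxC arrays."""
    # image.shape[:2] is valid for 2-D and 3-D arrays, whereas
    # `h, w, c = image.shape` raises ValueError on grayscale input.
    height, width = image.shape[:2]
    return height, width


gray = np.zeros((480, 640), dtype=np.uint8)
color = np.zeros((480, 640, 3), dtype=np.uint8)
same_size = image_size(gray) == image_size(color)
```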

Don't want to steal the spotlight here @i-aki-y but should I put a PR up for Mosaic as well? I implemented it using the same approach as in this. Will you update yours? 😄

Sure, you can make your PR. But now I do not think introducing auxiliary targets like 'image_cache' and 'image1' is a good approach. Making a new Compose class that handles multi and single-image targets seems more flexible, and we can make the API more straightforward.

i-aki-y avatar Mar 03 '23 11:03 i-aki-y

Making a new Compose class that handles multi and single-image targets seems more flexible

Yup, I agree here

mikel-brostrom avatar Mar 03 '23 11:03 mikel-brostrom

But now I do not think introducing auxiliary targets like 'image_cache' and 'image1' is a good approach

Yes, I read your comment in your MR, which is why I thought I could upload mine. But is a multi-image Compose really needed? You can simply have a single target image and several others as input to complete the mosaic, right? What would we gain from a multi-image Compose, @i-aki-y?

mikel-brostrom avatar Mar 03 '23 19:03 mikel-brostrom

@mikel-brostrom My mosaic augmentation PR has some difficulties; here are two of them:

  1. We cannot define the whole transform as a single Compose. As you did in the example above, we need to define multiple transforms: one for preprocessing and a second for the mosaic. This weakens the advantage of a declarative pipeline definition. Ideally, I want to define it like this:
Compose([
    A.Normalize(),
    A.Resize(),
    A.Mosaic(),
    A.RandomCrop(),
    A.MixUp(),
    ...
])

But this is difficult because Compose does not know how to handle the additional targets required by Mosaic (and MixUp).

  2. The situation becomes more complicated if the transform introduces an additional bboxes target, as Mosaic does. Compose internally applies some pre- and post-processing, but the auxiliary targets introduced by an individual transform bypass these operations. So the author of such a transform needs to re-implement the same pre- and post-processing for the additional targets. This is bad practice because such code duplication reduces maintainability. Even worse, I have no idea how to implement some features of Compose, such as label_fields, because it is difficult to access the parameter information from the individual transform. The same situation exists for KeypointsParams.

There still exist other minor problems. Anyway, I think I need to extend the Compose to fix issues like those described above. I have been thinking of this issue for the past few days and have just started writing a PoC. I will let you know when it is ready.

i-aki-y avatar Mar 05 '23 02:03 i-aki-y

I see, yes, makes sense to have Compose for multi-input / single-output images.

I have been thinking of this issue for the past few days and have just started writing a PoC. I will let you know when it is ready.

I can try it out when it is done :smile:

mikel-brostrom avatar Mar 05 '23 07:03 mikel-brostrom

I see, yes, makes sense to have Compose for multi-input / single-output images.

I have been thinking of this issue for the past few days and have just started writing a PoC. I will let you know when it is ready.

I can try it out when it is done 😄

Did you manage to make this PR work with the implementation in https://github.com/albumentations-team/albumentations/pull/1420?

Any plans to merge this?

thiagoribeirodamotta avatar Jun 23 '23 20:06 thiagoribeirodamotta

I will pick this up again if the multi-input / single-output PR gets merged. It is not worth investing the time to adapt this if that is not happening.

mikel-brostrom avatar Jun 23 '23 20:06 mikel-brostrom

Cool PR, thank you! I am using it with albumentations==0.5.2

octavflorescu avatar Feb 14 '24 13:02 octavflorescu

Added in https://github.com/albumentations-team/albumentations/pull/1549

ternaus avatar Mar 05 '24 01:03 ternaus