Augmentation with keyword arguments

Open gunesevitan opened this issue 2 years ago • 8 comments

I was trying to implement an augmentation with keyword arguments that are fetched from a dataframe. I can do this inside my torch dataset class easily but it would be better to implement this in the augmentation pipeline. However, I couldn't find any examples of what I'm trying to do.

I created a custom augmentation whose apply method takes arg1 and arg2.

class Aug(ImageOnlyTransform):

    def apply(self, image, arg1, arg2):

        image = do_something_based_on_args(image, arg1, arg2)

        return image

arg1 and arg2 can be accessed in the dataset's __getitem__ method, but I don't know how I should pass them to the augmentation pipeline. Is it supposed to be done like this?

transformed = self.transforms(image=image, mask=mask, arg1=arg1, arg2=arg2)

gunesevitan, Aug 11 '22 07:08

Try this:

import numpy as np
import albumentations as A

from typing import List, Tuple, Dict, Any


class Test(A.ImageOnlyTransform):
    def __init__(self, param_key: str = "param1", always_apply=False, p=0.5):
        super().__init__(always_apply, p)
        self.param_key = param_key

    def get_transform_init_args_names(self) -> Tuple[str]:
        return ("param_key",)

    @property
    def targets_as_params(self) -> List[str]:
        return [self.param_key]

    def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, int]:
        return {"my_new_param": params[self.param_key]}

    def apply(self, img: np.ndarray, my_new_param=None) -> np.ndarray:
        # do something
        return img + my_new_param


aug = A.Compose([Test()])
image = np.empty([100, 100, 3], dtype=np.uint8)
result = aug(image=image, param1=100)

Dipet, Aug 11 '22 09:08

Thank you for the example, @Dipet. Can you please explain what the get_transform_init_args_names, get_params_dependent_on_targets and targets_as_params methods are used for? It is way more verbose than I expected.

gunesevitan, Aug 12 '22 11:08

  • targets_as_params - lists the targets (the arguments you pass when calling the augmentation pipeline) that you want to use to produce augmentation parameters at call time. When the transform is called, these targets are passed to get_params_dependent_on_targets. For example, image, mask, bboxes and keypoints are the standard names for our targets.
  • get_params_dependent_on_targets - generates parameters based on those targets. If your transform doesn't depend on any target, only on its own arguments, you can use get_params instead. Both functions produce params once per call, which is useful when the params are random or expensive to compute.
  • get_transform_init_args_names - used for serialization. If the parameter names in __init__ match the attribute names stored on the transform, you can just enumerate them in this function. Otherwise, if you need custom serialization logic, you will have to override the _to_dict method. We may remove this function in the future, once someone implements automatic parsing of the __init__ call.

Dipet, Aug 12 '22 11:08
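
The call flow described in the three bullets can be illustrated with a small toy model. This is only a sketch under simplifying assumptions: ToyTransform and its __call__ are hypothetical stand-ins written in plain Python, not albumentations code, and the "image" is a plain list so the example runs without any dependency.

```python
from typing import Any, Dict, List


class ToyTransform:
    """Hypothetical stand-in for a transform; this is not albumentations code."""

    def __init__(self, param_key: str = "param1"):
        self.param_key = param_key

    @property
    def targets_as_params(self) -> List[str]:
        # Names of the call-time arguments this transform wants to inspect.
        return [self.param_key]

    def get_params_dependent_on_targets(self, params: Dict[str, Any]) -> Dict[str, Any]:
        # Receives only the requested targets and turns them into parameters.
        return {"my_new_param": params[self.param_key]}

    def apply(self, img: List[int], my_new_param: int = 0, **params: Any) -> List[int]:
        return [pixel + my_new_param for pixel in img]

    def __call__(self, **data: Any) -> Dict[str, Any]:
        # 1. Pick out the targets listed in targets_as_params.
        requested = {name: data[name] for name in self.targets_as_params}
        # 2. Compute the parameters once for this call.
        params = self.get_params_dependent_on_targets(requested)
        # 3. Hand the parameters to apply as keyword arguments.
        return {**data, "image": self.apply(data["image"], **params)}


result = ToyTransform()(image=[1, 2, 3], param1=100)
print(result["image"])  # [101, 102, 103]
```

The point of the indirection is that the value passed as param1 at call time reaches apply under the name my_new_param, without apply having to know how the pipeline was called.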

Unfortunately I have the exact same issue, and I get the following error when running the example you provided: TypeError: apply() got an unexpected keyword argument 'cols'

This is on version '1.2.1'

fedshyvana, Sep 13 '22 03:09
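
A likely cause of this TypeError, judging from the message itself, is that the pipeline passes extra keyword arguments (here at least cols) to apply alongside the custom parameter, while the example's apply accepts only img and my_new_param. The sketch below reproduces that mechanism in plain Python; apply_strict, apply_tolerant and the rows key are hypothetical illustrations, not library code, and whether **params is the right fix for a given albumentations version is an assumption to verify.

```python
def apply_strict(img, my_new_param=None):
    # Mirrors the signature from the example: no room for extra keywords.
    return img + my_new_param


def apply_tolerant(img, my_new_param=None, **params):
    # **params absorbs any extra keywords the caller passes along.
    return img + my_new_param


# Assumed shape of the call: the custom parameter plus extra keys.
call_kwargs = {"my_new_param": 100, "cols": 100, "rows": 100}

error_message = None
try:
    apply_strict(1, **call_kwargs)
except TypeError as exc:
    error_message = str(exc)

fixed_result = apply_tolerant(1, **call_kwargs)
print(error_message)
print(fixed_result)
```

If this is indeed the cause, changing the signature in the example to apply(self, img, my_new_param=None, **params) should avoid the error.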

To give some context: this is the transform I want to implement (I want to pad each image up to the nearest multiple of "size_divisible", e.g. 32), which can be readily achieved using torchvision:

import math

import numpy as np
import torchvision.transforms.functional as TF
from PIL import Image


class PadToNearestMultiple:
    """Pad to the nearest multiple of the specified size."""

    def __init__(self, size_divisible=32):
        assert isinstance(size_divisible, int)
        self.sd = size_divisible

    def __call__(self, image):
        return pad_to_nearest_multiple(image, self.sd)


def pad_to_nearest_multiple(image, size_divisible=32):
    # Accept either a NumPy array or a PIL image and return the same type.
    if isinstance(image, np.ndarray):
        return_array = True
        image = Image.fromarray(image)
    else:
        return_array = False
    size = list(image.size)  # w, h
    stride = float(size_divisible)
    new_w = int(math.ceil(float(size[0]) / stride) * stride)
    new_h = int(math.ceil(float(size[1]) / stride) * stride)
    # Padding order is left, top, right, bottom: only right and bottom are padded.
    padding = (0, 0, new_w - size[0], new_h - size[1])
    image = TF.pad(image, padding=padding, fill=(0, 0, 0), padding_mode="constant")
    if return_array:
        return np.array(image)
    else:
        return image


transforms = PadToNearestMultiple(size_divisible=32)
img = Image.open(filename)  # filename: path to an image file
img = transforms(img)

However, I have been struggling to translate this into albumentations. Any help would be appreciated, thanks!

fedshyvana, Sep 13 '22 03:09
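
The padding arithmetic in the post above does not depend on torchvision. As a minimal NumPy-only sketch of the same idea (pad_to_multiple is a hypothetical helper name, and it assumes a height x width or height x width x channels array, padded with zeros on the bottom and right only):

```python
import numpy as np


def pad_to_multiple(image: np.ndarray, size_divisible: int = 32) -> np.ndarray:
    """Zero-pad the bottom and right so height and width are multiples of size_divisible."""
    height, width = image.shape[:2]
    # Integer ceiling division: round each side up to the next multiple.
    new_h = -(-height // size_divisible) * size_divisible
    new_w = -(-width // size_divisible) * size_divisible
    pad_width = [(0, new_h - height), (0, new_w - width)]
    # Leave any trailing axes (e.g. channels) unpadded.
    pad_width += [(0, 0)] * (image.ndim - 2)
    return np.pad(image, pad_width, mode="constant", constant_values=0)


padded = pad_to_multiple(np.zeros((123, 457, 3), dtype=np.uint8))
print(padded.shape)  # (128, 480, 3)
```

Images whose sides are already multiples of size_divisible pass through with their shape unchanged.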

Have you tried PadIfNeeded with pad_height_divisor=32, pad_width_divisor=32?

Example:

import numpy as np
import albumentations as A

t = A.Compose([A.PadIfNeeded(pad_height_divisor=32, pad_width_divisor=32, min_width=None, min_height=None)])

img = np.empty([123, 457, 3], dtype=np.uint8)
res = t(image=img)["image"]
print(res.shape)

Dipet, Sep 13 '22 19:09

Thank you! I think this would do the job. Just to clarify one more point: suppose I want to pad images to be at least 224 x 224 and also have dimensions divisible by 32. Can I compose two A.PadIfNeeded() transforms? The first would have min_width/min_height set to 256, and the second would have pad_height_divisor/pad_width_divisor set to 32 with min_width/min_height set to None. The reason is that PadIfNeeded apparently does not allow me to specify both min_width and pad_width_divisor in the same transform.

fedshyvana, Sep 16 '22 02:09

I want to pad images to be at least 224 x 224, and then also have dimensions to be divisible by 32

Yes, it looks like we need to add a flag to support this kind of transform. You are right, you can use two PadIfNeeded transforms:

t = A.Compose([
    A.PadIfNeeded(min_width=256, min_height=256),
    A.PadIfNeeded(pad_height_divisor=32, pad_width_divisor=32, min_width=None, min_height=None),
])

Dipet, Sep 18 '22 23:09
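
To sanity-check what the two-step pipeline should produce, the expected output size can be worked out by hand. The sketch below is plain arithmetic, not library code: padded_size is a hypothetical helper, and it assumes each step only ever pads (never crops), first up to the minimum size and then up to the next multiple of the divisor.

```python
from typing import Tuple


def padded_size(height: int, width: int, min_size: int = 256, divisor: int = 32) -> Tuple[int, int]:
    """Expected (height, width) after a minimum-size pad followed by a divisor pad."""

    def round_up(value: int) -> int:
        # Integer ceiling division, then back to a multiple of divisor.
        return -(-value // divisor) * divisor

    # Step 1: enforce the minimum size. Step 2: round up to a multiple.
    return round_up(max(height, min_size)), round_up(max(width, min_size))


print(padded_size(123, 457))  # (256, 480)
print(padded_size(300, 200))  # (320, 256)
```

Comparing these numbers against res.shape from the actual pipeline is a quick way to confirm that the two transforms compose as intended.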