Improved version of BBoxSafeRandomCrop with fixed size

Open chris-ml92 opened this issue 2 years ago • 3 comments

I have created a new version of BBoxSafeRandomCrop, which I've called BBoxSafeRandomCropFixedSize.

When detecting small objects in high-resolution images, it is often difficult to crop the images around the objects. Only small portions of each image are needed, and those portions vary from image to image. I came up with this variation, which I find really handy.

This augmentation randomly selects one of the image's bboxes and takes a fixed-size crop at a random position around that bbox.
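
As a rough illustration of the idea (ignoring erosion for now), the sketch below picks one box at random and samples a fixed-size window that fully contains it. It is a standalone, pure-Python approximation and not the transform itself; the function name `sample_window_around_box` and the integer pixel `(x_min, y_min, x_max, y_max)` box layout are assumptions made for the example.

```python
import random


def sample_window_around_box(boxes, image_w, image_h, crop_w, crop_h, rng=random):
    """Pick a random box and return a crop window (x0, y0, x1, y1) that contains it.

    Hypothetical helper: boxes are integer (x_min, y_min, x_max, y_max) pixel
    tuples that lie inside the image.
    """
    if crop_w > image_w or crop_h > image_h:
        raise ValueError("crop window is larger than the image")
    if not boxes:
        raise ValueError("at least one box is required")

    bx0, by0, bx1, by1 = rng.choice(boxes)
    if bx1 - bx0 > crop_w or by1 - by0 > crop_h:
        raise ValueError("selected box does not fit inside the crop window")

    # The window's left/top edge must not pass the box's left/top edge, and its
    # right/bottom edge must not fall short of the box's right/bottom edge,
    # while the whole window stays inside the image.
    x0 = rng.randint(max(0, bx1 - crop_w), min(bx0, image_w - crop_w))
    y0 = rng.randint(max(0, by1 - crop_h), min(by0, image_h - crop_h))
    return x0, y0, x0 + crop_w, y0 + crop_h


# Usage: a 30x40 object in a 4000x3000 image, cropped to 512x512.
window = sample_window_around_box([(1900, 1000, 1930, 1040)], 4000, 3000, 512, 512)
print(window)
```

The window position changes on every call, but the selected box always stays entirely inside it.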

The erosion rate sets the fraction of the bounding box area that is allowed to fall outside the crop. I have used it with the COCO format so far. It would be cool to add it to the library.
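
To make the erosion rate concrete: because it is expressed as a fraction of the box area, it has to be converted to a per-side fraction before it can be compared with the crop width and height. Here is a minimal sketch of that conversion, assuming width and height are eroded by the same relative amount. The helper names `linear_erosion_rate` and `min_visible_size` are made up for illustration.

```python
import math


def linear_erosion_rate(area_erosion_rate):
    """Per-side erosion fraction that removes at most `area_erosion_rate` of the area."""
    if not 0.0 <= area_erosion_rate <= 1.0:
        raise ValueError("area_erosion_rate must be in [0, 1]")
    return 1.0 - math.sqrt(1.0 - area_erosion_rate)


def min_visible_size(box_w, box_h, area_erosion_rate):
    """Smallest (width, height) in pixels of the box that must stay inside the crop."""
    rate = linear_erosion_rate(area_erosion_rate)
    return box_w - round(rate * box_w), box_h - round(rate * box_h)


# With a 50% area erosion, each side may lose about 29% of its length,
# so about 71% x 71% = 50% of the box area is guaranteed to remain.
print(linear_erosion_rate(0.5))
print(min_visible_size(100, 40, 0.5))
```

A rate of 0.0 keeps the whole box inside the crop, while a rate of 1.0 places no constraint on how much of the box is visible.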

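import math
import random

# Import paths below assume the Albumentations 1.x module layout.
from albumentations.augmentations.crops import functional as F
from albumentations.core.transforms_interface import DualTransform


class BBoxSafeRandomCropFixedSize(DualTransform):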
    """Crop a random part of the input image around a bounding box
    that is selected randomly from the bounding boxes provided.
    Args:
        crop_height (int): height of the crop.
        crop_width (int): width of the crop.
        bbox_erosion_rate (float): fraction of the selected bounding box area that may fall outside the crop, in [0, 1]. Default: 0.0.
        p (float): probability of applying the transform. Default: 1.
    Targets:
        image, mask, bboxes
    Image types:
        uint8, float32
    """

    def __init__(self, crop_height: int, crop_width: int, bbox_erosion_rate=0.0, always_apply=False, p=1.0):
        super(BBoxSafeRandomCropFixedSize, self).__init__(always_apply, p)
        self.crop_height = crop_height
        self.crop_width = crop_width
        self.bbox_erosion_rate = bbox_erosion_rate

    @property
    def targets_as_params(self):
        return ["image", "bboxes"]

    def get_params_dependent_on_targets(self, params):
        image_height, image_width = params["image"].shape[:2]
        bboxes_list = params["bboxes"]

        # Check that the cropping window is smaller than the image.
        if self.crop_width > image_width or self.crop_height > image_height:
            raise ValueError("The desired crop window size is larger than the image!")

        # Check that the image is provided with at least one bounding box.
        if len(bboxes_list) == 0:
            raise ValueError("The image to crop must be provided with at least one bounding box!")

        # We choose randomly around which bounding box to perform the crop.
        selected_bbox_index = random.randint(0, len(bboxes_list) - 1)
        selected_bbox = bboxes_list[selected_bbox_index]

        # Convert the bounding box coordinates from the Albumentations normalized format to actual pixel values.
        bbox_x_min = round(selected_bbox[0] * image_width)
        bbox_y_min = round(selected_bbox[1] * image_height)
        bbox_width = round((selected_bbox[2] - selected_bbox[0]) * image_width)
        bbox_height = round((selected_bbox[3] - selected_bbox[1]) * image_height)

        # Convert 'bbox_erosion_rate' from a fraction of the bounding box area to a fraction of its width and height.
        # N.B. We always erode the bounding box width and height by the same relative amount.
        #      For example, to allow up to 0.5 (50%) of the area to be eroded, each side must keep at least
        #      sqrt(0.5) (about 0.71) of its length, i.e. each side is eroded by up to 1 - sqrt(0.5) (about 0.29).
        bbox_linear_erosion_rate = 1.0 - math.sqrt(1.0 - self.bbox_erosion_rate)

        # Check that the cropping window is large enough to include entirely the non-erodible part of the selected bounding box.
        # N.B. This means that, if the 'bbox_erosion_rate' is 1.0, then the crop size could also be (0, 0),
        #      as it is not expected to include any part of the selected bounding box.
        min_visible_width = bbox_width - round(bbox_linear_erosion_rate * bbox_width)
        min_visible_height = bbox_height - round(bbox_linear_erosion_rate * bbox_height)
        if self.crop_width < min_visible_width or self.crop_height < min_visible_height:
            raise ValueError(
                "The desired crop window size is too small to include the required area of the selected bounding box!"
            )

        # The coordinates of the cropping window are defined in terms of x_min, x_max, y_min, y_max.
        # Find the lower and upper limits for the values of x_min and y_min, also taking into consideration the "erosion_rate".
        x_erosion = round(bbox_linear_erosion_rate * bbox_width)
        y_erosion = round(bbox_linear_erosion_rate * bbox_height)

        # Lowest admissible x_min / y_min, clamped so that the window does not start before the image.
        x_min_low_limit = max(bbox_x_min - x_erosion - (self.crop_width - bbox_width), 0)
        y_min_low_limit = max(bbox_y_min - y_erosion - (self.crop_height - bbox_height), 0)

        # Highest admissible x_min / y_min, clamped so that the window does not extend past the image.
        x_min_high_limit = min(bbox_x_min + x_erosion, image_width - self.crop_width)
        y_min_high_limit = min(bbox_y_min + y_erosion, image_height - self.crop_height)
        
        # Compute the random values for x_min and y_min within their intervals.
        x_min = random.randint(x_min_low_limit, x_min_high_limit)
        y_min = random.randint(y_min_low_limit, y_min_high_limit)

        # Return the cropping window boundaries as parameters instead of storing them on the instance,
        # so that they are passed to the apply methods together with the other per-call parameters.
        return {
            "x_min": x_min,
            "y_min": y_min,
            "x_max": x_min + self.crop_width,
            "y_max": y_min + self.crop_height,
        }

    def apply(self, img, x_min=0, y_min=0, x_max=0, y_max=0, **params):
        return F.crop(img, x_min=x_min, y_min=y_min, x_max=x_max, y_max=y_max)

    def apply_to_bbox(self, bbox, x_min=0, y_min=0, x_max=0, y_max=0, **params):
        return F.bbox_crop(bbox, x_min=x_min, y_min=y_min, x_max=x_max, y_max=y_max, **params)

    def get_transform_init_args_names(self):
        return ("crop_height", "crop_width", "bbox_erosion_rate")
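
One way to sanity-check the transform is to measure how much of a box survives a given crop and compare the result with `1 - bbox_erosion_rate`. The helper below is a hypothetical, standalone sketch (plain pixel coordinates, no Albumentations calls) and is not part of the proposed class.

```python
def visible_fraction(box, window):
    """Fraction of the box area that lies inside the crop window.

    Both arguments are (x_min, y_min, x_max, y_max) tuples in pixels.
    """
    bx0, by0, bx1, by1 = box
    wx0, wy0, wx1, wy1 = window
    box_area = (bx1 - bx0) * (by1 - by0)
    if box_area <= 0:
        raise ValueError("box must have a positive area")
    # Width and height of the intersection, or 0 if the rectangles do not overlap.
    inter_w = max(0, min(bx1, wx1) - max(bx0, wx0))
    inter_h = max(0, min(by1, wy1) - max(by0, wy0))
    return inter_w * inter_h / box_area


# Example: a 100x40 box whose left 25 pixels fall outside a 512x512 window.
fraction = visible_fraction((100, 100, 200, 140), (125, 50, 637, 562))
assert fraction >= 1 - 0.5  # this crop would satisfy bbox_erosion_rate=0.5
```

Running a check like this over many random crops is a cheap way to confirm that the sampled window never erodes more of the selected box than the configured rate allows.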

Sep 01 '22 10:09 chris-ml92

By the way, I'm actually using it with a custom version of albumentations, but would love to contribute to the project.

Sep 01 '22 11:09 chris-ml92

We are always glad to see new functionality in the library. You are welcome to add this transform to the library.

Sep 02 '22 08:09 Dipet

That's great. I'll open an MR in the next few days, as soon as I get some free time.

Sep 05 '22 10:09 chris-ml92