Training fails if no bounding boxes remain in patch after affine transformation
Using A.Affine like so:
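from typing import Union

import albumentations as A
import cv2
from albumentations.pytorch import ToTensorV2


def get_transform(augment) -> Union[A.Compose, ToTensorV2]:
    """This is the new transform"""
    if augment:
        transform = A.Compose(
            [
                # Random rotation over the full circle, applied to 60% of samples.
                A.Affine(rotate=(-180, 180), rotate_method="ellipse", p=0.60, mode=cv2.BORDER_REFLECT_101),
                ToTensorV2(),
            ],
            bbox_params=A.BboxParams(format="pascal_voc", label_fields=["category_ids"]),
        )
    else:
        transform = ToTensorV2()
    return transform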
When this transformation rotates all bounding boxes outside of the patch, leaving no boxes inside, training will result in an index error:
IndexError: tensors used as indices must be long, int, byte or bool tensors
I've fixed this one at the dataset level by having a child dataset class repeat the transform until there is an image with a bounding box:
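def __getitem__(self, idx):
    # Re-apply the random transform until at least one box is left in the patch.
    name, image, targets = super().__getitem__(idx)
    while targets["boxes"].size()[0] == 0:
        name, image, targets = super().__getitem__(idx)
    return name, image, targets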
This could also be fixed by setting the target tensor dtype to int, but having negative samples severely degrades the performance on my dataset, so I've done it this way.
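For reference, here is a minimal sketch of the dtype alternative mentioned above. It assumes the IndexError comes from an empty label list being converted to a float tensor (PyTorch's default dtype for an empty list); the function name `to_target_tensors` and the `"labels"` key are hypothetical and not taken from the code in this thread.

```python
import torch


def to_target_tensors(boxes, labels):
    """Build target tensors that stay valid when no boxes are left (hypothetical helper)."""
    # reshape(-1, 4) gives an empty box list the shape (0, 4) instead of (0,).
    boxes_t = torch.as_tensor(boxes, dtype=torch.float32).reshape(-1, 4)
    # An explicit int64 dtype keeps empty labels usable as indices;
    # without it an empty list would become a float32 tensor.
    labels_t = torch.as_tensor(labels, dtype=torch.int64)
    return {"boxes": boxes_t, "labels": labels_t}


empty = to_target_tensors([], [])
print(empty["boxes"].shape, empty["labels"].dtype)
```

Note that this lets empty patches through as negative samples, which is exactly what the loop above avoids.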
This is interesting, and I'm trying to decide if there is a global fix. I know albumentations has clip=True in the bbox params, but as you said, that would introduce negative samples by accident. Adding a while loop for transforms feels like it could get stuck in an infinite loop if the conditions were different. Do you have a suggestion for something the API could do to anticipate this?
Well, I think it would be tricky to check this beforehand, given the many options albumentations has for border handling, the possible placement of bboxes on the patch, and the range of rotation angles. I think the easiest and quickest way to prevent an infinite loop would be to introduce a retry counter and give it ten retries or something like that. Then, if all tries fail, you could maybe apply the transform with the rotation removed or the angle set to zero. However, in most cases I suspect people will have rotation ranges that include a zero angle or a right angle, so the original image with all bboxes in frame should be produced at some point in the while loop anyway. The exception would be someone who explicitly wants only non-right-angle rotations, but I don't see what benefit that would have.
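The retry counter with a fallback could look roughly like the sketch below. It is only an illustration: `RetryUntilBoxes`, `augmented`, `fallback` and `max_retries` are hypothetical names, and it assumes both wrapped datasets return `(name, image, targets)` with a `targets["boxes"]` entry, as in the snippet earlier in this thread. `augmented` stands for the dataset with the random rotation and `fallback` for the same dataset with the rotation removed.

```python
class RetryUntilBoxes:
    """Retry a randomly augmented dataset a bounded number of times (hypothetical wrapper)."""

    def __init__(self, augmented, fallback, max_retries=10):
        if max_retries < 1:
            raise ValueError("max_retries must be at least 1")
        self.augmented = augmented      # dataset that applies the random rotation
        self.fallback = fallback        # same data, rotation removed
        self.max_retries = max_retries

    def __len__(self):
        return len(self.augmented)

    def __getitem__(self, idx):
        for _ in range(self.max_retries):
            name, image, targets = self.augmented[idx]
            # len() works for tensors, arrays and lists alike.
            if len(targets["boxes"]) > 0:
                return name, image, targets
        # Every attempt rotated all boxes out of the patch:
        # return the sample without rotation instead of looping forever.
        return self.fallback[idx]
```

With this shape the loop always terminates after `max_retries` attempts, and the fallback sample keeps its boxes as long as the unrotated image contains any.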