albumentations
Albumentations returns an empty list after bounding box augmentation
Hi everyone! I am trying to use Albumentations for object detection, but after applying some augmentations it sometimes (not always, which makes it even stranger) returns an empty list instead of the augmented bounding boxes. Here is a piece of my code:
import os

import albumentations as A
import cv2
import numpy as np
import torch
from albumentations.pytorch import ToTensorV2

# images_dir, boxes_dir and images_filenames are defined elsewhere
image = cv2.imread(os.path.join(images_dir, images_filenames[0]))
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
# each row of the label file: class x_center y_center width height (normalized YOLO)
boxes = np.loadtxt(os.path.join(boxes_dir, images_filenames[0][:-4] + '.txt'), delimiter=' ')
if boxes.ndim < 2:
    # a file with a single box is loaded as a 1-D array; make it 2-D
    boxes = boxes[np.newaxis, :]
boxes = boxes[:, 1:]  # drop the class column
labels = torch.ones((boxes.shape[0], ), dtype=torch.int64)
print(boxes)
transforms = A.Compose(
    [A.Resize(256, 256),
     A.ShiftScaleRotate(shift_limit=0.2, scale_limit=0.2, rotate_limit=30, p=1),
     A.RGBShift(r_shift_limit=20, g_shift_limit=20, b_shift_limit=20, p=1),
     A.RandomBrightnessContrast(brightness_limit=0.2, contrast_limit=0.4, p=1),
     A.HorizontalFlip(p=1),
     ToTensorV2()],
    bbox_params=A.BboxParams(format='yolo', min_area=0, min_visibility=0, label_fields=['labels']))
transformed = transforms(image=image, bboxes=boxes, labels=labels)
image = transformed['image']
boxes = transformed['bboxes']
print(boxes)
Output:
[[0.04194079 0.90049342 0.08388158 0.06743421]]  # this is the original box, printed before augmentation
[]  # this is what I get after applying the augmentations
The strangest thing here is that it does not always return an empty list; sometimes it works fine (without any code changes!). I set all the probabilities in A.Compose to 1 on purpose to remove any randomness. I also set min_area and min_visibility to 0 so that Albumentations would not remove boxes. And it still gives me different outputs when I run the same code (sometimes it returns the expected result, a list of augmented boxes, and sometimes an empty list). How can it return different outputs when I run the same code every time and all probabilities are set to 1?
P.S. I am not new to Albumentations; I have used this library before for classification and semantic segmentation and it worked perfectly, but I can't use it for object detection because of this problem. Does anybody know how to solve it?
An interesting observation: this problem disappeared when I set format='pascal_voc' instead of format='yolo'. So if your dataset has bounding boxes in yolo format, the pipeline will be the following:
- convert bounding boxes from yolo to pascal voc format
- put converted boxes into Albumentations transform
- convert the augmented boxes from pascal voc to the format required by the architecture of the neural network you are using.
For example, I am going to use the EfficientDet architecture and my dataset has bounding boxes in yolo format. So I will convert the boxes from yolo to pascal voc format -> pass the pascal voc boxes into the transform -> convert the transformed boxes from pascal voc to EfficientDet format.
Functions for converting:
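import numpy as np


def convert_to_voc(yolo_box, image_width, image_height):
    # yolo: normalized (x_center, y_center, width, height)
    # pascal voc: absolute pixels (x_min, y_min, x_max, y_max)
    x_c, y_c, w, h = yolo_box
    x_tl = x_c - w / 2
    y_tl = y_c - h / 2
    x_tl *= image_width
    y_tl *= image_height
    w *= image_width
    h *= image_height
    x_br = x_tl + w
    y_br = y_tl + h
    # casting to int64 truncates the coordinates to whole pixels
    voc_box = np.array([x_tl, y_tl, x_br, y_br], dtype=np.int64)
    voc_box = list(voc_box)
    return voc_box


def convert_to_effdet(box, image_width, image_height, format):
    if format == 'yolo':
        voc_box = convert_to_voc(box, image_width, image_height)
    elif format == 'pascal_voc':
        voc_box = box
    else:
        raise ValueError(f'unsupported box format: {format}')
    # reorder (x_min, y_min, x_max, y_max) into (y_min, x_min, y_max, x_max)
    effdet_order = [1, 0, 3, 2]
    effdet_box = [voc_box[i] for i in effdet_order]
    return effdet_box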
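The helpers above only go from yolo to pascal voc and truncate to whole pixels through dtype=np.int64. If the target architecture expects yolo boxes again after augmentation, the last step of the pipeline needs the inverse conversion. A minimal NumPy sketch, assuming boxes arrive as rows of four numbers; the names yolo_to_voc and voc_to_yolo are hypothetical, and they keep coordinates as floats so that a round trip recovers the original box:

```python
import numpy as np


def yolo_to_voc(boxes, image_width, image_height):
    """Normalized (x_c, y_c, w, h) -> absolute (x_min, y_min, x_max, y_max), as floats."""
    boxes = np.asarray(boxes, dtype=np.float64).reshape(-1, 4)
    x_c, y_c, w, h = boxes.T
    return np.stack([
        (x_c - w / 2) * image_width,
        (y_c - h / 2) * image_height,
        (x_c + w / 2) * image_width,
        (y_c + h / 2) * image_height,
    ], axis=1)


def voc_to_yolo(boxes, image_width, image_height):
    """Absolute (x_min, y_min, x_max, y_max) -> normalized (x_c, y_c, w, h)."""
    boxes = np.asarray(boxes, dtype=np.float64).reshape(-1, 4)
    x_min, y_min, x_max, y_max = boxes.T
    return np.stack([
        (x_min + x_max) / 2 / image_width,
        (y_min + y_max) / 2 / image_height,
        (x_max - x_min) / image_width,
        (y_max - y_min) / image_height,
    ], axis=1)


# The box from the question, on a 256x256 image (the size after A.Resize(256, 256))
yolo_box = [0.04194079, 0.90049342, 0.08388158, 0.06743421]
voc = yolo_to_voc(yolo_box, 256, 256)
back = voc_to_yolo(voc, 256, 256)
print(voc)
print(back)
```

Keeping floats avoids the sub-pixel drift that the int64 cast introduces on every conversion.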
Nevertheless, it is still not normal that augmentations with bounding boxes in yolo format do not work correctly, so I will leave this issue open.
@MaxTeselkin You should also get an empty bounding box list ([]) with the pascal_voc format. The problem is that A.ShiftScaleRotate shifts/scales/rotates the bounding box outside of the image.
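This also explains why the same code gives different results between runs: p=1 only guarantees that ShiftScaleRotate is applied on every call, while the shift, scale and angle are still drawn at random each time. The rough simulation below assumes a shift-only model (scale and rotation ignored, dx and dy drawn uniformly from [-shift_limit, shift_limit] as fractions of the image size) and counts how often the box from the question, which sits in the bottom-left corner, is pushed completely out of the frame. Such a box has nothing left after clipping, so min_area=0 and min_visibility=0 cannot keep it. The helper name fully_outside is hypothetical.

```python
import numpy as np


def fully_outside(box, dx, dy):
    """True where a normalized (x_min, y_min, x_max, y_max) box shifted by (dx, dy)
    no longer overlaps the unit image [0, 1] x [0, 1]. Works on scalars or arrays."""
    x_min, y_min, x_max, y_max = box
    return ((x_max + dx <= 0) | (x_min + dx >= 1) |
            (y_max + dy <= 0) | (y_min + dy >= 1))


# The box from the question, converted from yolo (x_c, y_c, w, h) to corners
x_c, y_c, w, h = 0.04194079, 0.90049342, 0.08388158, 0.06743421
box = (x_c - w / 2, y_c - h / 2, x_c + w / 2, y_c + h / 2)

# p=1 means "always apply"; the shift itself is still random on every call
shift_limit = 0.2
rng = np.random.default_rng(0)
shifts = rng.uniform(-shift_limit, shift_limit, size=(100_000, 2))
lost = float(np.mean(fully_outside(box, shifts[:, 0], shifts[:, 1])))
print(f"box pushed fully out of frame in {lost:.1%} of draws")
```

Under this simplified model the expected fraction is about 41% (about 29% from shifting left past the edge plus about 17% from shifting down past the bottom, minus their overlap), so an empty list on a noticeable share of runs is what the transform settings imply rather than a random malfunction.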
@victor1cea, is there a way to get the bboxes back even if they are outside the image? I can clip the bboxes to the image size later. I'm constantly getting empty bboxes when the image is rotated.
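For the clipping step mentioned in the last comment, here is a minimal NumPy sketch. It assumes you already have boxes that may extend past the image, in pascal_voc pixel coordinates (for example from your own geometric transform, since a box that Albumentations has already dropped cannot be recovered afterwards). clip_boxes is a hypothetical helper, not an Albumentations function: it clips each box to the image and drops boxes whose width or height becomes zero, keeping the labels aligned.

```python
import numpy as np


def clip_boxes(boxes, labels, image_width, image_height):
    """Clip (x_min, y_min, x_max, y_max) pixel boxes to the image and drop
    boxes that end up with zero width or height."""
    boxes = np.asarray(boxes, dtype=np.float64).reshape(-1, 4)
    labels = np.asarray(labels)
    clipped = boxes.copy()
    clipped[:, [0, 2]] = np.clip(clipped[:, [0, 2]], 0, image_width)
    clipped[:, [1, 3]] = np.clip(clipped[:, [1, 3]], 0, image_height)
    # a box entirely outside the image collapses to zero width or height
    keep = (clipped[:, 2] > clipped[:, 0]) & (clipped[:, 3] > clipped[:, 1])
    return clipped[keep], labels[keep]


boxes = [[-30.0, 10.0, 20.0, 50.0],     # partly outside on the left: clipped
         [300.0, 40.0, 340.0, 80.0],    # entirely outside on the right: dropped
         [100.0, 100.0, 150.0, 140.0]]  # fully inside: unchanged
kept, kept_labels = clip_boxes(boxes, [1, 2, 3], 256, 256)
print(kept)         # rows (0, 10, 20, 50) and (100, 100, 150, 140)
print(kept_labels)  # labels 1 and 3
```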