Albumentations returns an empty list after bounding box augmentation

Open MaxTeselkin opened this issue 2 years ago • 3 comments

Hi everyone! I am trying to use Albumentations for object detection, but after applying some augmentations it sometimes (not always, which makes it even stranger) returns an empty list instead of the augmented bounding boxes. Here is a piece of my code:

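import os

import albumentations as A
import cv2
import numpy as np
import torch
from albumentations.pytorch import ToTensorV2

# images_dir, boxes_dir and images_filenames are defined earlier (not shown)
image = cv2.imread(os.path.join(images_dir, images_filenames[0]))
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
boxes = np.loadtxt(os.path.join(boxes_dir, images_filenames[0][:-4]+'.txt'), delimiter=' ')
# A label file with a single box loads as a 1-D array, so add a row axis
if boxes.ndim < 2:
  boxes = boxes[np.newaxis, :]
# Drop the first column and keep the four box coordinates
boxes = boxes[:, 1:]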

labels = torch.ones((boxes.shape[0], ), dtype=torch.int64)

print(boxes)

transforms = A.Compose(
    [A.Resize(256, 256),
     A.ShiftScaleRotate(shift_limit=0.2, scale_limit=0.2, rotate_limit=30, p=1),
     A.RGBShift(r_shift_limit=20, g_shift_limit=20, b_shift_limit=20, p=1),
     A.RandomBrightnessContrast(brightness_limit=0.2, contrast_limit=0.4, p=1),
     A.HorizontalFlip(p=1),
     ToTensorV2()],
     bbox_params=A.BboxParams(format='yolo', min_area=0, min_visibility=0, label_fields=['labels']))
transformed = transforms(image=image, bboxes=boxes, labels=labels)
image = transformed['image']
boxes = transformed['bboxes']
print(boxes)

Output:

[[0.04194079 0.90049342 0.08388158 0.06743421]] # this is the original box which I printed before augmentation
[] # this is what I get after applying augmentations

The strangest thing is that it does not always return an empty list; sometimes it works fine (without changing the code!). I set all the probabilities in A.Compose to 1 on purpose to remove any randomness. I also set min_area and min_visibility to 0 so that Albumentations would not remove boxes. It still gives me different outputs when I run the same code (sometimes it returns the expected result, a list of augmented boxes, and sometimes an empty list). How can it return different outputs when I run the same code every time and all probabilities are set to 1?

P.S. I am not new to Albumentations. I have used this library before for classification and semantic segmentation and it worked perfectly, but I can't use it for object detection because of this problem. Does anybody know how to solve it?

MaxTeselkin avatar May 18 '22 21:05 MaxTeselkin

An interesting observation: this problem disappeared when I set format='pascal_voc' instead of format='yolo'. So if your dataset has bounding boxes in yolo format, the pipeline will be the following:

  1. convert bounding boxes from yolo to pascal voc format
  2. put converted boxes into Albumentations transform
  3. convert augmented boxes from pascal voc to the format required by the neural network architecture you are using.

For example, I am going to use the EfficientDet architecture and my dataset has bounding boxes in yolo format. So I will convert the boxes from yolo to pascal voc format -> pass the pascal voc boxes into the transform -> convert the transformed boxes from pascal voc to EfficientDet format.

Functions for converting:

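import numpy as np


def convert_to_voc(yolo_box, image_width, image_height):
  # yolo_box is normalized [x_center, y_center, width, height]
  x_c, y_c, w, h = yolo_box
  # Top-left corner in normalized coordinates
  x_tl = x_c - w / 2
  y_tl = y_c - h / 2
  # Scale everything to pixels
  x_tl *= image_width
  y_tl *= image_height
  w *= image_width
  h *= image_height
  # Bottom-right corner in pixels
  x_br = x_tl + w
  y_br = y_tl + h
  # Note: the cast to int64 truncates fractional pixel coordinates
  voc_box = np.array([x_tl, y_tl, x_br, y_br], dtype=np.int64)
  voc_box = list(voc_box)
  return voc_box


def convert_to_effdet(box, image_width, image_height, format):
  if format == 'yolo':
    voc_box = convert_to_voc(box, image_width, image_height)
  elif format == 'pascal_voc':
    voc_box = box
  else:
    raise ValueError(f'Unsupported box format: {format}')
  # Reorder [x_min, y_min, x_max, y_max] to [y_min, x_min, y_max, x_max]
  effdet_order = [1, 0, 3, 2]
  effdet_box = [voc_box[i] for i in effdet_order]
  return effdet_box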
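
If the target format is yolo again, or sub-pixel precision matters, a float-based pair of converters may be handy. This is only a minimal sketch: yolo_to_voc and voc_to_yolo are hypothetical helper names, not part of Albumentations, and unlike convert_to_voc they do not cast to integers.

```python
def yolo_to_voc(yolo_box, image_width, image_height):
    """Normalized [x_c, y_c, w, h] -> pixel [x_min, y_min, x_max, y_max] (floats)."""
    x_c, y_c, w, h = yolo_box
    x_min = (x_c - w / 2) * image_width
    y_min = (y_c - h / 2) * image_height
    x_max = (x_c + w / 2) * image_width
    y_max = (y_c + h / 2) * image_height
    return [x_min, y_min, x_max, y_max]


def voc_to_yolo(voc_box, image_width, image_height):
    """Pixel [x_min, y_min, x_max, y_max] -> normalized [x_c, y_c, w, h]."""
    x_min, y_min, x_max, y_max = voc_box
    x_c = (x_min + x_max) / 2 / image_width
    y_c = (y_min + y_max) / 2 / image_height
    w = (x_max - x_min) / image_width
    h = (y_max - y_min) / image_height
    return [x_c, y_c, w, h]


# Round trip with the box from the first post, assuming a 256x256 image
box = [0.04194079, 0.90049342, 0.08388158, 0.06743421]
voc = yolo_to_voc(box, 256, 256)
restored = voc_to_yolo(voc, 256, 256)
print(voc)
print(restored)
```

Because nothing is truncated, converting there and back returns the original box up to floating-point error.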

Nevertheless, it is still not normal that augmentations with bounding boxes in yolo format do not work correctly, so I will leave this issue open.

MaxTeselkin avatar May 20 '22 19:05 MaxTeselkin

@MaxTeselkin You should also get an empty bounding box list ([]) with the pascal_voc format. The problem is that A.ShiftScaleRotate shifts/scales/rotates the bounding box outside of the image.
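
A minimal sketch of why the result can differ between runs, looking only at the shift component of A.ShiftScaleRotate: p=1 means the transform is always applied, but the amount of shift is still sampled at random on every call. The helper fraction_shifted_out is hypothetical; it simulates that sampling with the standard library under the assumption of a uniform draw from [-shift_limit, shift_limit], does not call Albumentations, and ignores scale and rotation.

```python
import random


def fraction_shifted_out(yolo_box, shift_limit=0.2, trials=10_000, seed=0):
    """Estimate how often a random shift moves a box fully outside the image.

    Assumption: the shift is drawn uniformly from [-shift_limit, shift_limit]
    as a fraction of the image size, independently for x and y.
    """
    rng = random.Random(seed)
    x_c, y_c, w, h = yolo_box
    outside = 0
    for _ in range(trials):
        dx = rng.uniform(-shift_limit, shift_limit)
        dy = rng.uniform(-shift_limit, shift_limit)
        # Box edges after the shift, in normalized coordinates
        x_min, x_max = x_c + dx - w / 2, x_c + dx + w / 2
        y_min, y_max = y_c + dy - h / 2, y_c + dy + h / 2
        # No overlap with the unit square means the box left the image
        if x_max <= 0 or x_min >= 1 or y_max <= 0 or y_min >= 1:
            outside += 1
    return outside / trials


# The box from the original post sits in the bottom-left corner of the image
box = [0.04194079, 0.90049342, 0.08388158, 0.06743421]
fraction = fraction_shifted_out(box)
print(f"shifted fully outside in {fraction:.0%} of trials")
```

Under these assumptions the box leaves the image in roughly 40% of the simulated runs, which would be consistent with an empty list that appears only some of the time.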

victor1cea avatar Jun 07 '22 07:06 victor1cea

@victor1cea, is there a way to receive the bboxes even if they are outside the image? I can clip the bboxes to the image size later. I'm constantly getting empty bboxes when the image is rotated.
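
The clipping step mentioned here could look like the sketch below, assuming the boxes are already available in pascal_voc pixel coordinates. clip_voc_box is a hypothetical helper, not an Albumentations function, and it does not by itself stop Albumentations from dropping boxes; it returns None when nothing of the box remains inside the image.

```python
def clip_voc_box(box, image_width, image_height):
    """Clip [x_min, y_min, x_max, y_max] to the image; None if nothing remains."""
    x_min, y_min, x_max, y_max = box
    # Clamp every coordinate into the valid pixel range
    x_min = min(max(x_min, 0), image_width)
    y_min = min(max(y_min, 0), image_height)
    x_max = min(max(x_max, 0), image_width)
    y_max = min(max(y_max, 0), image_height)
    # A box with zero width or height has no visible part left
    if x_max <= x_min or y_max <= y_min:
        return None
    return [x_min, y_min, x_max, y_max]


partly_inside = clip_voc_box([-10, 200, 30, 300], 256, 256)
fully_outside = clip_voc_box([-40, 10, -5, 50], 256, 256)
print(partly_inside)  # [0, 200, 30, 256]
print(fully_outside)  # None
```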

fernandorovai avatar Mar 24 '23 16:03 fernandorovai