supervision icon indicating copy to clipboard operation
supervision copied to clipboard

Allow TIFF (and more) image formats in `load_yolo_annotations`

Open patel-zeel opened this issue 1 year ago • 9 comments

Search before asking

  • [X] I have searched the Supervision issues and found no similar feature requests.

Description

  • Currently, load_yolo_annotations only allows png, jpg, and jpeg file formats. load_yolo_annotations is internally called by sv.DetectionDataset.from_yolo functionality.

https://github.com/roboflow/supervision/blob/1860fdb0a4e21edc5fa03d973e9f31c055bdcf4f/supervision/dataset/formats/yolo.py#L156

  • Ultralytics supports a wide variety of image formats. Copied the following table from their website.
Image Suffix Example Predict Command Reference
.bmp yolo predict source=image.bmp Microsoft BMP File Format
.dng yolo predict source=image.dng Adobe DNG
.jpeg yolo predict source=image.jpeg JPEG
.jpg yolo predict source=image.jpg JPEG
.mpo yolo predict source=image.mpo Multi Picture Object
.png yolo predict source=image.png Portable Network Graphics
.tif yolo predict source=image.tif Tag Image File Format
.tiff yolo predict source=image.tiff Tag Image File Format
.webp yolo predict source=image.webp WebP
.pfm yolo predict source=image.pfm Portable FloatMap
  • Use of TIFF files is common in satellite imagery such as Sentinel-2. One may prefer to preserve the TIFF format over convert it to PNG/JPG because TIFF allows the storage of georeferencing information.
  • I see that the load_yolo_annotations uses cv2.imread to read the image files. OpenCV seems to support many of the Ultralytics-supported formats.

https://github.com/roboflow/supervision/blob/1860fdb0a4e21edc5fa03d973e9f31c055bdcf4f/supervision/dataset/formats/yolo.py#L170

Proposals

  • P1: We can expand the hardcoded list of allowed formats.
  • P2: We may choose not to restrict the image format and let it fail later.

Use case

  • Everyone who'd like to use formats other than png, jpg, and jpeg will be able to use this extension.

Additional

No response

Are you willing to submit a PR?

  • [X] Yes I'd like to help by submitting a PR!

patel-zeel avatar Sep 28 '24 15:09 patel-zeel

Hi @patel-zeel :wave:

Thanks for giving a very specific example. That's invaluable to us.

Generally - sounds good!

What we need to be careful about is silent failures. For example, cv2.imread does not raise an error when an image is not loaded - it simply returns None. Reading videos was similarly problematic, as it wouldn't tell of a missing codex. I bet the current code doesn't handle it all that well.

Next, formats like TIFF may have image layers. We probably want the merged image in those cases. I don't know the peculiaritues of other models.

Generally, If you're aware of what's restricting us from having more image formats, I'm happy for us to limit those restrictions.

LinasKo avatar Sep 28 '24 19:09 LinasKo

Hi @LinasKo,

cv2.imread does not raise an error when an image is not loaded - it simply returns None. Reading videos was similarly problematic, as it wouldn't tell of a missing codex. I bet the current code doesn't handle it all that well.

I see. Is there any reason we are sticking to cv2? Isn't from PIL import Image more Pythonic? At least for the scope of this issue, we are loading the image in memory to know its dimensions. I can test from PIL import Image for some known issues that you might have come across already.

https://github.com/roboflow/supervision/blob/1860fdb0a4e21edc5fa03d973e9f31c055bdcf4f/supervision/dataset/formats/yolo.py#L170-L173

Next, formats like TIFF may have image layers. We probably want the merged image in those cases. I don't know the peculiarities of other models.

That's true. We may also have to assert image.shape[2] == 3. Does supervision/ultralytics work with images with more or less than three channels? I checked that in the case of a multi-image TIFF file, from PIL import Image only loads the first image, which, in some way, takes away the burden of handling it at our end.

from PIL import Image

# Create and save a multi-image TIFF
img1 = Image.new('RGB', (111, 222), color='red')    # 100x100
img2 = Image.new('RGB', (333, 444), color='green')  # 200x150
img3 = Image.new('RGB', (555, 666), color='blue')   # 300x100
img1.save('/tmp/multi_image.tif', save_all=True, append_images=[img2, img3])

# Read the multi-image TIFF
img = Image.open('/tmp/multi_image.tif')
print(img.size)
# Output: (111, 222)

Also, if an image is corrupt, the current code will generate the following error which is uninformative.

---> h, w, _ = image.shape

AttributeError: 'NoneType' object has no attribute 'shape'

Overall, if we can replace cv2 with something else that raises an error on failure, we may lift the image format checker and let the image loader raise the error. If we need to use cv2, it seems challenging to raise a meaningful error with the user about why the image loading was unsuccessful.

patel-zeel avatar Sep 30 '24 08:09 patel-zeel

cv2 loads the image as a numpy array, which makes all subsequent computation easier. I'd strongly prefer to stick with cv2 and avoid conversions from PIL every time.

Supervision uses transparent images very infrequently, e.g. when loading images for the IconAnnotator. If possible, it would be great to keep the channel size flexible. However, if you spot cv2.imread with no extra parameters, it means the image is 2d or 3d. cv2.IMREAD_UNCHANGED param is required to read transparent images.

I've tested it - calling an object detection model in ultralytics on an image with transparency raises a runtime error.

RuntimeError: Given groups=1, weight of size [16, 3, 3, 3], expected input[1, 4, 640, 640] to have 3 channels, but got 4 channels instead

Also, if an image is corrupt, the current code will generate the following error which is uninformative.

Agreed. After most imreads, we should check if the image is None and raise a ValueError. Because of opencv's popularity and convenience, I'd still prefer to stick to it, but we can raise ValueError("Failed to read image") for an earlier warning

That's my line of thinking - hopefully it clears things up a little.

LinasKo avatar Sep 30 '24 09:09 LinasKo

cv2 loads the image as a numpy array, which makes all subsequent computation easier. I'd strongly prefer to stick with cv2 and avoid conversions from PIL every time.

I did a small benchmarking around this in this colab. PIL seems a lot faster than cv2 when checking the shape without loading the data. cv2 is faster than PIL when loading the numpy data.

  • Time (Min - Max) taken (in seconds) to load 500 images (size: 640 x 640) of each type:
Image type cv2 load as numpy PIL shape check only PIL load as numpy
PNG 2.58 - 3.91 0.05 - 0.07 3.49 - 4.77
JPG 4.52 - 5.65 0.07 - 0.12 5.41 - 6.63
TIF 5.27 - 6.64 0.22 - 0.34 11.15 - 11.54

I've tested it - calling an object detection model in ultralytics on an image with transparency raises a runtime error.

Okay, in that case, shall we assert image.shape[2] == 3?

Agreed. After most imreads, we should check if the image is None and raise a ValueError. Because of opencv's popularity and convenience, I'd still prefer to stick to it, but we can raise ValueError("Failed to read image") for an earlier warning

Do the following changes seem reasonable to you?

If we go with cv2:

  • [x] if image is None, we raise ValueError(f"Failed to read image from '{image_path}'").

In any case:

  • [x] We remove type restriction by extension and allow any extension by default.
  • [x] We assert image.shape[2] == 3

patel-zeel avatar Sep 30 '24 11:09 patel-zeel

Interesting details about the shape check. Thanks for carrying out the speed analysis!

I'm happy for us to change the shape check to use PIL in yolo dataset case, as long as we leave a short comment explaining why Pil and not cv2 is used there. In most other places, I'd stick to cv2 for consistency, even if it's slower.

Do the following changes seem reasonable to you?

"Yes" to everything, but I'd steer us towards raising errors rather than using assert. While I personally like assert statements, our standard in this repo is to opt for ValueError instead. I reserve assertions only for assumption checks that we never expect users to see. (ValueError = caller made the mistake, Assertion = us developers made the mistake)

LinasKo avatar Oct 01 '24 12:10 LinasKo

Marking this issue as eligible for Hacktoberfest. It aligns very well the theme we'd like to have, focusing on reliability and feature completeness of supervision.

LinasKo avatar Oct 03 '24 10:10 LinasKo

@LinasKo, feel free to unassign me to open this for the community. I'm in a bit of deadline mode and I'd love to see more people contributing to this repo.

patel-zeel avatar Oct 03 '24 14:10 patel-zeel

@patel-zeel thank you for the research and comments you made so far. I am un-assigning from you

onuralpszr avatar Oct 03 '24 14:10 onuralpszr

@onuralpszr, no, that's too hasty of a decision.

This issue starts from simple foundations but contains a lot of detail, discussions. It can be solved in different ways and at different depths. It is not specific enough for the general community.

I want it to stay Patel's domain as he's proven to have a thoughtful approach and handle both the ambiguity & edge cases.

@patel-zeel, I'll leave this assigned to you - take as much time as you need! If you don't find a good occasion to solve it by November, I'll be happy to take over 😉

LinasKo avatar Oct 03 '24 17:10 LinasKo

@LinasKo and @onuralpszr, #1636 is now merged. I think now this issue can be closed as completed.

patel-zeel avatar Jan 19 '25 09:01 patel-zeel

Fixed by #1636

onuralpszr avatar Jan 19 '25 13:01 onuralpszr