supervision Allow TIFF (and more) image formats in `load_yolo

Search before asking

[X] I have searched the Supervision issues and found no similar feature requests.

Description

Currently, load_yolo_annotations only allows png, jpg, and jpeg file formats. load_yolo_annotations is internally called by sv.DetectionDataset.from_yolo functionality.

https://github.com/roboflow/supervision/blob/1860fdb0a4e21edc5fa03d973e9f31c055bdcf4f/supervision/dataset/formats/yolo.py#L156

Ultralytics supports a wide variety of image formats. Copied the following table from their website.

Image Suffix	Example Predict Command	Reference
.bmp	yolo predict source=image.bmp	Microsoft BMP File Format
.dng	yolo predict source=image.dng	Adobe DNG
.jpeg	yolo predict source=image.jpeg	JPEG
.jpg	yolo predict source=image.jpg	JPEG
.mpo	yolo predict source=image.mpo	Multi Picture Object
.png	yolo predict source=image.png	Portable Network Graphics
.tif	yolo predict source=image.tif	Tag Image File Format
.tiff	yolo predict source=image.tiff	Tag Image File Format
.webp	yolo predict source=image.webp	WebP
.pfm	yolo predict source=image.pfm	Portable FloatMap

Use of TIFF files is common in satellite imagery such as Sentinel-2. One may prefer to preserve the TIFF format over convert it to PNG/JPG because TIFF allows the storage of georeferencing information.
I see that the load_yolo_annotations uses cv2.imread to read the image files. OpenCV seems to support many of the Ultralytics-supported formats.

https://github.com/roboflow/supervision/blob/1860fdb0a4e21edc5fa03d973e9f31c055bdcf4f/supervision/dataset/formats/yolo.py#L170

Proposals

P1: We can expand the hardcoded list of allowed formats.
P2: We may choose not to restrict the image format and let it fail later.

Use case

Everyone who'd like to use formats other than png, jpg, and jpeg will be able to use this extension.

Additional

No response

Are you willing to submit a PR?

[X] Yes I'd like to help by submitting a PR!

Sep 28 '24 15:09 patel-zeel

Hi @patel-zeel :wave:

Thanks for giving a very specific example. That's invaluable to us.

Generally - sounds good!

What we need to be careful about is silent failures. For example, cv2.imread does not raise an error when an image is not loaded - it simply returns None. Reading videos was similarly problematic, as it wouldn't tell of a missing codex. I bet the current code doesn't handle it all that well.

Next, formats like TIFF may have image layers. We probably want the merged image in those cases. I don't know the peculiaritues of other models.

Generally, If you're aware of what's restricting us from having more image formats, I'm happy for us to limit those restrictions.

Sep 28 '24 19:09 LinasKo

Hi @LinasKo,

cv2.imread does not raise an error when an image is not loaded - it simply returns None. Reading videos was similarly problematic, as it wouldn't tell of a missing codex. I bet the current code doesn't handle it all that well.

I see. Is there any reason we are sticking to cv2? Isn't from PIL import Image more Pythonic? At least for the scope of this issue, we are loading the image in memory to know its dimensions. I can test from PIL import Image for some known issues that you might have come across already.

https://github.com/roboflow/supervision/blob/1860fdb0a4e21edc5fa03d973e9f31c055bdcf4f/supervision/dataset/formats/yolo.py#L170-L173

Next, formats like TIFF may have image layers. We probably want the merged image in those cases. I don't know the peculiarities of other models.

That's true. We may also have to assert image.shape[2] == 3. Does supervision/ultralytics work with images with more or less than three channels? I checked that in the case of a multi-image TIFF file, from PIL import Image only loads the first image, which, in some way, takes away the burden of handling it at our end.

from PIL import Image

# Create and save a multi-image TIFF
img1 = Image.new('RGB', (111, 222), color='red')    # 100x100
img2 = Image.new('RGB', (333, 444), color='green')  # 200x150
img3 = Image.new('RGB', (555, 666), color='blue')   # 300x100
img1.save('/tmp/multi_image.tif', save_all=True, append_images=[img2, img3])

# Read the multi-image TIFF
img = Image.open('/tmp/multi_image.tif')
print(img.size)
# Output: (111, 222)

Also, if an image is corrupt, the current code will generate the following error which is uninformative.

---> h, w, _ = image.shape

AttributeError: 'NoneType' object has no attribute 'shape'

Overall, if we can replace cv2 with something else that raises an error on failure, we may lift the image format checker and let the image loader raise the error. If we need to use cv2, it seems challenging to raise a meaningful error with the user about why the image loading was unsuccessful.

Sep 30 '24 08:09 patel-zeel

cv2 loads the image as a numpy array, which makes all subsequent computation easier. I'd strongly prefer to stick with cv2 and avoid conversions from PIL every time.

Supervision uses transparent images very infrequently, e.g. when loading images for the IconAnnotator. If possible, it would be great to keep the channel size flexible. However, if you spot cv2.imread with no extra parameters, it means the image is 2d or 3d. cv2.IMREAD_UNCHANGED param is required to read transparent images.

I've tested it - calling an object detection model in ultralytics on an image with transparency raises a runtime error.

RuntimeError: Given groups=1, weight of size [16, 3, 3, 3], expected input[1, 4, 640, 640] to have 3 channels, but got 4 channels instead

Also, if an image is corrupt, the current code will generate the following error which is uninformative.

Agreed. After most imreads, we should check if the image is None and raise a ValueError. Because of opencv's popularity and convenience, I'd still prefer to stick to it, but we can raise ValueError("Failed to read image") for an earlier warning

That's my line of thinking - hopefully it clears things up a little.

Sep 30 '24 09:09 LinasKo

cv2 loads the image as a numpy array, which makes all subsequent computation easier. I'd strongly prefer to stick with cv2 and avoid conversions from PIL every time.

I did a small benchmarking around this in this colab. PIL seems a lot faster than cv2 when checking the shape without loading the data. cv2 is faster than PIL when loading the numpy data.

Time (Min - Max) taken (in seconds) to load 500 images (size: 640 x 640) of each type:

Image type	`cv2` load as numpy	`PIL` shape check only	`PIL` load as `numpy`
PNG	2.58 - 3.91	0.05 - 0.07	3.49 - 4.77
JPG	4.52 - 5.65	0.07 - 0.12	5.41 - 6.63
TIF	5.27 - 6.64	0.22 - 0.34	11.15 - 11.54

I've tested it - calling an object detection model in ultralytics on an image with transparency raises a runtime error.

Okay, in that case, shall we assert image.shape[2] == 3?

Agreed. After most imreads, we should check if the image is None and raise a ValueError. Because of opencv's popularity and convenience, I'd still prefer to stick to it, but we can raise ValueError("Failed to read image") for an earlier warning

Do the following changes seem reasonable to you?

If we go with cv2:

[x] if image is None, we raise ValueError(f"Failed to read image from '{image_path}'").

In any case:

[x] We remove type restriction by extension and allow any extension by default.
[x] We assert image.shape[2] == 3

Sep 30 '24 11:09 patel-zeel

Interesting details about the shape check. Thanks for carrying out the speed analysis!

I'm happy for us to change the shape check to use PIL in yolo dataset case, as long as we leave a short comment explaining why Pil and not cv2 is used there. In most other places, I'd stick to cv2 for consistency, even if it's slower.

Do the following changes seem reasonable to you?

"Yes" to everything, but I'd steer us towards raising errors rather than using assert. While I personally like assert statements, our standard in this repo is to opt for ValueError instead. I reserve assertions only for assumption checks that we never expect users to see. (ValueError = caller made the mistake, Assertion = us developers made the mistake)

Oct 01 '24 12:10 LinasKo

Marking this issue as eligible for Hacktoberfest. It aligns very well the theme we'd like to have, focusing on reliability and feature completeness of supervision.

Oct 03 '24 10:10 LinasKo

@LinasKo, feel free to unassign me to open this for the community. I'm in a bit of deadline mode and I'd love to see more people contributing to this repo.

Oct 03 '24 14:10 patel-zeel

@patel-zeel thank you for the research and comments you made so far. I am un-assigning from you

Oct 03 '24 14:10 onuralpszr

@onuralpszr, no, that's too hasty of a decision.

This issue starts from simple foundations but contains a lot of detail, discussions. It can be solved in different ways and at different depths. It is not specific enough for the general community.

I want it to stay Patel's domain as he's proven to have a thoughtful approach and handle both the ambiguity & edge cases.

@patel-zeel, I'll leave this assigned to you - take as much time as you need! If you don't find a good occasion to solve it by November, I'll be happy to take over 😉

Oct 03 '24 17:10 LinasKo

@LinasKo and @onuralpszr, #1636 is now merged. I think now this issue can be closed as completed.

Jan 19 '25 09:01 patel-zeel

Fixed by #1636

Jan 19 '25 13:01 onuralpszr

supervision
supervision copied to clipboard

Allow TIFF (and more) image formats in `load_yolo_annotations`

Search before asking

Description

Proposals

Use case

Additional

Are you willing to submit a PR?

supervision supervision copied to clipboard

Allow TIFF (and more) image formats in `load_yolo_annotations`

Search before asking

Description

Proposals

Use case

Additional

Are you willing to submit a PR?

supervision
supervision copied to clipboard