No Error Handling in Batch Prediction for Corrupted Images

Open NetZissou opened this issue 9 months ago • 0 comments

Search before asking

[x] I have searched the Pytorch-Wildlife issues and found no similar bug report.

Description

When running batch prediction with MegaDetectorV6, a single corrupted image in the batch makes the entire call fail with PIL.UnidentifiedImageError, instead of skipping the invalid image and continuing with the valid ones.

Initialization

from PIL import Image, UnidentifiedImageError
import numpy as np
import os

import torch
from PytorchWildlife.models import detection as pw_detection

def is_invalid_image(file_path):
    """
    Checks if an image file is invalid using PIL.
    
    Args:
        file_path (str): The absolute path to the image file.

    Returns:
        str or None: Returns file_path if invalid, otherwise None.
    """
    try:
        with Image.open(file_path) as img:
            img.verify()  # Verify without fully loading
        return None  # Valid image
    except (UnidentifiedImageError, OSError):
        return file_path  # Invalid image

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
if DEVICE == "cuda":
    torch.cuda.set_device(0)

detection_model = pw_detection.MegaDetectorV6(device=DEVICE, pretrained=True, version="MDV6-yolov10-e")

Use the script below to generate 5 valid images and 1 corrupted image.

tgt_folder_path = "test_images"
os.makedirs(tgt_folder_path, exist_ok=True)

# Generate 5 valid images
for i in range(5):
    img = Image.fromarray(np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8))  # 100x100 random RGB image (upper bound is exclusive)
    img.save(os.path.join(tgt_folder_path, f"valid_{i}.jpg"), "JPEG")

# Generate 1 corrupted image
corrupted_file_path = os.path.join(tgt_folder_path, "corrupted.jpg")
with open(corrupted_file_path, "wb") as f:
    f.write(b"\x00\x01\x02\x03\x04")  # Write a few bytes that are not valid JPEG data, so PIL cannot identify the file

# Check images in the directory
test_dir = tgt_folder_path

invalid_images = [is_invalid_image(os.path.join(test_dir, file)) for file in os.listdir(test_dir)]
invalid_images = [img for img in invalid_images if img]  # Filter out None values

print("Invalid images detected:", invalid_images)
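
The check above runs serially. For a large folder, the same check could be spread over a thread pool, since it mostly opens and reads files. The following is only a sketch under that assumption: `find_invalid_parallel`, `check`, and `max_workers` are hypothetical names that are not part of PytorchWildlife or PIL, and `check` is expected to follow the same contract as `is_invalid_image` (return the path if invalid, otherwise None).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Optional


def find_invalid_parallel(
    file_paths: Iterable[str],
    check: Callable[[str], Optional[str]],
    max_workers: int = 8,
) -> List[str]:
    """Apply `check` to every path using a thread pool.

    `check` must return the path when the file is invalid and None otherwise.
    Returns the invalid paths in the same order as the input.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, regardless of
        # which worker finishes first.
        flagged = list(pool.map(check, file_paths))
    # Drop the None entries, which correspond to valid files.
    return [path for path in flagged if path is not None]
```

With the names from this report, it could be called as `find_invalid_parallel([os.path.join(test_dir, f) for f in os.listdir(test_dir)], is_invalid_image)`. Whether this is actually faster depends on the storage and image formats involved, so it is worth measuring before relying on it.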

Pass the folder that contains the corrupted image to MegaDetectorV6 batch prediction. This is expected to raise PIL.UnidentifiedImageError.

try:
    results = detection_model.batch_image_detection(tgt_folder_path)
except UnidentifiedImageError as e:
    print(f"Error: Unidentified image file encountered. {e}")
except Exception as e:
    print(f"Unexpected error occurred: {e}")

Error: Unidentified image file encountered. cannot identify image file 'test_images/corrupted.jpg'

After removing the corrupted image from the target folder, inference completes successfully.

os.remove(
    os.path.join(tgt_folder_path, "corrupted.jpg")
)

try:
    results = detection_model.batch_image_detection(tgt_folder_path)
except UnidentifiedImageError as e:
    print(f"Error: Unidentified image file encountered. {e}")
except Exception as e:
    print(f"Unexpected error occurred: {e}")

0: 640x640 (no detections), 13.2ms
1: 640x640 (no detections), 13.2ms
2: 640x640 (no detections), 13.2ms
3: 640x640 (no detections), 13.2ms
4: 640x640 (no detections), 13.2ms
Speed: 11.2ms preprocess, 13.2ms inference, 9.7ms postprocess per image at shape (5, 3, 640, 640)
100%|██████████| 1/1 [00:00<00:00,  2.39it/s]

Use case

My current workaround is to pre-validate all images with PIL before passing them to batch_image_detection, but this can be costly, especially for large datasets. It would be great to have an error-handling mechanism that would:

Collect failures instead of raising an error
Return results for valid images + a list of failed images

pred_result, failed_images_path = detection_model.batch_image_detection(tgt_folder_path)

This way, invalid images won’t crash the batch, and users can handle failures separately.
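
To make the requested behaviour concrete, here is a minimal sketch of the collect-and-continue pattern, written as a standalone helper and not as the library's implementation. All names in it (`detect_collecting_failures`, `detect_one`, `skip_on`) are hypothetical, and `detect_one` stands in for whatever per-image call does the loading and inference. The default of catching `OSError` relies on `PIL.UnidentifiedImageError` being a subclass of `OSError`.

```python
from typing import Any, Callable, Iterable, List, Tuple, Type


def detect_collecting_failures(
    image_paths: Iterable[str],
    detect_one: Callable[[str], Any],
    skip_on: Tuple[Type[BaseException], ...] = (OSError,),
) -> Tuple[List[Any], List[str]]:
    """Run `detect_one` on each path, collecting failures instead of raising.

    Returns (results, failed_paths). Only exceptions listed in `skip_on`
    are swallowed; anything else still propagates to the caller.
    """
    results: List[Any] = []
    failed_paths: List[str] = []
    for path in image_paths:
        try:
            results.append(detect_one(path))
        except skip_on:
            # Record the unreadable file and move on to the next image.
            failed_paths.append(path)
    return results, failed_paths
```

This sketch processes one image at a time, so it gives up the throughput of true batching. Inside the library, the same idea would more likely apply at the image-loading step, so that unreadable files are dropped before batches are formed.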

I’d be happy to help if you think it’s a good addition! Appreciate any guidance on how best to contribute!

Additional

No response

Are you willing to submit a PR?

[ ] Yes I'd like to help by submitting a PR!

Mar 09 '25 17:03 NetZissou