No Error Handling in Batch Prediction for Corrupted Images
Search before asking
- [x] I have searched the Pytorch-Wildlife issues and found no similar bug report.
Description
When running batch prediction with MegaDetectorV6, a single corrupted image in the batch makes the entire process fail with PIL.UnidentifiedImageError. Instead, the invalid image should be skipped and processing should continue with the valid ones.
Initialization
from PIL import Image, UnidentifiedImageError
import numpy as np
import os
import torch
from PytorchWildlife.models import detection as pw_detection
def is_invalid_image(file_path):
    """
    Checks if an image file is invalid using PIL.
    Args:
        file_path (str): The path to the image file.
    Returns:
        str or None: Returns file_path if invalid, otherwise None.
    """
    try:
        with Image.open(file_path) as img:
            img.verify()  # Verify without fully loading
        return None  # Valid image
    except (UnidentifiedImageError, OSError):
        return file_path  # Invalid image
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
if DEVICE == "cuda":
    torch.cuda.set_device(0)
detection_model = pw_detection.MegaDetectorV6(device=DEVICE, pretrained=True, version="MDV6-yolov10-e")
Use the script below to generate 5 valid images and 1 corrupted image.
tgt_folder_path = "test_images"
os.makedirs(tgt_folder_path, exist_ok=True)
# Generate 5 valid images
for i in range(5):
    img = Image.fromarray(np.random.randint(0, 255, (100, 100, 3), dtype=np.uint8))  # 100x100 random image
    img.save(os.path.join(tgt_folder_path, f"valid_{i}.jpg"), "JPEG")
# Generate 1 corrupted image
corrupted_file_path = os.path.join(tgt_folder_path, "corrupted.jpg")
with open(corrupted_file_path, "wb") as f:
    f.write(b"\x00\x01\x02\x03\x04")  # Write a few arbitrary bytes so the file is not a valid JPEG
# Check images in the directory
test_dir = tgt_folder_path
invalid_images = [is_invalid_image(os.path.join(test_dir, file)) for file in os.listdir(test_dir)]
invalid_images = [img for img in invalid_images if img] # Filter out None values
print("Invalid images detected:", invalid_images)
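Because this check opens every file before inference, it adds noticeable time on large folders. As a stopgap, the same is_invalid_image check could be run concurrently. This is only a sketch under assumptions, not part of Pytorch-Wildlife: find_invalid_parallel and max_workers are hypothetical names, and threads help mainly when the check is dominated by disk or network I/O (for CPU-bound decoding, a ProcessPoolExecutor may be a better fit).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Optional


def find_invalid_parallel(
    paths: Iterable[str],
    check: Callable[[str], Optional[str]],
    max_workers: int = 8,
) -> List[str]:
    """Run `check` on every path concurrently and return the paths it flags.

    Hypothetical helper (not a Pytorch-Wildlife API). `check` follows the
    is_invalid_image convention: it returns the path if the file is
    invalid and None otherwise.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, so the output order is stable
        return [p for p in pool.map(check, paths) if p]
```

For the test folder above, this would be called as find_invalid_parallel([os.path.join(test_dir, f) for f in os.listdir(test_dir)], is_invalid_image).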
Passing the folder that contains the corrupted image to MegaDetectorV6 batch prediction raises PIL.UnidentifiedImageError:
try:
    results = detection_model.batch_image_detection(tgt_folder_path)
except UnidentifiedImageError as e:
    print(f"Error: Unidentified image file encountered. {e}")
except Exception as e:
    print(f"Unexpected error occurred: {e}")
Error: Unidentified image file encountered. cannot identify image file 'test_images/corrupted.jpg'
After removing the corrupted image from the target folder, inference completes successfully:
os.remove(
os.path.join(tgt_folder_path, "corrupted.jpg")
)
try:
    results = detection_model.batch_image_detection(tgt_folder_path)
except UnidentifiedImageError as e:
    print(f"Error: Unidentified image file encountered. {e}")
except Exception as e:
    print(f"Unexpected error occurred: {e}")
0: 640x640 (no detections), 13.2ms
1: 640x640 (no detections), 13.2ms
2: 640x640 (no detections), 13.2ms
3: 640x640 (no detections), 13.2ms
4: 640x640 (no detections), 13.2ms
Speed: 11.2ms preprocess, 13.2ms inference, 9.7ms postprocess per image at shape (5, 3, 640, 640)
100%|██████████| 1/1 [00:00<00:00, 2.39it/s]
Use case
My current workaround is to pre-validate every image with PIL before passing the folder to batch_image_detection, but that adds an extra pass over every file, which gets expensive for large datasets. It would be great to have an error-handling mechanism that would:
- Collect failures instead of raising an error
- Return results for the valid images along with a list of the failed image paths
pred_result, failed_images_path = detection_model.batch_image_detection(tgt_folder_path)
This way, invalid images won’t crash the batch, and users can handle failures separately.
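To make the proposal concrete, here is a minimal, library-agnostic sketch of the collect-instead-of-raise behavior. run_collecting_failures is a hypothetical helper, not an existing Pytorch-Wildlife function, and process stands in for whatever loads a single image and runs the detector on it. The default errors=(OSError,) also catches PIL.UnidentifiedImageError, which Pillow defines as a subclass of OSError.

```python
from typing import Callable, Iterable, List, Tuple, Type, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_collecting_failures(
    items: Iterable[T],
    process: Callable[[T], R],
    errors: Tuple[Type[BaseException], ...] = (OSError,),
) -> Tuple[List[R], List[Tuple[T, str]]]:
    """Apply `process` to each item, collecting results and failures.

    Hypothetical helper (not a Pytorch-Wildlife API). Items whose
    processing raises one of `errors` are recorded as (item, message)
    instead of aborting the run; any other exception still propagates.
    """
    results: List[R] = []
    failed: List[Tuple[T, str]] = []
    for item in items:
        try:
            results.append(process(item))
        except errors as exc:
            # Record the failure and move on to the next item
            failed.append((item, str(exc)))
    return results, failed
```

Processing images one at a time like this gives up GPU batching, so a fix inside batch_image_detection would more likely catch the error where each image is loaded and drop that file from the batch, while still reporting its path in the failure list.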
I’d be happy to help if you think this is a good addition, and I’d appreciate any guidance on how best to contribute!
Additional
No response
Are you willing to submit a PR?
- [ ] Yes I'd like to help by submitting a PR!