
Torchvision bounding boxes do not match the images, because the bboxes come from the pre-cropped, pre-resized version.

Open yaoshiang opened this issue 8 months ago • 4 comments

🐛 Describe the bug

CelebA bounding boxes were calculated on the so-called "in-the-wild" images, prior to cropping and resizing, but torchvision.datasets returns the version that is cropped and resized to 178x218. For example, on the ninth image, the bbox lies entirely outside the image bounds.

CODE TO REPRO

from torchvision import datasets

celeba = datasets.CelebA(root="./celeba", target_type="bbox", download=True, split="train")

print(celeba[8])

(<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=178x218>, tensor([600, 274, 343, 475]))
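Plugging the printed values into a quick sanity check makes the mismatch concrete: the box can't possibly fit inside the 178x218 aligned image.

```python
# Values from the repro above: the aligned image is 178x218, but the
# bbox (x, y, w, h) was computed on the uncropped "in-the-wild" original.
img_w, img_h = 178, 218
x, y, w, h = 600, 274, 343, 475

print(x > img_w)     # True  -- the box's left edge is already past the right border
print(x + w, img_w)  # 943 178 -- right edge vs. image width
print(y + h, img_h)  # 749 218 -- bottom edge vs. image height
```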

Versions

collect_env.py crashed on me, but here's the version info:

Using Python 3.12.8 environment at: XXX
Name: torchvision
Version: 0.21.0
Location: XXX
Requires: numpy, pillow, torch
Required-by:

yaoshiang avatar Mar 27 '25 17:03 yaoshiang

Thanks for the report @yaoshiang. Happy to consider a fix PR.

NicolasHug avatar Apr 01 '25 10:04 NicolasHug

I'll look into it. If I'm able to generate valid boxes, how do you recommend I store them? I could do what was done for CelebA itself: put them on a public Google Drive and fetch them with gdown. Or store them as a pkl/txt file in the source folder and load them via importlib.

yaoshiang avatar Apr 08 '25 20:04 yaoshiang

Oh, maybe I misunderstood what the original problem is. I was assuming torchvision wasn't downloading the proper file, but it looks like the dataset's original files themselves are incorrect? If that's the case, I don't think we should be uploading and vendoring validated boxes; we should probably stick to the original dataset.

NicolasHug avatar Apr 09 '25 14:04 NicolasHug

The issue here is that there are actually two sets of images: the uncropped "in-the-wild" images and the cropped ones. The identity and binary attributes obviously don't change between the two. There are landmark annotations for both sets of images, but the bounding boxes exist only for the uncropped images.

The reason the original images are not accessible through this layer is that they were compressed with 7z, and a comment in the source says it was not obvious how to handle that format.

https://github.com/pytorch/vision/blob/95f10a4ec9e43b2c8072ae5a68edd5700f9b1e45/torchvision/datasets/celeba.py#L49

As a workaround, we can use the two sets of landmarks to impute the crop-and-resize transform via an analytical linear regression, and then apply that transform to the boxes. I'm working on that here: https://github.com/yaoshiang/celeba-boxes/blob/main/eda.ipynb
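The idea above can be sketched as follows. Since the alignment is (approximately) an axis-wise scale-plus-offset, a per-axis least-squares fit from wild-frame landmarks to aligned-frame landmarks recovers the transform, which can then be applied to a wild-frame box. The landmark coordinates below are made up for illustration; the real ones come from the two CelebA landmark files, and this is only a sketch of the approach, not the notebook's actual code.

```python
import numpy as np

# Hypothetical landmarks for one image (5 points, (x, y)), in the
# "in-the-wild" frame and in the aligned 178x218 frame.
wild = np.array([[680., 400.], [820., 395.], [745., 470.],
                 [690., 540.], [815., 535.]])
aligned = np.array([[69., 112.], [108., 111.], [87., 132.],
                    [71., 152.], [106., 151.]])

def fit_axis(src, dst):
    """Least-squares fit of dst ≈ a * src + b along one axis."""
    A = np.stack([src, np.ones_like(src)], axis=1)
    (a, b), *_ = np.linalg.lstsq(A, dst, rcond=None)
    return a, b

ax, bx = fit_axis(wild[:, 0], aligned[:, 0])  # x-axis scale/offset
ay, by = fit_axis(wild[:, 1], aligned[:, 1])  # y-axis scale/offset

# Map a wild-frame box (x, y, w, h) into the aligned frame:
# corners get the full affine map, sizes only the scale.
x, y, w, h = 600., 274., 343., 475.
new_box = (ax * x + bx, ay * y + by, ax * w, ay * h)
print(new_box)
```

With the real landmark files this would be repeated per image, since each image has its own crop-and-resize parameters.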

yaoshiang avatar Apr 09 '25 14:04 yaoshiang