ImageNetV2
Samples are wrongly labelled when using datasets.ImageFolder
Hello.
I found that on some systems (my environment is Ubuntu 20.04), the class_to_idx property of datasets.ImageFolder is not aligned with the directory names, which leads to wrongly labelled samples.
For instance, the directory 100 (str) is assigned class 3 (int). The easiest way to resolve this is to modify the find_classes function in the datasets.ImageFolder source code (https://pytorch.org/vision/stable/_modules/torchvision/datasets/folder.html#ImageFolder): replace the line class_to_idx = {cls_name: i for i, cls_name in enumerate(classes)} with class_to_idx = {cls_name: int(cls_name) for cls_name in classes}.
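To make the mismatch concrete, here is a minimal sketch in plain Python that needs neither torchvision nor the dataset on disk. It assumes the class folders are named "0" through "999" (an assumption about the layout, consistent with the folder names shown in this thread) and compares the default enumerate-based mapping with the int-based one:

```python
# Assumption: class folders are named "0" .. "999" (integer strings, no zero padding).
folder_names = [str(i) for i in range(1000)]

# Default behaviour: sort the names as strings, then number them in that order.
classes = sorted(folder_names)
default_class_to_idx = {name: i for i, name in enumerate(classes)}

# Proposed fix: use the integer value of the folder name as the label.
fixed_class_to_idx = {name: int(name) for name in classes}

print(classes[:5])                  # ['0', '1', '10', '100', '101']
print(default_class_to_idx["100"])  # 3
print(fixed_class_to_idx["100"])    # 100
```

Under this assumption, only folders whose position in the string-sorted list happens to equal their integer value keep the right label with the default mapping.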
Following @chaeunl, you can use the following dataset class:
import os
from typing import Dict, List, Tuple

from torchvision import datasets


class ImageNetV2Folder(datasets.ImageFolder):
    def find_classes(self, directory: str) -> Tuple[List[str], Dict[str, int]]:
        """Finds the class folders in a dataset.
        See :class:`DatasetFolder` for details.
        """
        classes = sorted(entry.name for entry in os.scandir(directory) if entry.is_dir())
        if not classes:
            raise FileNotFoundError(f"Couldn't find any class folder in {directory}.")
        # Use the integer value of the folder name as the label, not its sort position.
        class_to_idx = {cls_name: int(cls_name) for cls_name in classes}
        return classes, class_to_idx
and initialize it with dataset = ImageNetV2Folder(root="imagenetv2-matched-frequency-format-val").
Sanity Check
Then you can check that the class indices point to the correct folder:
index_to_class = {v: k for k, v in dataset.class_to_idx.items()}
for i in range(len(dataset.classes)):
    print(f'Class idx {i} corresponds to folder name {index_to_class[i]}')
# Class idx 0 corresponds to folder name 0
# Class idx 1 corresponds to folder name 1
# Class idx 2 corresponds to folder name 2
# Class idx 3 corresponds to folder name 3
whereas the current implementation in the repo (using dataset = torchvision.datasets.ImageFolder(root="imagenetv2-matched-frequency-format-val")) produces:
Class idx 0 corresponds to folder name 0
Class idx 1 corresponds to folder name 1
Class idx 2 corresponds to folder name 10
Class idx 3 corresponds to folder name 100
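The problem stems from the class folders being named with integer strings without leading zeros: sorted as strings, "10" and "100" come before "2", so a folder's position in the sorted list no longer matches its integer value.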
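The role of the missing leading zeros can be checked with a short sketch, again in plain Python. It assumes folder names "0" through "999" and a hypothetical zero-padded variant ("0000" through "0999"). The padded names are for illustration only; nobody in this thread proposes renaming the dataset folders.

```python
# Assumption: class folders are named "0" .. "999"; the padded names are hypothetical.
names = [str(i) for i in range(1000)]
padded = [name.zfill(4) for name in names]

# Sorting as strings keeps numeric order only when all names have the same width.
plain_in_numeric_order = [int(n) for n in sorted(names)] == list(range(1000))
padded_in_numeric_order = [int(n) for n in sorted(padded)] == list(range(1000))

print(plain_in_numeric_order, padded_in_numeric_order)  # False True
```

With fixed-width names, enumerate over the sorted list would already yield the correct labels, which is why the default mapping works for many other datasets.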
Hi, @psandovalsegura is right: ImageFolder is not natively compatible because our class ids are actually uuids. Just use the class in https://github.com/modestyachts/ImageNetV2_pytorch (it should be the same class as the one @psandovalsegura shows above).