beginners-pytorch-deep-learning
Chapter 2: ImageFolder passes only the filename, without the path, to `is_valid_file`
When running the example notebook for chapter 2, the calls to `ImageFolder` for the train, val and test data use the keyword argument `is_valid_file=check_img`. `ImageFolder` passes only the bare filenames to `check_img(path)` as the argument `path`, without the relative paths such as `../cat/` or `../fish/`, and therefore the test in `check_img` always throws an exception. I start the notebook in `./`, and the keyword argument `root="./train/"` etc. seems to allow `ImageFolder` to correctly find the images in the two classes cat and fish, as the final error message of the `ImageFolder` call suggests: `FileNotFoundError: Found no valid file for the classes cat, fish.` But `path` contains only the filenames of the images, without the relative path w.r.t. `./` (or `root`).
The notebook runs without any other issues when `is_valid_file=check_img` is removed from the call to `ImageFolder`. I wonder whether I misunderstood something trivial or whether this is a bug. (I am using Python v3.8.10, PyTorch v1.10.0, torchvision v0.11.1.)
Thank you for mentioning this! Just as a hint - it's always really helpful to share the stack trace when encountering an error. :)
Stack trace
Running `Chapter 2.ipynb` on Google Colab with Python 3.7.12, torch 1.10.0, torchvision 0.11.1:
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
<ipython-input-9-0f78c4f1439d> in <module>()
1 train_data_path = "/content/drive/My Drive/Colab Notebooks/beginners-pytorch-deep-learning/chapter2/train/"
----> 2 train_data = torchvision.datasets.ImageFolder(root=train_data_path,transform=img_transforms, is_valid_file=check_image)
3 frames
/usr/local/lib/python3.7/dist-packages/torchvision/datasets/folder.py in __init__(self, root, transform, target_transform, loader, is_valid_file)
311 transform=transform,
312 target_transform=target_transform,
--> 313 is_valid_file=is_valid_file)
314 self.imgs = self.samples
/usr/local/lib/python3.7/dist-packages/torchvision/datasets/folder.py in __init__(self, root, loader, extensions, transform, target_transform, is_valid_file)
144 target_transform=target_transform)
145 classes, class_to_idx = self.find_classes(self.root)
--> 146 samples = self.make_dataset(self.root, class_to_idx, extensions, is_valid_file)
147
148 self.loader = loader
/usr/local/lib/python3.7/dist-packages/torchvision/datasets/folder.py in make_dataset(directory, class_to_idx, extensions, is_valid_file)
190 "The class_to_idx parameter cannot be None."
191 )
--> 192 return make_dataset(directory, class_to_idx, extensions=extensions, is_valid_file=is_valid_file)
193
194 def find_classes(self, directory: str) -> Tuple[List[str], Dict[str, int]]:
/usr/local/lib/python3.7/dist-packages/torchvision/datasets/folder.py in make_dataset(directory, class_to_idx, extensions, is_valid_file)
100 if extensions is not None:
101 msg += f"Supported extensions are: {', '.join(extensions)}"
--> 102 raise FileNotFoundError(msg)
103
104 return instances
FileNotFoundError: Found no valid file for the classes cat, fish.
Reason for exception
It seems that the reason is a commit to `folder.py` in the torchvision package that changed which value is passed to `is_valid_file`, which leads to different behavior with regard to our `check_image` function (https://github.com/pytorch/vision/commit/9b29f3f22783112406d9c1a6db47165a297c3942#diff-5d4aa766c846fb9465acd2f311d8f397f98f3baeb8c0b4f04d8a26863d9dd8e3).
It's about these two lines:
path = os.path.join(root, fname)
if is_valid_file(path):
which changed to
if is_valid_file(fname):
path = os.path.join(root, fname)
Therefore the `if is_valid_file(fname)` check currently fails, since `fname` is a bare file name rather than a full path, and the code inside this if statement is never executed. As a result no file is added to the dataset, which produces the "Found no valid file" error.
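To make the effect of that reordering concrete, here is a minimal sketch that does not use torchvision at all. `collect` is a hypothetical, simplified stand-in for the inner loop of `make_dataset`, and `check_file` is a hypothetical stand-in for `check_image` that just tries to open the file; neither is the actual library or notebook code. It also assumes the current working directory does not contain a file with the demo's name.

```python
import os
import tempfile


def check_file(path):
    """Hypothetical stand-in for check_image: True only if the file can be opened."""
    try:
        with open(path, "rb"):
            return True
    except OSError:
        return False


def collect(class_dir, is_valid_file, pass_full_path):
    """Simplified stand-in for the file loop (not torchvision's real code).

    pass_full_path=True  mimics the old order: join first, then validate the path.
    pass_full_path=False mimics the changed order: validate the bare file name.
    """
    found = []
    for fname in sorted(os.listdir(class_dir)):
        path = os.path.join(class_dir, fname)
        candidate = path if pass_full_path else fname
        if is_valid_file(candidate):
            found.append(path)
    return found


with tempfile.TemporaryDirectory() as root:
    cat_dir = os.path.join(root, "cat")
    os.mkdir(cat_dir)
    # Assumed-unique demo file name, so it cannot be opened relative to the cwd.
    with open(os.path.join(cat_dir, "gr_demo_cat_image_0001.jpg"), "wb") as f:
        f.write(b"placeholder bytes, not a real image")

    old_count = len(collect(cat_dir, check_file, pass_full_path=True))
    new_count = len(collect(cat_dir, check_file, pass_full_path=False))

print("old order found:", old_count)
print("new order found:", new_count)
```

With the full path the file is found; with the bare name the open call is resolved against the working directory, fails, and nothing is collected, which matches the empty result behind the `FileNotFoundError` above.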
Current & future solutions
The current solution is to downgrade torchvision to 0.10.1 and torch to 1.9.1 (e.g. in Google Colab with `!pip install torchvision==0.10.1`) or, as you did, to leave out the `check_image()` function. I would not recommend the latter, though: I think it only works properly for those who got the data from the data folder on Gdrive rather than through `download.py`, since some of the files downloaded by the script are (or at least were) broken and therefore need a check.
With the next upgrade of torchvision the code as is should work again since there's a merged PR in the current branch which should resolve this issue (https://github.com/pytorch/vision/pull/4885).
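If neither downgrading nor dropping the check is an option, a third possibility might be to wrap the check so that it tolerates bare file names. The following is only a sketch of that idea, not something from the notebook or from torchvision: `resolve_bare_names` is a hypothetical helper that indexes all files under `root` once and looks a bare name up in that index. It assumes file names are mostly unique across class folders; if a name occurs in several folders, every copy has to pass the check.

```python
import os
import tempfile


def resolve_bare_names(root, check):
    """Hypothetical helper: wrap `check` so it also accepts bare file names.

    All files under `root` are indexed by name once. A bare name is then
    mapped back to its full path(s) before `check` is called.
    """
    index = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            index.setdefault(name, []).append(os.path.join(dirpath, name))

    def wrapped(path):
        # A path with a directory part is passed through unchanged.
        if os.path.dirname(path):
            return check(path)
        # Bare name: every file with that name under root must pass the check.
        candidates = index.get(path, [])
        return bool(candidates) and all(check(p) for p in candidates)

    return wrapped


def non_empty(path):
    """Stand-in check for the demo only: treats empty files as broken."""
    return os.path.getsize(path) > 0


with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "cat"))
    good = os.path.join(root, "cat", "good.jpg")
    broken = os.path.join(root, "cat", "broken.jpg")
    with open(good, "wb") as f:
        f.write(b"placeholder bytes")
    open(broken, "wb").close()  # empty file plays the role of a broken download

    is_valid = resolve_bare_names(root, non_empty)
    ok_bare = is_valid("good.jpg")
    ok_full = is_valid(good)
    ok_broken = is_valid("broken.jpg")
    ok_missing = is_valid("missing.jpg")
```

In the notebook this would presumably be used as `is_valid_file=resolve_bare_names(train_data_path, check_image)` in the `ImageFolder` call. I have not tested that against torchvision 0.11.1, so treat it as a starting point.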
Thank you for the fast and detailed explanation and for the hint about the stack trace for next time. I am new to PyTorch and was not sure whether I was using it correctly. While looking for the error I also came across `folder.py` calling `is_valid_file` with only `fname` as its argument. It looked like a bug to me, because in principle only `ImageFolder` knows the relative path of the class the file belongs to.