stat453-deep-learning-ss21

Unable to load CelebA dataset. File is not zip file error.

Open Hackathorn opened this issue 4 years ago • 6 comments

More of an FYI: I tried to reproduce the L17 4_VAE_celeba-inspect notebook. When loading the dataset, I got the error "Unable to load CelebA dataset. File is not zip file error" with "BadZipFile: File is not a zip file". I found TorchVision issue #2262, which identified the problem as exceeding the daily maximum quota on Google Drive; the maintainers punted the issue back to the dataset authors and closed it. A future version of TorchVision should give a more descriptive error message.
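One way to confirm this failure mode locally is to check whether the downloaded file is really a zip archive. The sketch below uses only the Python standard library; the helper name `describe_download` and the example path are hypothetical, and the assumption that a quota failure leaves an HTML page on disk is based on the TorchVision issue rather than verified here.

```python
import zipfile
from pathlib import Path


def describe_download(path):
    """Return a short diagnosis of a downloaded file (hypothetical helper)."""
    p = Path(path)
    if not p.is_file():
        return "missing"
    if p.stat().st_size == 0:
        return "empty"
    if zipfile.is_zipfile(str(p)):
        return "zip"
    # Read only the first bytes; an HTML page here suggests that an
    # error page was saved in place of the archive.
    with p.open("rb") as f:
        head = f.read(512).lstrip().lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "html"
    return "unknown"


# Example (hypothetical path):
# print(describe_download("data/celeba/img_align_celeba.zip"))
```

A result of "html" or "empty" would point to a failed download rather than a problem in the notebook itself.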

So, FYI to your students. Work-around is to...

Hackathorn avatar Aug 07 '21 17:08 Hackathorn

Thanks for the note, Richard, and I agree, this is definitely frustrating. I was recently teaching a GAN tutorial and had similar issues. Downloading the dataset from the original website can be a bit tedious because it involves several steps. So, for this tutorial, I gathered the relevant files and uploaded them as a zip file to my Google Drive.

In case it's useful, it's 1.7 GB, and you only need to unzip it in the current notebook directory (or rather, the directory the dataset/dataloader points to): https://drive.google.com/file/d/1m8-EBPgi5MRubrm6iQjafK2QMHDBMSfJ/view?usp=sharing
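If you prefer to unzip from Python rather than by hand, a minimal sketch with the standard library could look like this. The function name `extract_archive` and the example file and directory names are hypothetical; adjust them to wherever your dataset class looks for the data.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, target_dir):
    """Extract zip_path into target_dir and return the number of archive entries."""
    target = Path(target_dir)
    # Create the target directory if it does not exist yet.
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
        return len(archive.namelist())


# Example (hypothetical names):
# extract_archive("celeba.zip", "data")
```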

rasbt avatar Aug 07 '21 17:08 rasbt

Downloading from your Google Drive and extracting the files into the L17/data folder (replacing the existing ones) was simple and worked great.

Hackathorn avatar Aug 07 '21 18:08 Hackathorn

I have the same issue, but even after downloading from your link, I get an error from the _check_integrity() function: "Dataset not found or corrupted. You can use download=True to download it."

AntixK avatar Dec 22 '21 06:12 AntixK

Have you checked that none of the files are 0 KB? With download=True, it may try to overwrite existing files so that they become empty. If I have the files as shown below, it seems to work (I tried it the other day; see https://github.com/rasbt/machine-learning-book/blob/main/ch12/ch12_part1.ipynb)

[Screenshot of the dataset files; image not preserved]
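A quick way to run this check is to scan the data directory for zero-byte files. This is only a sketch using the standard library; the helper name `find_empty_files` and the example directory are hypothetical.

```python
from pathlib import Path


def find_empty_files(root):
    """Return the sorted relative paths of all zero-byte files under root."""
    base = Path(root)
    return sorted(
        p.relative_to(base).as_posix()
        for p in base.rglob("*")
        if p.is_file() and p.stat().st_size == 0
    )


# Example (hypothetical directory):
# print(find_empty_files("data/celeba"))
```

An empty list means no file was truncated to 0 bytes; any path that is returned is a candidate for re-downloading or re-extracting.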

rasbt avatar Dec 22 '21 14:12 rasbt

I did set download=False after downloading the files manually, and I checked their sizes as well. I figured out that the problem was in the _check_integrity() function, which returns False.

So, I wrote a simple workaround to resolve it:

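from torchvision.datasets import CelebA


class MyCelebA(CelebA):
    """
    A workaround to address issues with torchvision's CelebA dataset class.

    Download and extract the archive manually from:
    URL: https://drive.google.com/file/d/1m8-EBPgi5MRubrm6iQjafK2QMHDBMSfJ/view?usp=sharing
    """

    def _check_integrity(self) -> bool:
        # Skip the built-in verification and trust the manually extracted files.
        return True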
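Because the override above skips verification entirely, it may be worth checking once by hand which file fails. The sketch below assumes, without having confirmed it for every torchvision version, that the built-in check compares MD5 checksums of the annotation files against hard-coded values. The helper name `md5_of_file` and the example path are hypothetical.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        # Reading in chunks keeps memory use low for large files.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


# Example (hypothetical path):
# print(md5_of_file("data/celeba/list_attr_celeba.txt"))
```

Comparing the printed digest with the value your installed torchvision expects would show whether a file is actually corrupted or simply differs from the original release.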

AntixK avatar Dec 23 '21 22:12 AntixK

Thanks for sharing!

rasbt avatar Dec 27 '21 17:12 rasbt