CelebA dataset download/import issue, a workaround
First of all, I really enjoy reading "Make Your First GAN with PyTorch." The book explains the idea behind GANs and how they are trained clearly. It is a wonderful book for anyone who is interested in this subject.
I'm creating this ticket because I had an issue accessing the CelebA dataset when following the book's instructions. The issue seems to be caused by the `imageio` package. In particular, I got the following error:
`ValueError: Could not find a backend to open `/Users/cchang/Projects/torch/celeba_data/__MACOSX/img_align_celeba/._052628.jpg`` with iomode `ri`.
Based on the extension, the following plugins might add capable backends:
pyav: pip install imageio[pyav]
opencv: pip install imageio[opencv]`
I installed the plugins as instructed, then this came up:
`ValueError: Could not find a backend to open `xxx.jpg` with iomode `ri`.`
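One thing worth noting about the first error: the path points into a `__MACOSX` folder and the file name starts with `._`. Entries like that are metadata files added by macOS archive tools, not real JPEGs, so an image reader would be expected to reject them. Below is a minimal sketch of filtering them out, assuming the loader walks every entry of a zip archive. The names `is_image_entry` and `image_entries` are hypothetical helpers, not from the book, and this may not explain the second error.

```python
import zipfile


def is_image_entry(name):
    """Return True if a zip entry name looks like a real JPEG image."""
    parts = name.split('/')
    # Skip the metadata folder that macOS archive tools add
    if '__MACOSX' in parts:
        return False
    base = parts[-1]
    # Skip '._' sidecar files, which are not images
    if base.startswith('._'):
        return False
    return base.lower().endswith('.jpg')


def image_entries(zip_path):
    """List the entries of a zip archive that pass is_image_entry."""
    with zipfile.ZipFile(zip_path, 'r') as zf:
        return [n for n in zf.namelist() if is_image_entry(n)]
```

Only the names returned by `image_entries` would then be passed on to the image reader.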
Anyhow, in order to proceed, I decided to download the CelebA dataset from the source (either from the link provided in the book or from Kaggle) and created the HDF5 file as follows:
import h5py
import matplotlib.image as mpimg

hdf5_file = 'celeba_aligned_small.h5py'

with h5py.File(hdf5_file, 'w') as hf:
    # Images 000001.jpg through 019999.jpg, using CelebA's file names as keys
    for i in range(1, 20000):
        name = 'img_align_celeba/{0:06d}.jpg'.format(i)
        img = mpimg.imread('./' + name)
        hf.create_dataset(name,
                          data=img,
                          compression="gzip",
                          compression_opts=9)
        # Report progress every 1000 images
        if i % 1000 == 0:
            print("images done .. ", i)
Note that:
- I used `matplotlib`'s image module for reading the CelebA image files. There are many other options available for this.
- I followed CelebA's file naming convention. As a result, the book's `CelebADataset()` class needs a small adjustment to match that convention.
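The adjustment could look something like the sketch below. It assumes the dataset class looks images up by a zero-based index, while the keys in the HDF5 file are CelebA's six-digit, one-based file names. `celeba_key` is a hypothetical helper name, not something from the book.

```python
def celeba_key(index):
    """Map a zero-based dataset index to a CelebA-style key.

    Assumption: keys were written as '000001.jpg', '000002.jpg', ...
    inside the 'img_align_celeba' group.
    """
    if index < 0:
        raise IndexError(index)
    return '{0:06d}.jpg'.format(index + 1)
```

Inside `__getitem__`, the lookup would then use `celeba_key(index)` as the key into the `img_align_celeba` group, however the class currently builds that key.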
I hope this is of some help to anyone who runs into a similar issue.