vq-vae-2-pytorch
How to get the dataset?
How do I get the dataset that was used to train the examples? What's its format? That is, what is the folder hierarchy, are the images JPEG or PNG, and how are the image files named (are they numbered, etc.)?
You can use any directory structure and file formats that are compatible with torchvision.datasets.ImageFolder. So you need a directory structure like [DATASET NAME]/[ANY DIRECTORY NAME]/*.[IMAGE EXTENSION], that is, you need one additional subdirectory level. There are no restrictions on file names.
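For example, a minimal sketch of an ImageFolder-compatible layout and how to load it; the ffhq/images names, the CenterCrop, and the batch size are illustrative placeholders, mirroring the resize-and-normalize pipeline in train_vqvae.py:

```python
# Assumed layout (names are arbitrary):
#   ffhq/
#     images/
#       00000.jpg
#       00001.jpg
#       ...
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(256),
    transforms.ToTensor(),
    transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5]),
])

# ImageFolder treats each subdirectory of "ffhq" as one class;
# for VQ-VAE training the class labels are simply ignored.
dataset = datasets.ImageFolder("ffhq", transform=transform)
loader = DataLoader(dataset, batch_size=128, shuffle=True, num_workers=4)
```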
Thank you, but could you please give a concrete example? For instance, the images you posted in the README for this repository: where did you get those, and exactly what file hierarchy did you use? I want to replicate it as closely as possible.
Right now I'm trying to port this to Google Colab (https://colab.research.google.com/drive/1wdBawHzuHqLUEsHj18qvru0dP4Ixu9Jm?usp=sharing), but I can't get it to work, and I was wondering if it is because of my file hierarchy.
I used the FFHQ (https://github.com/NVlabs/ffhq-dataset) aligned and cropped 1024x1024 images. The directory structure is ffhq/images/*.png, but I resized the images to 256px and converted them to jpg.
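For reference, a minimal preprocessing sketch along those lines; the ffhq/images path, the JPEG quality, and the LANCZOS filter are assumptions, not necessarily what was used originally:

```python
# Hypothetical one-off script: downsize FFHQ 1024x1024 PNGs to 256x256 JPEGs.
from pathlib import Path
from PIL import Image

src = Path("ffhq/images")  # assumed location of the aligned 1024px PNGs

for png in sorted(src.glob("*.png")):
    img = Image.open(png).convert("RGB")
    img = img.resize((256, 256), Image.LANCZOS)  # filter choice is a guess
    img.save(png.with_suffix(".jpg"), quality=95)
    # png.unlink()  # uncomment to delete the original PNG afterwards
```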
I ran
python /people/kimd999/script/python/cryoEM/vq-vae-2-pytorch/train_vqvae.py /people/kimd999/MARScryo/dn/data/devel/vq2/data
where /people/kimd999/MARScryo/dn/data/devel/vq2/data has a "PDX_coexp" subfolder that contains "000975.tif" ...
However, it shows:
Namespace(dist_url='tcp://127.0.0.1:54975', epoch=560, lr=0.0003, n_gpu=1, path='/people/kimd999/MARScryo/dn/data/devel/vq2/data', sched=None, size=256)
epoch: 1; mse: 0.03583; latent: 1.103; avg mse: 0.03583; lr: 0.00030: 0%| | 0/1 [00:01<?, ?it/s]
Traceback (most recent call last):
  File "/people/kimd999/script/python/cryoEM/vq-vae-2-pytorch/train_vqvae.py", line 152, in <module>
    dist.launch(main, args.n_gpu, 1, 0, args.dist_url, args=(args,))
  File "/qfs/people/kimd999/script/python/cryoEM/vq-vae-2-pytorch/distributed/launch.py", line 49, in launch
    fn(*args)
  File "/people/kimd999/script/python/cryoEM/vq-vae-2-pytorch/train_vqvae.py", line 125, in main
    train(i, loader, model, optimizer, scheduler, device)
  File "/people/kimd999/script/python/cryoEM/vq-vae-2-pytorch/train_vqvae.py", line 73, in train
    utils.save_image(
  File "/people/kimd999/bin/Miniconda3-latest-Linux-x86_64/envs/pytorch/lib/python3.8/site-packages/torchvision/utils.py", line 109, in save_image
    im.save(fp, format=format)
  File "/people/kimd999/bin/Miniconda3-latest-Linux-x86_64/envs/pytorch/lib/python3.8/site-packages/PIL/Image.py", line 2131, in save
    fp = builtins.open(filename, "w+b")
FileNotFoundError: [Errno 2] No such file or directory: 'sample/00001_00000.png'
Is the sample/xxxxx_xxxxx.png format necessary?
Thank you
@kimdn You just need to create sample and checkpoint directories, like in the repository. Image samples and model checkpoints will be saved into them.
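In other words, before launching training, create the two output directories in the working directory you run train_vqvae.py from; a one-off sketch:

```python
# Create the output directories train_vqvae.py writes samples and
# checkpoints into, relative to the current working directory.
import os

os.makedirs("sample", exist_ok=True)
os.makedirs("checkpoint", exist_ok=True)
```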
When I followed your guide, it works (super exciting).
epoch: 289; mse: 0.01040; latent: 0.002; avg mse: 0.01040; lr: 0.00030: 100%|█████████████████████████| 1/1 [00:00<00:00, 3.72it/s]
epoch: 290; mse: 0.01040; latent: 0.002; avg mse: 0.01040; lr: 0.00030: 100%|█████████████████████████| 1/1 [00:00<00:00, 4.44it/s]
Thank you.
@rosinality
- Why save these files as .jpg and not .png? Is there a specific reason for doing this?
- Also, the transform in train_vqvae.py already has a resize; do we need to explicitly resize to 256 beforehand?
- Does the VQ-VAE checkpoint on the FFHQ dataset have two or three latent maps?
@SURABHI-GUPTA
- jpg will load faster than png (due to smaller sizes, cheaper decoding, ...). You can use pngs.
- The resizing is likewise to speed up data loading. FFHQ is 1024x1024 pngs, so it is quite slow to load them at full size during training.
- In the paper the authors used 3 latent hierarchies for 1024px images. In my implementation I used 2 latent maps for 256px images (see the sketch below).
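A quick way to see the two latent maps of this implementation; this assumes vqvae.py from this repo is importable, and the exact return signature of encode() is an assumption worth checking against your checkout:

```python
# Sketch: inspect the two latent hierarchies of the 256px VQ-VAE-2.
import torch
from vqvae import VQVAE  # model definition from this repository

model = VQVAE()
model.eval()

x = torch.randn(1, 3, 256, 256)  # dummy 256px RGB batch
with torch.no_grad():
    # encode() is assumed to return (quant_t, quant_b, diff, id_t, id_b).
    quant_t, quant_b, diff, id_t, id_b = model.encode(x)

print(id_t.shape)  # top latent map, expected [1, 32, 32]
print(id_b.shape)  # bottom latent map, expected [1, 64, 64]
```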
Thank you for your answer. Could you please explain why we need to use "transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])" when loading the images?
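For context, ToTensor produces pixel values in [0, 1], and Normalize with mean 0.5 and std 0.5 applies (x - 0.5) / 0.5 per channel, rescaling the inputs to [-1, 1]; a tiny check:

```python
# Normalize(mean, std) computes (x - mean) / std per channel, so
# mean=std=0.5 maps ToTensor's [0, 1] range to [-1, 1].
import torch
from torchvision import transforms

norm = transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
x = torch.rand(3, 4, 4)  # fake image tensor with values in [0, 1]
y = norm(x)
print(y.min().item(), y.max().item())  # values now lie in [-1, 1]
```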