Training on ImageNet 64x64
Hello,
I am using ImageNet 64x64 and running the code with the following command:
python BigGAN-PyTorch/train.py --dataset I64_hdf5 --parallel --shuffle --num_workers 8 --batch_size 128 --num_G_accumulations 1 --num_D_accumulations 1 --num_D_steps 1 --G_lr 1e-4 --D_lr 4e-4 --D_B2 0.999 --G_B2 0.999 --G_attn 32 --D_attn 32 --G_nl relu --D_nl relu --SN_eps 1e-8 --BN_eps 1e-5 --adam_eps 1e-8 --G_ortho 0.0 --G_init xavier --D_init xavier --G_eval_mode --G_ch 32 --D_ch 32 --ema --use_ema --ema_start 2000 --test_every 5000 --save_every 1000 --num_best_copies 5 --num_save_copies 2 --seed 0 --which_best FID --num_iters 200000 --num_epochs 1000 --embedding inceptionv3 --density_measure gaussian --retention_ratio 50
and I am getting this error:
File "train.py", line 229, in
The interesting thing is that when I create a "mini dataset" by randomly selecting 500 images per label from the original ImageNet dataset, the code runs fine. What could be the problem, and how can I solve this issue?
This is quite strange; I haven't seen this behaviour before. Is it possible that self.embed(y) is receiving label values greater than or equal to the number of classes in the dataset? That is a particularly common failure case that produces this kind of error.
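One way to rule this out is to scan the labels in your HDF5 file before training. The sketch below is a minimal, hypothetical helper (the key name `'labels'` depends on how your HDF5 file was built, so adjust it to match yours); it just flags any label outside `[0, n_classes)`, which is exactly what makes the embedding lookup assert on GPU:

```python
import numpy as np

def check_labels(labels, n_classes):
    """Return the unique out-of-range label values, if any.

    A label >= n_classes (or < 0) makes nn.Embedding index out of
    bounds, which on GPU often surfaces as an opaque CUDA assert
    rather than a clean Python IndexError.
    """
    labels = np.asarray(labels)
    bad = labels[(labels < 0) | (labels >= n_classes)]
    return np.unique(bad)

# Toy example: with 1000 ImageNet classes, a stray label 1000 is caught.
labels = np.array([0, 5, 999, 1000])
print(check_labels(labels, 1000))  # -> [1000]

# Against the real file it would look something like (key name assumed):
# import h5py
# with h5py.File('ILSVRC64.hdf5', 'r') as f:
#     print(check_labels(f['labels'][:], 1000))
```

If this reports anything, the fix is in the data pipeline (e.g. how class indices were assigned when the HDF5 file was created), not in the model.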
Otherwise you could try running with the environment variable CUDA_LAUNCH_BLOCKING=1 set (if you haven't already) for a more informative stack trace.
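For reference, setting the variable inline applies it to that single run only; with CUDA kernel launches forced to be synchronous, the Python stack trace points at the op that actually failed instead of a later, unrelated line (the command here is shortened, keep your full argument list):

```shell
# Inline env var: affects only this invocation, no need to export it.
CUDA_LAUNCH_BLOCKING=1 python BigGAN-PyTorch/train.py --dataset I64_hdf5 --parallel --shuffle

# Sanity check that the variable is visible to the child process:
CUDA_LAUNCH_BLOCKING=1 python -c "import os; print(os.environ['CUDA_LAUNCH_BLOCKING'])"
# -> 1
```

Expect training to run noticeably slower with this set, so use it only while debugging.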