vq-vae-2-pytorch

Training Hyperparameters of PixelSnail for VQ-VAE experiments

Open fostiropoulos opened this issue 5 years ago • 3 comments

I am using 4x Nvidia V100 GPUs and cannot fit a batch size larger than 32 with the hyperparameters from the paper when training on the top codes. I have also changed the loss to a discretized mixture of logistics, as in the original PixelCNN++ and PixelSnail implementations. The authors report a batch size of 1024, which seems impossible to reach. Does this implementation of PixelSnail use more layers than the one reported in the VQ-VAE-2 paper?

I am not able to map this implementation onto the one described in the appendix of VQ-VAE-2 in order to configure it to replicate their results. Any help is appreciated.

[Screenshot: PixelSnail hyperparameter table from the VQ-VAE-2 appendix]

fostiropoulos avatar Oct 08 '20 06:10 fostiropoulos
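For context, the discretized mixture of logistics loss mentioned above can be sketched roughly as follows. This is a minimal single-channel version following the PixelCNN++ formulation; the parameter layout (mixture logits, means, log-scales stacked along the channel dimension), the rescaling of integer codes to [-1, 1], and the clamping constants are assumptions, not the exact code used in this thread:

```python
import torch
import torch.nn.functional as F


def discretized_mix_logistic_nll(params, target, num_classes=512):
    """Negative log-likelihood of a discretized mixture of logistics.

    params: (B, 3 * n_mix, H, W) -- mixture logits, means, log-scales
    target: (B, H, W) integer codes in [0, num_classes - 1]
    """
    logit_probs, means, log_scales = torch.chunk(params, 3, dim=1)
    log_scales = torch.clamp(log_scales, min=-7.0)

    # Rescale integer targets to [-1, 1], the convention used in PixelCNN++.
    x = target.float().unsqueeze(1) / (num_classes - 1) * 2 - 1
    x = x.expand_as(means)

    centered = x - means
    inv_std = torch.exp(-log_scales)
    half_bin = 1.0 / (num_classes - 1)
    plus_in = inv_std * (centered + half_bin)
    min_in = inv_std * (centered - half_bin)

    # Probability mass of each discrete bin under each mixture component.
    cdf_delta = torch.sigmoid(plus_in) - torch.sigmoid(min_in)
    # Edge bins integrate the tails of the logistic instead.
    log_cdf_plus = plus_in - F.softplus(plus_in)       # log CDF at the left edge
    log_one_minus_cdf_min = -F.softplus(min_in)        # log (1 - CDF) at the right edge

    log_probs = torch.where(
        x < -0.999,
        log_cdf_plus,
        torch.where(
            x > 0.999,
            log_one_minus_cdf_min,
            torch.log(torch.clamp(cdf_delta, min=1e-12)),
        ),
    )
    # Weight each component by its mixture probability and marginalize.
    log_probs = log_probs + F.log_softmax(logit_probs, dim=1)
    return -torch.logsumexp(log_probs, dim=1).mean()
```

Worth noting: an ordinal likelihood like this implicitly assumes that neighbouring codebook indices are similar, which VQ-VAE training does not guarantee, so whether it helps over a plain categorical loss is an empirical question.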

Actually the network used in the paper is much larger than the default model in this implementation.

rosinality avatar Oct 08 '20 12:10 rosinality

Yes, that was my initial assumption as well. I can only imagine training such a large model on a TPU. Do you have any insight into how it could have been done?

fostiropoulos avatar Oct 08 '20 14:10 fostiropoulos

Maybe they used TPUs or a large number of GPUs. In any case, replicating the model training from the paper will be very hard (practically impossible, really) with only a few GPUs.

rosinality avatar Oct 08 '20 15:10 rosinality
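If memory rather than throughput is the bottleneck, one practical mitigation is gradient accumulation to emulate the paper's effective batch size of 1024 on a few GPUs. Below is a minimal sketch, assuming a PixelSnail-style prior trained with cross-entropy over the top codes; the function name, the `model(top_codes)` return signature, and the loader format are assumptions for illustration:

```python
import torch.nn.functional as F


def train_prior_accumulated(model, loader, optimizer, accum_steps=32, device="cuda"):
    """Emulate a large effective batch (e.g. 32 micro-batches of 32 -> 1024)
    by accumulating gradients before each optimizer step.

    `model` is assumed to map integer top-code maps to per-position class
    logits; `loader` is assumed to yield those (B, H, W) code maps.
    """
    model.train()
    optimizer.zero_grad()

    for step, top_codes in enumerate(loader):
        top_codes = top_codes.to(device)
        logits, _ = model(top_codes)            # assumed (B, n_codes, H, W) output
        loss = F.cross_entropy(logits, top_codes)
        (loss / accum_steps).backward()         # scale so summed grads average correctly

        if (step + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```

This matches the effective batch size but not the wall-clock throughput of the original setup, so training time will still be far longer than with the hardware used in the paper.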