vq-vae-2-pytorch
[train_vqvae] multiple GPUs don't seem to work as expected
Thank you for sharing this great code.
I have used InfoVAE as a substitute for beta-VAE and the traditional VAE (beta=1). However, I think your VQ-VAE-2 reconstructs images better.
Unfortunately, when I used multiple GPUs
#SBATCH --gres=gpu:4
python /people/kimd999/script/python/cryoEM/vq-vae-2-pytorch/train_vqvae.py /people/kimd999/MARScryo/dn/data/full/PDX/coexp/input --size 256 --n_gpu 4
it reconstructed images poorly (blank images) and didn't reduce the MSE much (MSE 0.01311 after 32 epochs).
Meanwhile, using a single GPU
#SBATCH --gres=gpu:1
python /people/kimd999/script/python/cryoEM/vq-vae-2-pytorch/train_vqvae.py /people/kimd999/MARScryo/dn/data/full/PDX/coexp/input --size 256
reconstructed images better (almost identical to the input images) and reduced the MSE further (MSE 0.00583 after 12 epochs).
Consequently, 1 GPU effectively "trains faster" in terms of quality per wall-clock time, even though each epoch takes longer (4 hr/epoch vs. 1.4 hr/epoch with 4 GPUs).
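One possible explanation (just a guess on my part, not confirmed from the repo's code) is that the vector quantizer's EMA codebook statistics are updated from each replica's local batch only; if those usage counts aren't all-reduced across processes (as `torch.distributed.all_reduce` would do), each replica's codebook drifts apart and reconstructions degrade. A plain-Python toy sketch of the idea, with hypothetical numbers:

```python
# Toy sketch: why per-replica EMA codebook updates can diverge when the
# code-usage counts are not summed across GPUs first.
# All names and numbers here are illustrative, not the repo's actual code.

DECAY = 0.99

def ema_update(old, new, decay=DECAY):
    """One exponential-moving-average step for a codebook statistic."""
    return decay * old + (1 - decay) * new

# Each GPU sees different data, so local code-usage counts differ.
local_counts = {"gpu0": [10.0, 0.0], "gpu1": [0.0, 10.0]}

# Wrong: each replica updates its own codebook from local counts only.
drifted = {
    gpu: [ema_update(1.0, c) for c in counts]
    for gpu, counts in local_counts.items()
}

# Right: sum counts across replicas first (what an all_reduce provides),
# so every replica applies the identical update.
summed = [sum(c) for c in zip(*local_counts.values())]
synced = [ema_update(1.0, c) for c in summed]

print(drifted)  # replicas disagree per code
print(synced)   # identical on every replica
```

If this is the cause, the fix would be to all-reduce the cluster-size and embedding-sum buffers inside the quantizer before the EMA step when `torch.distributed` is initialized.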
I wonder whether you have experienced anything like this as well?
I haven't seen that kind of problem. Both distributed and single-GPU training give similar results, I think.