PyTorch-VAE

Nan in loss function in TC-Beta VAE

Open arijitthegame opened this issue 5 years ago • 2 comments

Hi,

I am running TC-Beta VAE on my data, and I changed my architecture to an MLP encoder and decoder. But I am getting NaN in the loss function, and it seems I am getting NaNs for log_importance_weights, log_q_z, and log_prod_q_z. Should I just add an epsilon to each of these quantities before taking the log, or is there some other issue that I am missing?
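A common alternative to adding an epsilon is to never leave log space at all: compute the pairwise Gaussian log-densities directly and aggregate them with `torch.logsumexp`, which avoids both `exp()` overflow and `log(0)`. Below is a minimal sketch of that idea (it omits the minibatch importance-weight correction used in the repo, and all names here are illustrative, not the repo's own):

```python
import math
import torch

def gaussian_log_density(z, mu, logvar):
    # log N(z; mu, sigma^2), computed entirely in log space
    return -0.5 * (math.log(2 * math.pi) + logvar) \
           - 0.5 * (z - mu) ** 2 / logvar.exp()

# Toy shapes: B samples in a D-dim latent space (values are arbitrary)
B, D = 4, 3
torch.manual_seed(0)
mu = torch.randn(B, D)
logvar = torch.randn(B, D)
z = mu + torch.randn(B, D) * (0.5 * logvar).exp()

# Pairwise log q(z_i | x_j) via broadcasting: shape [B, B, D]
mat = gaussian_log_density(z.unsqueeze(1), mu.unsqueeze(0), logvar.unsqueeze(0))

log_B = math.log(B)
# log q(z) and log prod_d q(z_d), aggregated with logsumexp -- no exp/log round trip
log_q_z = torch.logsumexp(mat.sum(dim=2), dim=1) - log_B
log_prod_q_z = (torch.logsumexp(mat, dim=1) - log_B).sum(dim=1)

print(torch.isfinite(log_q_z).all().item(),
      torch.isfinite(log_prod_q_z).all().item())
```

If the NaNs persist even with logsumexp, the log-variances themselves are probably diverging, which points at the optimization (learning rate, KLD weighting) rather than the log computation.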

arijitthegame avatar Nov 25 '20 18:11 arijitthegame

I too have a problem with my InfoVAE implementation. I use kernels of size 7 instead of the original size in the implementation, and a latent dim of 30 instead of 128. My kld_loss grows very fast, from ~2000 to ~2e35 to infinity to NaN in about 5 training steps. I use a kld_weight of about 1/1400, but the loss does not seem affected by it. Maybe there is a similar reason for our NaNs?

gaffli avatar Dec 01 '20 10:12 gaffli

I guess maybe you can reduce the learning rate to solve this problem.
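A quick sketch of that suggestion: drop the learning rate, optionally with a scheduler that decays it further when the loss stops improving (model, optimizer, and values here are illustrative):

```python
import torch
from torch import nn

model = nn.Linear(10, 2)  # stand-in for the VAE
# Start 10x lower than a typical 1e-3 default
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
# Halve the LR if the monitored loss plateaus for 5 epochs
sched = torch.optim.lr_scheduler.ReduceLROnPlateau(opt, factor=0.5, patience=5)

val_loss = torch.tensor(1.0)  # would come from validation in real training
sched.step(val_loss)
print(opt.param_groups[0]["lr"])  # unchanged until patience is exhausted
```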

HaoZhang990127 avatar May 18 '22 16:05 HaoZhang990127