Variational-NMT
KL loss function
Hello, I am confused about the KL loss function used in the training process. In your VRNMT paper, you mention that z_j is integrated into the decoder network. Does this mean that the z_j used in the decoder is derived from the prior network? This point confuses me. I look forward to your reply.