Wasserstein distance between prior and posterior
Hi,
I'm trying to find the part of the code that computes the Wasserstein distance between the prior and the posterior (as in Eq. 5 in your ICLR paper), but I couldn't find it. Could you please point me to the part of the code that computes this distance?
Moreover, I found that the latent variables are computed directly by the model (e.g., by a fully connected layer) rather than by predicting \mu and \sigma and then sampling from that distribution, as stated in Eq. 3 and Eq. 4. Could you please clarify this?
Thanks
The Wasserstein distance is not computed explicitly; it is implicitly minimized by training a WGAN, whose critic objective approximates the Wasserstein-1 distance between samples from the prior and the posterior. The latent variable is implicitly sampled with the reparameterization trick.
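To make that concrete, here is a minimal sketch of how a WGAN critic loss stands in for the Wasserstein term (illustrative only; `critic`, `z_posterior`, and `z_prior` are hypothetical names, not identifiers from this repo):

```python
import torch

def wgan_critic_loss(critic, z_posterior, z_prior):
    # Kantorovich-Rubinstein dual: for a 1-Lipschitz critic f,
    #   W1(P, Q) = sup_f E_{z~P}[f(z)] - E_{z~Q}[f(z)]
    # Training the critic to maximize this gap approximates W1 between
    # posterior and prior samples; training the generator to shrink the
    # gap minimizes it. The distance itself is never evaluated in closed
    # form, which is why there is no explicit Wasserstein term in the code.
    return critic(z_posterior).mean() - critic(z_prior).mean()
```

In practice the critic's Lipschitz constraint would be enforced separately, e.g. via weight clipping or a gradient penalty.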
If the latent variable is sampled with the reparameterization trick, there should be a mu, a sigma, and an epsilon, where latent = mu + sigma \odot epsilon.
But, as far as I understand, your model generates the latent variable directly. Is that correct?
https://github.com/guxd/DialogWAE/blob/29f206af05bfe5fe28fec4448e208310a7c9258d/modules.py#L186
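For comparison, here is roughly what I would expect a reparameterized sampler to look like (an illustrative sketch only; `ReparamSampler` and the layer names are mine, not from modules.py):

```python
import torch
import torch.nn as nn

class ReparamSampler(nn.Module):
    """Illustrative reparameterized sampler (not the repo's code):
    predict mu and log-variance, then sample
    z = mu + sigma * epsilon with epsilon ~ N(0, I)."""
    def __init__(self, input_dim, latent_dim):
        super().__init__()
        self.fc_mu = nn.Linear(input_dim, latent_dim)
        self.fc_logvar = nn.Linear(input_dim, latent_dim)

    def forward(self, h):
        mu = self.fc_mu(h)
        sigma = torch.exp(0.5 * self.fc_logvar(h))
        eps = torch.randn_like(sigma)  # epsilon ~ N(0, I)
        return mu + sigma * eps        # latent = mu + sigma \odot epsilon
```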