parmesan
Wrong gradients in NormalizingPlanarFlowLayer
If I understand correctly, equation 11 in the paper is computed here. For a batch of 5 with 3 features, h'(w^T z + b) should have shape (5,) and w shape (3,), so psi should be (5, 3) and psi_u should be (5,). In the current implementation, however, psi is (5,) and psi_u is a scalar. So the solution would be to change the dot product to an element-wise product. Is that right, or did I make a mistake?
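To make the expected shapes concrete, here is a minimal NumPy sketch of the computation as I read it from the paper. It is not the parmesan/Theano code: the choice of tanh for h, the random inputs, and the names z, u, b, and h_prime are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, features = 5, 3

# Illustrative inputs (assumed, not taken from the layer itself).
z = rng.normal(size=(batch, features))  # one row per sample
w = rng.normal(size=(features,))
u = rng.normal(size=(features,))
b = 0.1

def h_prime(a):
    # Derivative of tanh, assuming h = tanh.
    return 1.0 - np.tanh(a) ** 2

hp = h_prime(z @ w + b)           # shape (5,): one scalar per sample

# Element-wise (broadcast) product: each sample scales w by its own scalar.
psi = hp[:, None] * w[None, :]    # shape (5, 3)

# Dot product with u over the feature axis only.
psi_u = psi @ u                   # shape (5,)

print(psi.shape, psi_u.shape)
```

With the broadcast product, psi keeps one row per sample and psi_u keeps one value per sample, which is what a per-sample Jacobian determinant term would need.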