keras-mdn-layer
Check treatment of scale matrix vs covariance matrix in sampling procedure
There could be an issue with sampling due to (my) confusion about standard deviation and variance.
The samples are drawn using numpy like so (see the numpy documentation; line 238 of __init__.py):
sample = np.random.multivariate_normal(mus_vector, cov_matrix, 1)
But the outputs from the mixture density layer are treated as scale parameters in tfp.distributions.MultivariateNormalDiag, whose documentation notes that:
covariance = scale @ scale.T
Thus, it seems we should have been squaring the sigma values when building cov_matrix for the multivariate normal sampling procedure. This could explain why we end up having to scale down the sigma variable so much in real-world applications.
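A minimal sketch of the mismatch, assuming mus_vector and sig_vector hold the means and sigmas of one mixture component (the variable names and values here are mine, for illustration only):

```python
import numpy as np
import tensorflow_probability as tfp

tfd = tfp.distributions

# Hypothetical outputs for one mixture component (these names are assumptions).
mus_vector = np.array([0.0, 0.0])
sig_vector = np.array([0.5, 2.0])

# At training time the sigmas are used as *scales* (standard deviations),
# so the implied covariance is diag(sig_vector ** 2):
mvn = tfd.MultivariateNormalDiag(loc=mus_vector, scale_diag=sig_vector)
print(mvn.covariance().numpy())   # [[0.25, 0.0], [0.0, 4.0]]

# At sampling time, np.random.multivariate_normal interprets its second
# argument as a *covariance*, so passing diag(sig_vector) directly treats
# the standard deviations as if they were variances:
sample = np.random.multivariate_normal(mus_vector, np.diag(sig_vector), 1)
```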
A todo here is to get a definite answer and run some tests to find out what's going on.
It seems to me that the scale vector should have been squared before being used as a covariance matrix, so this is now the current behaviour.
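For illustration, a sketch of what the corrected sampling looks like; sample_from_component is a hypothetical helper, not the library's API:

```python
import numpy as np

def sample_from_component(mus_vector, sig_vector):
    """Draw one sample from a diagonal Gaussian whose sigmas are standard deviations.

    Squaring the scale vector turns the standard deviations into the
    variances that np.random.multivariate_normal expects on the diagonal
    of its covariance argument.
    """
    cov_matrix = np.diag(np.square(sig_vector))
    return np.random.multivariate_normal(mus_vector, cov_matrix, 1)
```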
It remains to write a test (comparing TensorFlow Probability and numpy) verifying that a tfd scale vector actually produces the correct distributions.
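One possible shape for such a test (the function name, example values, sample size, and tolerance are assumptions, not part of the library):

```python
import numpy as np
import tensorflow_probability as tfp

tfd = tfp.distributions


def test_scale_vector_gives_correct_distribution():
    mus_vector = np.array([1.0, -2.0])
    sig_vector = np.array([0.5, 2.0])

    # Analytic check: tfp's covariance for scale_diag=sigma is diag(sigma**2).
    mvn = tfd.MultivariateNormalDiag(loc=mus_vector, scale_diag=sig_vector)
    np.testing.assert_allclose(mvn.covariance().numpy(),
                               np.diag(np.square(sig_vector)))

    # Empirical check: numpy samples drawn with the *squared* scales should
    # match the covariance of samples drawn from the tfp distribution.
    n = 200_000
    rng = np.random.default_rng(42)
    np_samples = rng.multivariate_normal(mus_vector,
                                         np.diag(np.square(sig_vector)), n)
    tfp_samples = mvn.sample(n).numpy()
    np.testing.assert_allclose(np.cov(np_samples, rowvar=False),
                               np.cov(tfp_samples, rowvar=False),
                               atol=0.1)
```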