Variational Gaussian process loss - possible math error
In the documentation for the variational Gaussian process applied to minibatches (https://github.com/tensorflow/probability/blob/v0.12.1/tensorflow_probability/python/distributions/variational_gaussian_process.py#L572), the KL term is rescaled by batch_size/num_training_points_. I assume the reconstruction error term (the expected log-likelihood) is not scaled and represents a sum over all data points in the minibatch. My understanding is that the unbiased estimator of the full-data variational loss should instead be given by -(num_training_points_/batch_size)*reconstruction_error + KL_term (or, on a per-observation basis, dividing by the constant total number of observations: -(1/batch_size)*reconstruction_error + (1/num_training_points_)*KL_term). Otherwise, if the batch size is not constant across minibatches, the estimator will be biased. If this is true, perhaps the weight should be placed on the reconstruction error term rather than on the KL term in variational_loss. Please let me know if I am missing something.
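For concreteness, here is a minimal NumPy sketch (not TFP code) of the two scalings being compared, assuming minibatches are drawn uniformly at random with a fixed size; `per_point_ell` and `kl` are made-up placeholders for the per-observation expected log-likelihoods and the KL term.

```python
# Minimal sketch comparing the two minibatch scalings, assuming a uniformly
# random minibatch of fixed size B drawn from N training points.
# `per_point_ell` stands in for E_q[log p(y_n | theta)] per observation,
# `kl` for KL(q || p); both are placeholder values, not real model outputs.
import numpy as np

rng = np.random.default_rng(0)
N, B = 1000, 64
per_point_ell = rng.normal(size=N)   # placeholder expected log-likelihoods
kl = 3.7                             # placeholder KL(q || p) value

# Full-data negative ELBO: KL - sum_n E_q[log p(y_n | theta)].
full_loss = kl - per_point_ell.sum()

# For a uniform random subset of size B, each point appears with probability
# B / N, so the expected minibatch log-likelihood sum is (B / N) * total.
expected_batch_ell = (B / N) * per_point_ell.sum()

# Scaling proposed above: -(num_training_points/batch_size)*reconstruction + KL.
proposed_form = kl - (N / B) * expected_batch_ell

# Docstring's scaling: -reconstruction + (batch_size/num_training_points)*KL.
docstring_form = (B / N) * kl - expected_batch_ell

print(np.isclose(proposed_form, full_loss))           # True: matches full loss
print(np.isclose(docstring_form, (B / N) * full_loss))  # True: same up to B / N
```

With a fixed batch size the two forms differ only by the constant factor batch_size/num_training_points_, so they define the same objective; the question above is about what happens when that factor varies across minibatches.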
I came across this post while trying to figure out minibatch reweighting for the general-purpose tfp.vi routines. I think you are correct that you would get an incorrect result with unequal batch sizes if you divide by the actual batch size. The contribution per observation that you want in the ELBO should be
$$ \left( \frac{1}{N} D_{KL}\big(q(\theta \mid \xi) \,\|\, P(\theta)\big) - \mathbb{E}_q \log P(D_n \mid \theta) \right) $$
so your batch loss should be a partial sum of these terms. If you divide by a fixed batch size, regardless of the actual batch size, then you are just rescaling the objective, so that should be OK.
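Writing out that partial sum over a minibatch $\mathcal{B}$ (the set notation is mine, not from the docstring):

$$ \sum_{n \in \mathcal{B}} \left( \frac{1}{N} D_{KL}\big(q(\theta \mid \xi) \,\|\, P(\theta)\big) - \mathbb{E}_q \log P(D_n \mid \theta) \right) = \frac{|\mathcal{B}|}{N} D_{KL}\big(q(\theta \mid \xi) \,\|\, P(\theta)\big) - \sum_{n \in \mathcal{B}} \mathbb{E}_q \log P(D_n \mid \theta), $$

which recovers the batch_size/num_training_points_ weight on the KL term; dividing the whole expression by a fixed constant only rescales the objective, whereas dividing by the actual (varying) batch size weights observations differently depending on which batch they land in.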