gpytorch
[Feature Request] Student T Processes
🚀 Feature Request
Motivation
Student-t processes are the other member of the family of elliptical processes, alongside Gaussian processes. In some situations they have preferable statistical properties.
Student-t Processes as Alternatives to Gaussian Processes
Pitch
Adding this to GPyTorch would enable users to work with a new family of processes, which should be useful for many people. I would like to invite suggestions on how to implement this in the code base before working on the pull request. For example, should all GPs be made subclasses of a new elliptical-process class, or should Student-t processes be written in a way that causes minimal changes to the rest of the GPyTorch code?
Are you willing to open a pull request?
Absolutely willing to work on this problem.
I personally think this would be pretty cool to support. My hunch is that there will be a ton of code reuse at the `LazyTensor` level and lower, since both processes just involve a lot of manipulation of positive definite matrices, but not much code reuse at the `ExactGP`/`ApproximateGP` or the `PredictionStrategy` level. This is probably good news overall, since the linear algebra is already there and reasonably well implemented via `LazyTensor`, and it's just a matter of building the model on top of that.
Basically I would think we'd have an analog of the `GP` class, and then of `ExactGP` and maybe `ApproximateGP`. I think that would be the best balance: still significant code reuse in terms of the linear algebra operations (all models use `LazyTensor` to represent positive definite matrices), but separated enough from the GP side of things that it'll be less likely to cause new problems there or become too hacky.
There are also some interesting questions, like what the sparse / variational / deep versions of the model would look like. I've done exactly zero reading on the topic, so some of those questions may already have answers, but I certainly haven't (personally) seen them widely explored.
At the prediction strategy level, this may not be that difficult to implement, as the conditional multivariate t distribution ends up having all of the same components as the multivariate Gaussian. The difference is a couple of rescaling terms that would need to be computed separately:
(from https://arxiv.org/pdf/1402.4306.pdf)
namely $n_1$ (which is just an attribute) and $\beta_1$. I'd assume training would be straightforward via the multivariate t log likelihood, with the only difference being that the likelihood term for the Gaussian setting would need to be worked in beforehand.
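To spell out what those rescaling terms do, here is a plain-NumPy sketch of the conditional computation. The formulas follow the conditional multivariate t given in Shah et al. (2014, arXiv:1402.4306); the function name and signature are my own, and a gpytorch version would of course route the solves through `LazyTensor`:

```python
import numpy as np


def student_t_conditional(y1, mu1, mu2, K11, K12, K22, nu):
    """Parameters of y2 | y1 for a multivariate Student-t prior.

    Following Shah et al. (2014): if y ~ MVT_n(nu, mu, K) with y
    partitioned as (y1, y2), then y2 | y1 is multivariate t with
        nu_cond  = nu + n1
        mu_cond  = mu2 + K21 K11^{-1} (y1 - mu1)        # same as the GP
        S22      = K22 - K21 K11^{-1} K12               # same as the GP
        beta1    = (y1 - mu1)^T K11^{-1} (y1 - mu1)
        cov_cond = (nu + beta1 - 2) / (nu + n1 - 2) * S22
    i.e. the only new ingredients beyond Gaussian conditioning are the
    attribute n1 and the scalar beta1.
    """
    n1 = y1.shape[0]
    resid = y1 - mu1
    v = np.linalg.solve(K11, resid)       # K11^{-1}(y1 - mu1)
    A = np.linalg.solve(K11, K12)         # K11^{-1} K12
    mu_cond = mu2 + K12.T @ v             # conditional mean (unchanged vs GP)
    schur = K22 - K12.T @ A               # Schur complement (unchanged vs GP)
    beta1 = resid @ v                     # Mahalanobis-style rescaling term
    nu_cond = nu + n1                     # updated degrees of freedom
    cov_cond = (nu + beta1 - 2.0) / (nu + n1 - 2.0) * schur
    return nu_cond, mu_cond, cov_cond
```

Note that as `nu -> inf` the rescaling factor tends to 1 and the usual Gaussian conditional is recovered, which is what makes so much of the existing prediction-strategy machinery reusable.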
Happy to discuss in more detail / put up a quick implementation if that would be helpful.
I have started working on a pull request here. It is still a work in progress; I have only added a multivariate Student-t distribution at this point, but I will continue working on this concept there.