
[Feature Request] Multitask GP regression with missing outputs

Open ABesginow opened this issue 3 years ago • 2 comments

🚀 Feature Request

Being able to train a multitask GP regression where y-data is missing (either completely or partially) for some of the channels.

Motivation

I sometimes have missing data in a channel. For example, I have a time series with 100 entries and 2D outputs (e.g. 2 sensors), and one sensor samples half as often as the other. My y data then has 100 entries, of which 50 are either nan or None in the 2nd dimension.

This is similar to the feature in #1591, but instead of a single output dimension, this would make use of multitask kernels & GPs. (This was even discussed/requested in the comments of #1591; see https://github.com/cornellius-gp/gpytorch/issues/1591#issuecomment-903143677.)

Pitch

Describe the solution you'd like

For the implementation: In theory, this should be possible simply by striking out the rows/columns of the resulting covariance matrix that correspond to missing (i.e. nan/None) data, but I don't know whether the current implementation can handle it that easily.

For usage: It will probably make sense to also be a likelihood similar to the implementation of #1591.
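To make the "striking out rows/columns" idea concrete, here is a minimal NumPy sketch (not GPyTorch code). It assumes a Kronecker-structured multitask covariance with a fixed, made-up task covariance `B`, an RBF data kernel, and a small noise term; all names (`rbf`, `B`, `observed`, etc.) are illustrative and not part of any library API.

```python
import numpy as np

def rbf(x1, x2, lengthscale=0.3):
    """Squared-exponential kernel on 1D inputs (illustrative helper)."""
    d = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

# Toy data: 6 time steps, 2 tasks; the second sensor reports every other step.
x = np.linspace(0.0, 1.0, 6)
y = np.stack([np.sin(2 * np.pi * x), np.cos(2 * np.pi * x)], axis=1)  # (6, 2)
y[1::2, 1] = np.nan

# Full multitask covariance: Kronecker product of data kernel and an
# assumed task covariance B, plus noise. Index layout is i * 2 + task.
B = np.array([[1.0, 0.5], [0.5, 1.0]])
K_full = np.kron(rbf(x, x), B) + 1e-2 * np.eye(x.size * 2)  # (12, 12)

# Row-major flattening of y gives the same i * 2 + task layout.
y_flat = y.reshape(-1)
observed = ~np.isnan(y_flat)

# Strike out the rows and columns belonging to missing entries.
K_obs = K_full[np.ix_(observed, observed)]
y_obs = y_flat[observed]

# Gaussian log marginal likelihood on the observed entries only.
L = np.linalg.cholesky(K_obs)
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
log_ml = (-0.5 * y_obs @ alpha
          - np.log(np.diag(L)).sum()
          - 0.5 * y_obs.size * np.log(2 * np.pi))
```

Because marginalizing a Gaussian just drops the corresponding rows and columns, `K_obs` is the exact covariance of the observed entries. Whether GPyTorch's lazy Kronecker structure can be kept after such masking is a separate question that this sketch does not address.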

Describe alternatives you've considered

Are you willing to open a pull request? (We LOVE contributions!!!)

If nobody has finished this feature by the time I can take care of it, I'd love to make a PR for it. I will probably have some time for it at the start of 2022. (But I'd need someone to point me to resources on the testing/contribution requirements etc. used in GPyTorch, please :) )

Additional context

ABesginow, Oct 20 '21 13:10

Couldn't this be done by reshaping the two output dimensions and adding an output flag to your inputs describing which output each point comes from? Your kernel would then probably be something like RBFKernel(active_dims=[0]) * IndexKernel(num_tasks=2, active_dims=[1]). Finally, you could directly use the MissingDataLikelihood on your structured dataset.
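The reshaping step suggested above could look roughly like the following NumPy sketch, using the 100-entry, 2-sensor example from the issue. It only covers data preparation; the variable names are illustrative, and the sine/cosine signals are made-up placeholder data.

```python
import numpy as np

# Wide format: 100 time steps, 2 sensors; sensor 1 samples half as often.
x = np.linspace(0.0, 1.0, 100)
y = np.stack([np.sin(x), np.cos(x)], axis=1)  # (100, 2)
y[1::2, 1] = np.nan                            # 50 missing values

n, num_tasks = y.shape

# Long format: one row per (time step, task) pair.
x_long = np.repeat(x, num_tasks)               # x0, x0, x1, x1, ...
task_long = np.tile(np.arange(num_tasks), n)   # 0, 1, 0, 1, ...
y_long = y.reshape(-1)                         # matches the ordering above

# Drop the pairs that were never observed.
keep = ~np.isnan(y_long)
train_x = np.column_stack([x_long[keep], task_long[keep]])  # (150, 2)
train_y = y_long[keep]                                      # (150,)
```

Column 0 of `train_x` would then be the input seen by the RBF kernel (`active_dims=[0]`) and column 1 the output flag seen by the index kernel (`active_dims=[1]`). Since the missing rows are dropped outright, an ordinary Gaussian likelihood would presumably suffice for this layout.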

wjmaddox, Oct 20 '21 14:10

@wjmaddox What about cases where we rely on LMCVariationalStrategy?

mochar, Dec 21 '21 16:12