celerite2 Design interface for missing data in Kronecker models

@tagordon, @ericagol: I have a proposal.

I expect that a common use case would be something like Rubin where you won't ever have multiple bands observed simultaneously, therefore there's a big overhead introduced by building the full matrices and then masking. So:

For the low rank terms, I think that this could be most efficiently implemented by just allowing the model to have variable kernel amplitudes. I would implement this by allowing some NxJ matrix Alpha which you would multiply into U and V and then square, sum along the 2nd axis and then multiply into a. I think that this would be equivalent to the low rank Kronecker model with missing data.
For the dense version, I think that things will be a bit trickier and I'm not sure what the best interface is. I think it would be worth working this through carefully and honestly I think that it might be worth writing a paper. It looks to me like we might be able to come up with a pretty efficient algorithm for this and we'd probably have a lot of users!

Oct 15 '20 21:10 dfm

@dfm I'm trying to understand your suggestion for the low-rank version. What determines the entries in Alpha?

Oct 15 '20 22:10 tagordon

Sorry - I wrote it pretty vaguely!

So if I understand @ericagol's proposal for the missing data algorithm, we just need to remove all the rows of a, U, and V that correspond to the missing data (P is actually trickier in that form - more on that later!). So if you only observe one band at each time, this would be equivalent to just multiplying each row of U and V by the alpha value for the band observed at that time and multiplying that row of a by the square (modulo the diag entries, but you know what I mean!). So in its simplest form, Alpha would be an N-vector where each entry is the alpha parameter for the band at time n. The reason why I said NxJ is that if you add KronTerms that is equivalent to allowing different alphas for different Js and we might want to support that.

Is this at all clearer? Hard to explain without a whiteboard perhaps when it's all a little muddled in my brain.

Also: the reason why the P matrix is harder when masking is because most of the rows of P are 1, I think, but we need the row corresponding to the first observed band of P to be evaluated as exp(-c (t_n - t_{n-1})) (currently it's the first band, which will change if that band gets masked). This is trivial in the "one band per time" case as described above.

Oct 16 '20 11:10 dfm

@dfm Perhaps we should meet about this; I haven't thought about this much, and so could use an explainer.

Oct 22 '20 13:10 ericagol

Sure that would be great! Next week is getting a bit overloaded already. Perhaps some time on Nov 3? Y'all vote by mail in Washington right?

Oct 23 '20 11:10 dfm

@dfm Voted!

Oct 23 '20 16:10 ericagol

Election day works for me!

Oct 23 '20 17:10 tagordon