GLM.jl icon indicating copy to clipboard operation
GLM.jl copied to clipboard

Could the `wts` argument accept a covariance matrix?

Open mkborregaard opened this issue 7 years ago • 8 comments

As discussed with @andreasnoack on Slack, https://julialang.slack.com/archives/C6821M4KE/p1527711673000076 , making it possible to pass a matrix as weights would make it possible to do GLS with the GLM package. Given that weighted regression is simply the special case of GLS where the covariance matrix is Diagonal.

This would make it possible to pass a CovarianceMatrix of expected spatial, temporal or phylogenetic relationships to control for spatial, temporal or phylogenetic autocorrelation. There's a bespoke implementation of this in PhyloNetworks (https://github.com/crsl4/PhyloNetworks.jl/blob/79dc363351dd3953fe18177fc0a1b5003f50676b/src/traits.jl#L1027-L1353) but that's a lot of code and in my opinion it would be much nicer to have the GLS functionality living here and then other packages could just define a way to calculate the covariancematrix.

mkborregaard avatar May 31 '18 11:05 mkborregaard

GEE methods usually choose a type of correlation structure: (1) Independence (I), (2) Exchangeable, (3) AR(p), (4) Toeplitz, or Unstructured. As for the algorithm, that is different from frequency weights. See GEE. However, it would require GLM to keep track of whether the model is mle or not.

Nosferican avatar May 31 '18 13:05 Nosferican

Sorry but I don't see exactly how that would solve the issue.

mkborregaard avatar May 31 '18 14:05 mkborregaard

The correlation structure is not a frequency weights matrix which is the one implemented currently in GLM. You can have both a dependent correlation structure and frequency weights at the same time.

Nosferican avatar May 31 '18 21:05 Nosferican

Oh, so it would need a separate implementation with a different keyword to get it to work?

mkborregaard avatar May 31 '18 21:05 mkborregaard

Yes. It would also differ from a maximum likelihood estimator so the API would change depending on the correlation structure (the default variance covariance estimates would change too). I wouldn't say is trivial; it could eventually find its place in GLM, but It might be best to have the implementation in a different package and port it back afterwards. For example, that's what I am doing with multinomial logistic regression which is a generalization of GLM to vector response generalized linear models (VGLM). Maybe coordinate with @lbittarello for the correlation structure modeling.

Nosferican avatar May 31 '18 22:05 Nosferican

I don't see why in practice we couldn't just Cholesky decompose the passed covariance matrix, transform the data and pass on to GLM, as suggested here: https://julialang.slack.com/archives/C6821M4KE/p1527713945000352 That's also how gls is implemented in R's MASS package.

mkborregaard avatar May 31 '18 22:05 mkborregaard

But I may not see all the complications here. I thought wts were variance weights for weighted regression, but I think you're saying they just represent multiple instances of the same data point?

mkborregaard avatar May 31 '18 22:05 mkborregaard

IRLS which is the algorithm used to fit GLM in this package collapses initial weights (only frequency weights currently supported) with the working weights which are the ones the Wikipedia link has as Ω. If not mistaken, the correlation structure would use the weights initially to compute it so one way would be to allow either wts or the correlation structure. Most software compute the correlation structures as part of the fitting process using either an assumption passed as an argument or a criterion to optimize. See for example Stata's xtgee or SAS's gee.

Nosferican avatar May 31 '18 22:05 Nosferican