GLM.jl

Path towards GLMs with fweights, pweights, and aweights

Open jeffwong opened this issue 7 years ago • 4 comments

This ticket is related to this Discourse discussion.

From what I can tell:

  1. We need to expand LmResp and GLMResp to not only contain wts, but also contain fweights, pweights, and aweights. We should allow a user to pass in multiple types of weights like both fweights and pweights.

  2. We will need to have a function that consolidates fweights, pweights, and aweights into a single vector that can be used in a more classic setting like weighted maximum likelihood. In MLE, only the relative size of the weights matters for the estimates, not their absolute value. If someone is combining fweights and pweights, I think this can be consolidated through the elementwise product wts = fweights .* pweights. Second opinion welcome here!

  3. The MLE functions already accommodate a weight vector, so nothing needs to be done here if 2) is done

  4. The nobs function currently returns the sum of wts, if present. This is the right behavior for fweights. For pweights and aweights, I think we need to return the number of rows in the LinPred object.

  5. We will need to adapt the vcov function to return the covariance matrix according to the types of weights used. Here is a reference
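
A minimal sketch of point 2, using only base Julia. The helper names `consolidate_weights` and `wls` are hypothetical and not part of GLM.jl or StatsBase; the elementwise product is the consolidation rule proposed above, not a settled design. The last two lines illustrate that rescaling the combined weights leaves the weighted least squares estimates unchanged.

```julia
# Hypothetical helper (not a GLM.jl function): combine frequency and
# probability weights into one vector by elementwise multiplication.
function consolidate_weights(n::Integer; fweights=nothing, pweights=nothing)
    wts = ones(Float64, n)
    for w in (fweights, pweights)
        w === nothing && continue
        length(w) == n || throw(DimensionMismatch("weight vector must have length $n"))
        wts .*= w
    end
    return wts
end

# Weighted least squares via the normal equations (X'WX) β = X'Wy.
# Hypothetical helper used only to show the effect of the weights.
wls(X, y, w) = (X' * (w .* X)) \ (X' * (w .* y))

X = [1.0 1.0; 1.0 2.0; 1.0 3.0; 1.0 4.0]
y = [1.1, 1.9, 3.2, 3.9]
fw = [1, 2, 1, 3]            # frequency weights (integer counts)
pw = [0.5, 1.0, 2.0, 1.0]    # probability (sampling) weights

wts = consolidate_weights(4; fweights=fw, pweights=pw)  # [0.5, 2.0, 2.0, 3.0]
β = wls(X, y, wts)
β_rescaled = wls(X, y, 10 .* wts)   # same estimates as β
```

The point estimates are invariant to the scale of `wts`, but quantities such as `nobs` and the covariance matrix are not, which is why points 4 and 5 still need the weight type.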

jeffwong, Jul 15 '17 18:07

As a reference to other readers, see also https://github.com/JuliaStats/StatsBase.jl/issues/283

ararslan, Jul 15 '17 19:07

Thanks for writing this. I agree in general. Just a few remarks:

> We need to expand LmResp and GLMResp to not only contain wts, but also contain fweights, pweights, and aweights. We should allow a user to pass in multiple types of weights like both fweights and pweights.

Do we actually need to store the three vectors? Maybe we could store only one vector, after adjusting it to take the others into account (if they were specified), as in point 2?

> The nobs function currently returns the sum of wts, if present. This is the right behavior for fweights. For pweights and aweights, I think we need to return the number of rows in the LinPred object.

More precisely, I think that should be the number of nonzero weights.
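
A rough sketch of that rule in base Julia. `nobs_sketch` and the `kind` symbols are hypothetical names for illustration only, not the GLM.jl API:

```julia
# Hypothetical helper (not the GLM.jl API): number of observations implied
# by a weight vector, depending on how the weights are interpreted.
function nobs_sketch(wts::AbstractVector{<:Real}, kind::Symbol)
    if kind === :fweights
        # each row stands for wts[i] identical observations
        return sum(wts)
    elseif kind === :pweights || kind === :aweights
        # count only the rows that actually contribute to the fit
        return count(!iszero, wts)
    else
        throw(ArgumentError("unknown weight kind: $kind"))
    end
end

n_freq = nobs_sketch([1, 2, 0, 3], :fweights)          # 6
n_prob = nobs_sketch([0.5, 1.0, 0.0, 2.0], :pweights)  # 3
```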

nalimilan, Jul 19 '17 14:07

  1. Yes, that is a really good point. I suppose there aren't many use cases where it is useful to extract the original fweights, pweights, and aweights. If anyone thinks of a scenario where these are needed, leave a note.

  2. Yes, I think you are right.

I have parts 1-4 done; I am just trying to figure out exactly how the vcov function works.

jeffwong, Jul 22 '17 05:07

What about, rather than passing a linear predictor and weights, we just return the weighted model matrix? If I am not mistaken, that should make things simpler for vcov.
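
To illustrate the idea for the plain linear model case (a sketch under that assumption; for a GLM the working weights from IRLS would also enter, and none of the variable names below come from GLM.jl): scaling each row of the model matrix by the square root of its weight reproduces both the weighted estimates and the `(X'WX)^-1` term that a covariance estimator builds on.

```julia
using LinearAlgebra

X = [1.0 1.0; 1.0 2.0; 1.0 3.0; 1.0 4.0]
y = [1.1, 1.9, 3.2, 3.9]
w = [0.5, 2.0, 2.0, 3.0]

# Weighted model matrix and response: row i is scaled by sqrt(w[i]).
Xw = sqrt.(w) .* X
yw = sqrt.(w) .* y

# Ordinary least squares on the scaled data ...
β_scaled = Xw \ yw
# ... matches the explicit weighted normal equations.
β_normal = (X' * Diagonal(w) * X) \ (X' * (w .* y))

# The inverse cross-product term is the same either way.
bread_scaled = inv(Xw' * Xw)
bread_normal = inv(X' * Diagonal(w) * X)
```

How this matrix is then turned into a covariance estimate still depends on the weight type, as in point 5 of the original post.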

Nosferican, Dec 13 '17 05:12