GLM.jl
Draft of fweights, pweights, aweights for glms
This is a code outline for https://github.com/JuliaStats/GLM.jl/issues/186. I am new to contributing to Julia, so please feel free to leave feedback on how this feature should be implemented.
I think this PR would be a lot easier for me if we only supported Frequency Weights and Probability Weights. I am completely unfamiliar with Analytic Weights and with how they affect the formulas for the covariance of the GLM parameters. Limiting the scope would also let us simplify the API by passing in only one vector of weights. If someone needed to combine Frequency and Probability Weights, they could do that using `fweight * pweight / sum(fweight * pweight) * sum(fweight)`. The result could then be cast as a Frequency Weight: it has the relative weighting that the pweight brings, and the new vector still sums to `sum(fweight)`, protecting the use of `nobs` in the package.
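As a minimal sketch of that combination, reading the formula elementwise: `combine_weights` below is a hypothetical helper written for illustration, not an existing GLM.jl function, and the example vectors are made up.

```julia
# Hypothetical helper (not part of GLM.jl): fold probability weights into
# frequency weights while keeping the total equal to sum(fweight).
function combine_weights(fweight::AbstractVector{<:Real},
                         pweight::AbstractVector{<:Real})
    length(fweight) == length(pweight) ||
        throw(DimensionMismatch("fweight and pweight must have the same length"))
    w = fweight .* pweight              # elementwise product of the two weights
    return w ./ sum(w) .* sum(fweight)  # rescale so the total matches sum(fweight)
end

# Made-up example data.
fweight = [2, 1, 3]
pweight = [0.5, 1.0, 1.5]
combined = combine_weights(fweight, pweight)
```

The rescaled vector keeps the relative weighting of `fweight .* pweight`, while its total stays at `sum(fweight)`, so anything that computes `nobs` from the sum of the weights is unaffected.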
In retrospect, by limiting the scope to only fweights and pweights, I think we only need a way to define `nobs` as a function of the weight type. Even if someone wanted to combine fweights and pweights, the result would ultimately still behave like an fweight. As long as cov(parameters) is already based on $X' wts X$ instead of $X'X$, the only modification we would need is to `nobs`.
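A rough sketch of what dispatching `nobs` on the weight type could look like. The types `FreqWeights` and `ProbWeights` and the function `weighted_nobs` are hypothetical stand-ins defined here so the example has no package dependencies; the rule used for probability weights (the row count) is an assumption, not something settled in this PR. The last lines check that, for frequency weights, $X' wts X$ matches $X'X$ computed on the data with rows replicated.

```julia
using LinearAlgebra

# Hypothetical stand-in weight types, defined locally for this sketch.
abstract type SketchWeights end
struct FreqWeights <: SketchWeights
    values::Vector{Float64}
end
struct ProbWeights <: SketchWeights
    values::Vector{Float64}
end

# Frequency weights count replicated rows, so nobs is their sum.
weighted_nobs(w::FreqWeights) = sum(w.values)
# Assumption: probability weights only rescale rows, so nobs is the row count.
weighted_nobs(w::ProbWeights) = float(length(w.values))

# Made-up design matrix with an intercept column and one covariate.
X = [1.0 2.0; 1.0 3.0; 1.0 5.0]
counts = [2, 1, 3]
fw = FreqWeights([2.0, 1.0, 3.0])

# Replicate row i of X counts[i] times to build the expanded data set.
rows = reduce(vcat, [fill(i, c) for (i, c) in enumerate(counts)])
X_expanded = X[rows, :]

XtWX = X' * Diagonal(fw.values) * X     # weighted cross-product
XtX_expanded = X_expanded' * X_expanded # unweighted, on replicated rows
```

With this shape, the fitting code can stay the same for both weight types, and only the `nobs` method differs.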
Sure, let's go with the simpler version for now. I don't know of other software accepting multiple types of weights at the same time, and we can always add support for analytic weights later.
@jeffwong about your comment on Analytic Weights:
They represent multiple observations, like fweights, but instead of representing multiple identical obs., they represent distinct observations that were averaged out. This happens with variables such as "average grade by classroom", "accident rate per state", etc. Since groups with more individuals have less noisy means, aweights lead to heteroskedasticity.
That said, in linear models pweights are just "aweights with robust standard errors", so if the code supports pweights then aweights come for free.
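To make the description of aweights concrete, here is a small sketch with made-up numbers: when the covariate is constant within each group, weighted least squares on the group means, with the group sizes as weights, gives the same coefficients as ordinary least squares on the individual observations. All variable names are illustrative, and this only covers the point estimates, not the standard errors.

```julia
using LinearAlgebra

# Made-up individual-level data: x is constant within each group.
group_x = [1.0, 2.0, 4.0]
group_y = [[1.0, 3.0], [2.0, 4.0, 6.0], [7.0]]

# OLS on the individual observations.
x_ind = reduce(vcat, [fill(x, length(ys)) for (x, ys) in zip(group_x, group_y)])
y_ind = reduce(vcat, group_y)
X_ind = hcat(ones(length(x_ind)), x_ind)
beta_ind = X_ind \ y_ind

# WLS on the group means, weighted by group size (the analytic weights).
n = [length(ys) for ys in group_y]
ybar = [sum(ys) / length(ys) for ys in group_y]
X_grp = hcat(ones(length(group_x)), group_x)
W = Diagonal(n)
beta_grp = (X_grp' * W * X_grp) \ (X_grp' * W * ybar)
```

The two coefficient vectors agree because a mean of `n[g]` observations has variance proportional to `1 / n[g]`, which is exactly the weighting that the group sizes supply.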
What's the status on this?
Superseded by https://github.com/JuliaStats/GLM.jl/pull/487.