GLM.jl

Draft of fweights, pweights, aweights for glms

Open · jeffwong opened this issue 7 years ago · 5 comments

This is a code outline for https://github.com/JuliaStats/GLM.jl/issues/186. I am new to contributing to Julia, so please feel free to leave feedback on how this feature should be implemented.

jeffwong, Aug 13 '17 17:08

I think this PR would be a lot easier for me if we only supported frequency weights and probability weights. I am completely unfamiliar with analytic weights and how they affect the formulas for the covariance of the GLM parameter estimates. Dropping them would also let us simplify the API by passing in a single vector of weights. If someone needed to combine frequency and probability weights, they could do so with `fweight .* pweight ./ sum(fweight .* pweight) .* sum(fweight)`. The result can then be treated as a frequency weight: it carries the relative weighting that the pweight brings, and it still sums to `sum(fweight)`, which protects the use of `nobs` in the package.
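A minimal Python sketch of that rescaling, using plain lists; the helper name `combine_weights` and the sample values are hypothetical. It checks the two properties claimed above: the result is proportional to `fweight * pweight` elementwise, and it sums to `sum(fweight)`.

```python
def combine_weights(fweight, pweight):
    """Fold probability weights into frequency weights (hypothetical helper).

    Each element is fweight[i] * pweight[i], rescaled so that the result
    sums to sum(fweight), which keeps the frequency-weight notion of nobs.
    """
    if len(fweight) != len(pweight):
        raise ValueError("weight vectors must have the same length")
    products = [f * p for f, p in zip(fweight, pweight)]
    scale = sum(fweight) / sum(products)
    return [x * scale for x in products]


fw = [1, 2, 3]
pw = [0.5, 0.25, 0.25]
combined = combine_weights(fw, pw)
print(combined)       # relative weights proportional to fw * pw
print(sum(combined))  # equals sum(fw) up to floating-point rounding
```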

jeffwong, Sep 08 '17 03:09

In retrospect, by limiting the scope to fweights and pweights, I think all we need is a way to define `nobs` as a function of the weight type. Even if someone combined fweights and pweights, the result would still behave like an fweight. As long as the covariance of the parameters is already based on $X'WX$ (with $W = \operatorname{diag}(w)$ holding the weights) instead of $X'X$, the only modification needed is to `nobs`.
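A minimal Python sketch of that idea, assuming a hypothetical `nobs(weights, kind)` rule (sum of the weights for frequency weights, row count for probability weights) and a hand-rolled `xtwx` helper; neither is GLM.jl's actual API. It also checks the fweight intuition that a weight of 2 on a row gives the same $X'WX$ as duplicating that row.

```python
def nobs(weights, kind):
    """Hypothetical nobs rule keyed on the weight type.

    Frequency weights stand for replicated rows, so the observation count
    is the sum of the weights. Probability weights reweight distinct
    sampled rows, so the observation count is the number of rows.
    """
    if kind == "frequency":
        return sum(weights)
    if kind == "probability":
        return len(weights)
    raise ValueError(f"unsupported weight kind: {kind!r}")


def xtwx(X, w):
    """Return X' W X with W = diag(w); X is a list of rows."""
    p = len(X[0])
    out = [[0.0] * p for _ in range(p)]
    for row, wi in zip(X, w):
        for j in range(p):
            for k in range(p):
                out[j][k] += wi * row[j] * row[k]
    return out


X = [[1.0, 0.5], [1.0, 2.0], [1.0, 3.5]]  # intercept column plus one regressor
w = [1, 2, 1]                              # frequency weight 2 on the second row
X_dup = [X[0], X[1], X[1], X[2]]           # same data with that row duplicated

A = xtwx(X, w)
B = xtwx(X_dup, [1, 1, 1, 1])
same = all(abs(a - b) < 1e-12 for ra, rb in zip(A, B) for a, b in zip(ra, rb))
print(same)
print(nobs(w, "frequency"), nobs(w, "probability"))
```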

jeffwong, Sep 08 '17 03:09

Sure, let's go with the simpler version for now. I don't know of other software accepting multiple types of weights at the same time, and we can always add support for analytic weights later.

nalimilan, Sep 08 '17 07:09

@jeffwong about your comment on Analytic Weights:

Like fweights, they represent multiple observations, but instead of multiple identical observations they stand for distinct observations that were averaged together. This happens with variables such as "average grade by classroom", "accident rate per state", etc. Since groups with more individuals have less noisy means (the variance of a group mean is $\sigma^2 / n$ when individual observations share a common variance $\sigma^2$), aggregated data are heteroskedastic, and aweights are how that heteroskedasticity is accounted for.
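A small Python sketch of the classroom example, assuming made-up data and a hypothetical `wls_line` helper: weighting each group mean by its group size (the aweight) reproduces the point estimates of OLS on the individual data when the regressor is constant within groups. Only point estimates are compared here; standard errors are a separate question.

```python
def wls_line(x, y, w):
    """Weighted least squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    return ybar - b * xbar, b


# Hypothetical individual-level data: classroom-level regressor -> student grades.
groups = {
    1.0: [60.0, 70.0, 65.0],          # 3 students
    2.0: [72.0, 80.0],                # 2 students
    3.0: [85.0, 78.0, 90.0, 83.0],    # 4 students
}

# Individual-level OLS (every weight equal to 1).
x_ind = [x for x, ys in groups.items() for _ in ys]
y_ind = [y for ys in groups.values() for y in ys]
ols = wls_line(x_ind, y_ind, [1.0] * len(x_ind))

# Aggregated data: one averaged row per group, aweight = group size.
x_agg = list(groups)
y_agg = [sum(ys) / len(ys) for ys in groups.values()]
n_agg = [len(ys) for ys in groups.values()]
agg = wls_line(x_agg, y_agg, n_agg)

print(ols)
print(agg)  # same coefficients as the individual-level fit, up to rounding
```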

That said, in linear models pweights are just "aweights with robust standard errors": the point estimates are identical and only the variance estimator differs, so if the code supports pweights then aweights come for free.
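To illustrate that claim, here is a Python sketch of a weighted simple regression that returns one set of point estimates alongside two variance estimates: the classical one usually paired with aweights, and an HC0 sandwich of the kind used for pweights. The helper names and data are hypothetical, and real packages typically add small-sample corrections that are omitted here.

```python
def matmul2(A, B):
    """Multiply two 2x2 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def fit_weighted(x, y, w):
    """Weighted fit of y = a + b*x with two variance estimates.

    Returns (coef, v_classic, v_robust):
      v_classic = s^2 (X'WX)^-1, s^2 = sum(w * e^2) / (n - 2)  (aweight-style)
      v_robust  = (X'WX)^-1 [sum (w e)^2 x x'] (X'WX)^-1       (HC0, pweight-style)
    No small-sample correction is applied to the sandwich.
    """
    n = len(x)
    # X'WX for the design matrix with rows (1, x_i), and its 2x2 inverse.
    s0 = sum(w)
    s1 = sum(wi * xi for wi, xi in zip(w, x))
    s2x = sum(wi * xi * xi for wi, xi in zip(w, x))
    det = s0 * s2x - s1 * s1
    inv = [[s2x / det, -s1 / det], [-s1 / det, s0 / det]]
    # Point estimates (X'WX)^-1 X'Wy: the same for aweights and pweights.
    t0 = sum(wi * yi for wi, yi in zip(w, y))
    t1 = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    coef = [inv[0][0] * t0 + inv[0][1] * t1, inv[1][0] * t0 + inv[1][1] * t1]
    resid = [yi - coef[0] - coef[1] * xi for xi, yi in zip(x, y)]
    # Classical variance.
    sigma2 = sum(wi * e * e for wi, e in zip(w, resid)) / (n - 2)
    v_classic = [[sigma2 * v for v in row] for row in inv]
    # Sandwich variance: bread * meat * bread.
    meat = [[0.0, 0.0], [0.0, 0.0]]
    for xi, wi, e in zip(x, w, resid):
        r = (1.0, xi)
        for j in range(2):
            for k in range(2):
                meat[j][k] += (wi * e) ** 2 * r[j] * r[k]
    v_robust = matmul2(matmul2(inv, meat), inv)
    return coef, v_classic, v_robust


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 1.9, 3.4, 3.8, 5.3]
w = [1.0, 3.0, 2.0, 1.0, 4.0]
coef, v_aw, v_pw = fit_weighted(x, y, w)
print(coef)                                  # shared point estimates
print(v_aw[1][1] ** 0.5, v_pw[1][1] ** 0.5)  # slope SE: classical vs sandwich
```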

sergiocorreia, Sep 14 '17 12:09

What's the status on this?

Nosferican, Apr 29 '18 05:04

Superseded by https://github.com/JuliaStats/GLM.jl/pull/487.

nalimilan, Sep 03 '22 19:09