GLM.jl

nobs() should be number of obs; wobs() should be current nobs

Open iwelch opened this issue 7 years ago • 7 comments

nobs should probably return nrow(m.mf.df), an integer. Otherwise, the name seems like a misnomer. It is also unexpected to get a Float for standard uses.

The current nobs could probably be renamed wobs. With all weights equal to 1, it equals nobs(), albeit as a Float.
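A minimal, language-agnostic sketch (in Python, not Julia) of the two quantities this proposal separates. nobs and wobs mirror the names used in the issue. FittedModel and its fields are hypothetical stand-ins, not GLM.jl's actual types.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FittedModel:
    """Hypothetical stand-in for a fitted model; only the row count and weights matter here."""
    n_rows: int
    weights: Optional[List[float]] = None  # None means the fit was unweighted


def nobs(m: FittedModel) -> int:
    # Proposed meaning: number of rows in the data used for the fit (always an integer).
    return m.n_rows


def wobs(m: FittedModel) -> float:
    # Proposed meaning: sum of weights, i.e. what the current nobs returns.
    if m.weights is None:
        return float(m.n_rows)
    return float(sum(m.weights))
```

For a model fit on 3 rows with weights [0.5, 2.0, 1.5], nobs gives 3 and wobs gives 4.0. Without weights, both agree in value, but only nobs is an integer.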

/iaw

iwelch • Sep 27 '18 16:09

For comparison, in Stata:

  • reg y x [pw = w] displays the sum of weights but does not store it in e().
  • svy: reg y x, after svyset [pw = w], does store the sum of weights in e(). It uses e(N) for the number of rows and e(N_pop) for the sum of weights.

pdeffebach • Sep 27 '18 16:09

As noted on Discourse:

I’m afraid it’s more complex than that. For example, with frequency/replicate weights, the apparent “number of observations” doesn’t have any meaning, it’s just the way the data has been compressed to save space. So it would be misleading to have nobs return that.

A solution would be to have a keyword argument to request the (unweighted) number of rows.
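A minimal sketch (in Python, for illustration only) of why the row count is misleading under frequency weights, and of the keyword-argument idea. The keyword name unweighted and the helper expand_frequency_weights are hypothetical, not an existing GLM.jl or StatsBase.jl API.

```python
from typing import List, Sequence, Union


def expand_frequency_weights(rows: Sequence[float], freq: Sequence[int]) -> List[float]:
    """Undo the compression: repeat each row as many times as its frequency weight."""
    expanded: List[float] = []
    for value, count in zip(rows, freq):
        expanded.extend([value] * count)
    return expanded


def nobs(freq: Sequence[float], unweighted: bool = False) -> Union[int, float]:
    """Hypothetical nobs with a keyword to request the raw row count.

    By default it returns the sum of weights, which for frequency weights is
    the number of observations in the uncompressed data set.
    """
    if unweighted:
        return len(freq)  # rows as stored, an artifact of compression
    return float(sum(freq))
```

With rows [1.0, 2.0, 5.0] and frequency weights [3, 1, 2], the compressed data has 3 rows, but it stands for 6 observations. The default nobs(freq) returns 6.0, while nobs(freq, unweighted=True) returns 3.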

nalimilan • Sep 27 '18 16:09

Would you be open to exporting a function that inspects the model frame in the output for the number of rows in the underlying data set?

However, I understand that we want to be agnostic about the input data type.

pdeffebach • Sep 27 '18 16:09

We would need to require a specific layout from all models to do that (https://github.com/JuliaStats/StatsModels.jl/issues/32). Barring that solution, it doesn't seem too hard to require models to implement that simple method.

nalimilan • Sep 27 '18 16:09

Thanks for the link. If the officially sanctioned API for all models is still moving, I would like some sort of unweightedobs() function to be implemented.

That said, I generally write closures for any regression function, including a custom output struct, so it's not a huge deal if I have to write my own function to get the unweighted N.

pdeffebach • Sep 27 '18 16:09

For the sake of completeness:

felm in the R package lfe returns a model object with:

  • m$M: number of rows in the matrix
  • m$weights: the vector of weights used in the regression.

pdeffebach • Sep 27 '18 17:09