cregg
mm() variance clarity
Is it sufficiently clear that the SEs returned by mm() come from domain estimation on the full design, rather than from estimating on a subset of the data?
library(cregg)
library(survey)
# toy data: four rows per level; Kate's rows get weight 0
x <- data.frame(
  level = factor(rep(c("John", "Kate"), each = 4)),
  outcome = c(0L, 0L, 1L, 1L, 0L, 0L, 1L, 1L),
  weight = rep(c(1L, 0L), each = 4)
)
# what people might be expecting
with(subset(x, level == "John"), sqrt(sum((outcome - mean(outcome))^2)/3/4))
svymean(~outcome, svydesign(ids = ~1, weights = ~ 1, data = subset(x, level == "John")))
# what is actually returned (all are equivalent)
## mm()
mm(x, outcome ~ level)
## unweighted data, subset to John
svymean(~outcome, subset(svydesign(ids = ~1, weights = ~ 1, data = x), level == "John"))
## weighted data (Kate weight == 0), subset to John
svymean(~outcome, subset(svydesign(ids = ~1, weights = ~ weight, data = x), level == "John"))
## weighted data (Kate weight == 0), full data frame
svymean(~outcome, svydesign(ids = ~1, weights = ~ weight, data = x))
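To make the gap between the two approaches concrete, here is a minimal base-R sketch of both standard errors for the John rows of the toy data. It assumes the with-replacement linearization variance that survey applies to a design with ids = ~1 and no fpc; the variable names (in_domain, z, se_subset, se_domain) are illustrative and are not part of cregg or survey.

```r
# Toy data from the example above: four John rows, then four Kate rows
outcome   <- c(0, 0, 1, 1, 0, 0, 1, 1)
in_domain <- rep(c(TRUE, FALSE), each = 4)  # TRUE for John

n      <- length(outcome)           # full sample size (8)
n_d    <- sum(in_domain)            # domain sample size (4)
ybar_d <- mean(outcome[in_domain])  # domain mean (0.5)

# (a) Subset, then estimate: the 4 John rows are treated as the whole sample
se_subset <- sqrt(var(outcome[in_domain]) / n_d)

# (b) Domain estimate: linearize the domain mean over the full sample,
#     with rows outside the domain contributing zeros (assumed formula)
z <- ifelse(in_domain, (outcome - ybar_d) / n_d, 0)
se_domain <- sqrt(n / (n - 1) * sum((z - mean(z))^2))

round(c(subset = se_subset, domain = se_domain), 4)
# subset = 0.2887, domain = 0.2673
```

Under these assumptions the subset SE matches the "what people might be expecting" calls, and the domain SE is the smaller value that the full-design calls should produce, because the domain size is treated as random rather than fixed.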
[ ] Document this better, pointing to vignette: https://cran.r-project.org/web/packages/survey/vignettes/domain.pdf
[ ] Add option to not calculate variances as if subsets are random samples of population?