mm() variance clarity

Open leeper opened this issue 5 years ago • 0 comments

Is it sufficiently clear that mm() returns domain estimates, i.e. SEs that treat each level as a domain of the full sample, rather than SEs computed after subsetting the data to that level?

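library("survey")  # svydesign(), svymean()
library("cregg")   # mm()

# toy data: two respondents, four rows each; Kate's rows have weight 0
x <- structure(
  list(
    level = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), .Label = c("John", "Kate"), class = "factor"),
    outcome = c(0L, 0L, 1L, 1L, 0L, 0L, 1L, 1L),
    weight = c(1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L)
  ),
  row.names = 1:8, class = "data.frame"
)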

# what people might be expecting (SE computed from the John rows alone)
with(subset(x, level == "John"), sqrt(sum((outcome - mean(outcome))^2)/3/4))
svymean(~outcome, svydesign(ids = ~1, weights = ~ 1, data = subset(x, level == "John")))

# what is actually returned (all are equivalent)
## mm()
mm(x, outcome ~ level)

## unweighted data, subset to John
svymean(~outcome, subset(svydesign(ids = ~1, weights = ~ 1, data = x), level == "John"))

## weighted data (Kate weight == 0), subset to John
svymean(~outcome, subset(svydesign(ids = ~1, weights = ~ weight, data = x), level == "John"))

## weighted data (Kate weight == 0), full data frame
svymean(~outcome, svydesign(ids = ~1, weights = ~ weight, data = x))
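To make the gap between the two kinds of SE concrete, here is a minimal base-R sketch that computes both by hand for the John rows. It assumes a simple random sample with replacement and equal weights, and uses the usual linearization formula for a domain mean; the variable names (`in_dom`, `se_subset`, `se_domain`) are only illustrative, and this is not meant to describe how mm() or svymean() are implemented internally.

```r
# Same toy data as above, without the weight column
x <- data.frame(
  level   = factor(rep(c("John", "Kate"), each = 4)),
  outcome = c(0L, 0L, 1L, 1L, 0L, 0L, 1L, 1L)
)

n      <- nrow(x)                   # full sample size (8)
in_dom <- x$level == "John"         # domain indicator
n_dom  <- sum(in_dom)               # number of rows in the domain (4)
ybar   <- mean(x$outcome[in_dom])   # domain mean (0.5)

# SE after subsetting first: the domain size is treated as fixed
se_subset <- sqrt(var(x$outcome[in_dom]) / n_dom)

# Domain SE via linearization: rows outside the domain stay in the
# sample and contribute zeros, so the variance uses n rather than n_dom
z <- in_dom * (x$outcome - ybar) / n_dom
se_domain <- sqrt(n / (n - 1) * sum((z - mean(z))^2))

round(c(subset = se_subset, domain = se_domain), 4)
# subset = 0.2887, domain = 0.2673
```

The only difference here is the finite-sample factor: 4/3 when the subset is treated as the whole sample, versus 8/7 when John's rows are a domain within all eight rows.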

[ ] Document this better, pointing to vignette: https://cran.r-project.org/web/packages/survey/vignettes/domain.pdf
[ ] Add option to not calculate variances as if subsets are random samples of population?

May 03 '20 20:05 leeper