srvyr
srvyr copied to clipboard
Add information about the different degrees of freedom default to the `srvyr` vs `survey` vignette
It appears that the 95% confidence limits from survey_mean()
are slightly different from those obtained with survey::svymean()
- is this a bug?
suppressPackageStartupMessages({
library(survey)
library(srvyr)
})
data(api)
srs_design <- apisrs %>% svydesign(id = ~1, fpc = ~fpc, data = .)
api_mean <- svymean(~enroll, srs_design)
(svy_ci <- confint(api_mean))
#> 2.5 % 97.5 %
#> enroll 530.969 638.251
srs_design_srvyr <- apisrs %>% as_survey_design(ids = 1, fpc = fpc)
(srvyr_ci <-
srs_design_srvyr %>%
summarise(enroll = survey_mean(enroll, vartype = "ci")))
#> enroll enroll_low enroll_upp
#> 1 584.61 530.6408 638.5792
svy_ci[1,1] == srvyr_ci$enroll_low
#> [1] FALSE
svy_ci[1,2] == srvyr_ci$enroll_upp
#> [1] FALSE
svy_ci[1,1] == coef(api_mean) - qnorm(0.975) * SE(api_mean)
#> enroll
#> enroll TRUE
svy_ci[1,2] == coef(api_mean) + qnorm(0.975) * SE(api_mean)
#> enroll
#> enroll TRUE
Created on 2021-05-07 by the reprex package (v2.0.0)
Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.0.5 (2021-03-31)
#> os Linux Mint 20
#> system x86_64, linux-gnu
#> ui X11
#> language nl_BE:nl
#> collate nl_BE.UTF-8
#> ctype nl_BE.UTF-8
#> tz Europe/Brussels
#> date 2021-05-07
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date lib source
#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.0.2)
#> cli 2.4.0 2021-04-05 [1] CRAN (R 4.0.5)
#> crayon 1.4.1 2021-02-08 [1] CRAN (R 4.0.3)
#> DBI 1.1.1 2021-01-15 [1] CRAN (R 4.0.3)
#> digest 0.6.27 2020-10-24 [1] CRAN (R 4.0.3)
#> dplyr 1.0.5 2021-03-05 [1] CRAN (R 4.0.5)
#> ellipsis 0.3.1 2020-05-15 [1] CRAN (R 4.0.2)
#> evaluate 0.14 2019-05-28 [1] CRAN (R 4.0.2)
#> fansi 0.4.2 2021-01-15 [1] CRAN (R 4.0.3)
#> fs 1.5.0 2020-07-31 [1] CRAN (R 4.0.2)
#> generics 0.1.0 2020-10-31 [1] CRAN (R 4.0.3)
#> glue 1.4.2 2020-08-27 [1] CRAN (R 4.0.2)
#> highr 0.8 2019-03-20 [1] CRAN (R 4.0.2)
#> htmltools 0.5.1.1 2021-01-22 [1] CRAN (R 4.0.3)
#> knitr 1.31 2021-01-27 [1] CRAN (R 4.0.3)
#> lattice 0.20-41 2020-04-02 [4] CRAN (R 4.0.0)
#> lifecycle 1.0.0 2021-02-15 [1] CRAN (R 4.0.4)
#> magrittr 2.0.1 2020-11-17 [1] CRAN (R 4.0.3)
#> Matrix * 1.3-2 2021-01-06 [4] CRAN (R 4.0.3)
#> mitools 2.4 2019-04-26 [1] CRAN (R 4.0.5)
#> pillar 1.5.1 2021-03-05 [1] CRAN (R 4.0.5)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.0.2)
#> purrr 0.3.4 2020-04-17 [1] CRAN (R 4.0.2)
#> R6 2.5.0 2020-10-28 [1] CRAN (R 4.0.3)
#> reprex 2.0.0 2021-04-02 [1] CRAN (R 4.0.5)
#> rlang 0.4.10 2020-12-30 [1] CRAN (R 4.0.3)
#> rmarkdown 2.7 2021-02-19 [1] CRAN (R 4.0.4)
#> rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.0.3)
#> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.0.2)
#> srvyr * 1.0.1 2021-03-28 [1] CRAN (R 4.0.5)
#> stringi 1.5.3 2020-09-09 [1] CRAN (R 4.0.2)
#> stringr 1.4.0 2019-02-10 [1] CRAN (R 4.0.2)
#> survey * 4.0 2020-04-03 [1] CRAN (R 4.0.5)
#> survival * 3.2-10 2021-03-16 [4] CRAN (R 4.0.4)
#> tibble 3.1.0 2021-02-25 [1] CRAN (R 4.0.5)
#> tidyr 1.1.3 2021-03-03 [1] CRAN (R 4.0.5)
#> tidyselect 1.1.0 2020-05-11 [1] CRAN (R 4.0.2)
#> utf8 1.2.1 2021-03-12 [1] CRAN (R 4.0.5)
#> vctrs 0.3.7 2021-03-29 [1] CRAN (R 4.0.5)
#> withr 2.4.1 2021-01-26 [1] CRAN (R 4.0.3)
#> xfun 0.22 2021-03-11 [1] CRAN (R 4.0.4)
#> yaml 2.2.1 2020-02-01 [1] CRAN (R 4.0.2)
#>
#> [1] /home/floris/lib/R/library
#> [2] /usr/local/lib/R/site-library
#> [3] /usr/lib/R/site-library
#> [4] /usr/lib/R/library
Not a bug (though a little confusing, I admit). srvyr tries to be helpful by setting the df
on confint
for you.
If you prefer the results you got using the survey package, you can do this in srvyr:
(srvyr_ci_2 <-
srs_design_srvyr %>%
summarise(enroll = survey_mean(enroll, vartype = "ci", df = Inf)))
#> enroll enroll_low enroll_upp
#> 1 584.61 530.969 638.251
Or to get the srvyr results in survey:
api_mean_2 <- svymean(~enroll, srs_design)
(svy_ci_2 <- confint(api_mean_2, df = degf(srs_design)))
#> 2.5 % 97.5 %
#> enroll 530.6408 638.5792
Thank you for explaining @gergness. And indeed the df
argument is present in the documentation of survey_mean()
:+1:.
I think it would be good to have this aspect added in the srvyr-vs-survey
vignette. The difference can be seen under heading Summary statistics, but it's not discussed.
> svy_ci_2[1,1] == srvyr_ci$enroll_low
[1] TRUE
> svy_ci_2[1,2] == srvyr_ci$enroll_upp
[1] TRUE
>
> srvyr_ci$enroll_low == coef(api_mean) - qt(0.975, df = degf(srs_design)) * SE(api_mean)
enroll
enroll TRUE
> srvyr_ci$enroll_upp == coef(api_mean) + qt(0.975, df = degf(srs_design)) * SE(api_mean)
enroll
enroll TRUE
>
> qt(0.975, df=Inf) == qnorm(0.975)
[1] TRUE
yeah, makes sense