srvyr icon indicating copy to clipboard operation
srvyr copied to clipboard

Add information about the different degrees of freedom default to the `srvyr` vs `survey` vignette

Open florisvdh opened this issue 3 years ago • 3 comments

It appears that the 95% confidence limits from survey_mean() are slightly different from those obtained with survey::svymean() - is this a bug?

suppressPackageStartupMessages({
    library(survey)
    library(srvyr)
})
data(api)

srs_design <- apisrs %>% svydesign(id = ~1, fpc = ~fpc, data = .)
api_mean <- svymean(~enroll, srs_design)
(svy_ci <- confint(api_mean))
#>          2.5 %  97.5 %
#> enroll 530.969 638.251

srs_design_srvyr <- apisrs %>% as_survey_design(ids = 1, fpc = fpc)
(srvyr_ci <- 
    srs_design_srvyr %>% 
    summarise(enroll = survey_mean(enroll, vartype = "ci")))
#>   enroll enroll_low enroll_upp
#> 1 584.61   530.6408   638.5792

svy_ci[1,1] == srvyr_ci$enroll_low
#> [1] FALSE
svy_ci[1,2] == srvyr_ci$enroll_upp
#> [1] FALSE

svy_ci[1,1] == coef(api_mean) - qnorm(0.975) * SE(api_mean)
#>        enroll
#> enroll   TRUE
svy_ci[1,2] == coef(api_mean) + qnorm(0.975) * SE(api_mean)
#>        enroll
#> enroll   TRUE

Created on 2021-05-07 by the reprex package (v2.0.0)

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 4.0.5 (2021-03-31)
#>  os       Linux Mint 20               
#>  system   x86_64, linux-gnu           
#>  ui       X11                         
#>  language nl_BE:nl                    
#>  collate  nl_BE.UTF-8                 
#>  ctype    nl_BE.UTF-8                 
#>  tz       Europe/Brussels             
#>  date     2021-05-07                  
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date       lib source        
#>  assertthat    0.2.1   2019-03-21 [1] CRAN (R 4.0.2)
#>  cli           2.4.0   2021-04-05 [1] CRAN (R 4.0.5)
#>  crayon        1.4.1   2021-02-08 [1] CRAN (R 4.0.3)
#>  DBI           1.1.1   2021-01-15 [1] CRAN (R 4.0.3)
#>  digest        0.6.27  2020-10-24 [1] CRAN (R 4.0.3)
#>  dplyr         1.0.5   2021-03-05 [1] CRAN (R 4.0.5)
#>  ellipsis      0.3.1   2020-05-15 [1] CRAN (R 4.0.2)
#>  evaluate      0.14    2019-05-28 [1] CRAN (R 4.0.2)
#>  fansi         0.4.2   2021-01-15 [1] CRAN (R 4.0.3)
#>  fs            1.5.0   2020-07-31 [1] CRAN (R 4.0.2)
#>  generics      0.1.0   2020-10-31 [1] CRAN (R 4.0.3)
#>  glue          1.4.2   2020-08-27 [1] CRAN (R 4.0.2)
#>  highr         0.8     2019-03-20 [1] CRAN (R 4.0.2)
#>  htmltools     0.5.1.1 2021-01-22 [1] CRAN (R 4.0.3)
#>  knitr         1.31    2021-01-27 [1] CRAN (R 4.0.3)
#>  lattice       0.20-41 2020-04-02 [4] CRAN (R 4.0.0)
#>  lifecycle     1.0.0   2021-02-15 [1] CRAN (R 4.0.4)
#>  magrittr      2.0.1   2020-11-17 [1] CRAN (R 4.0.3)
#>  Matrix      * 1.3-2   2021-01-06 [4] CRAN (R 4.0.3)
#>  mitools       2.4     2019-04-26 [1] CRAN (R 4.0.5)
#>  pillar        1.5.1   2021-03-05 [1] CRAN (R 4.0.5)
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.0.2)
#>  purrr         0.3.4   2020-04-17 [1] CRAN (R 4.0.2)
#>  R6            2.5.0   2020-10-28 [1] CRAN (R 4.0.3)
#>  reprex        2.0.0   2021-04-02 [1] CRAN (R 4.0.5)
#>  rlang         0.4.10  2020-12-30 [1] CRAN (R 4.0.3)
#>  rmarkdown     2.7     2021-02-19 [1] CRAN (R 4.0.4)
#>  rstudioapi    0.13    2020-11-12 [1] CRAN (R 4.0.3)
#>  sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 4.0.2)
#>  srvyr       * 1.0.1   2021-03-28 [1] CRAN (R 4.0.5)
#>  stringi       1.5.3   2020-09-09 [1] CRAN (R 4.0.2)
#>  stringr       1.4.0   2019-02-10 [1] CRAN (R 4.0.2)
#>  survey      * 4.0     2020-04-03 [1] CRAN (R 4.0.5)
#>  survival    * 3.2-10  2021-03-16 [4] CRAN (R 4.0.4)
#>  tibble        3.1.0   2021-02-25 [1] CRAN (R 4.0.5)
#>  tidyr         1.1.3   2021-03-03 [1] CRAN (R 4.0.5)
#>  tidyselect    1.1.0   2020-05-11 [1] CRAN (R 4.0.2)
#>  utf8          1.2.1   2021-03-12 [1] CRAN (R 4.0.5)
#>  vctrs         0.3.7   2021-03-29 [1] CRAN (R 4.0.5)
#>  withr         2.4.1   2021-01-26 [1] CRAN (R 4.0.3)
#>  xfun          0.22    2021-03-11 [1] CRAN (R 4.0.4)
#>  yaml          2.2.1   2020-02-01 [1] CRAN (R 4.0.2)
#> 
#> [1] /home/floris/lib/R/library
#> [2] /usr/local/lib/R/site-library
#> [3] /usr/lib/R/site-library
#> [4] /usr/lib/R/library

florisvdh avatar May 07 '21 13:05 florisvdh

Not a bug (though a little confusing, I admit). srvyr tries to be helpful by setting the df on confint for you.

If you prefer the results you got using the survey package, you can do this in srvyr:

(srvyr_ci_2 <- 
        srs_design_srvyr %>% 
        summarise(enroll = survey_mean(enroll, vartype = "ci", df = Inf)))
#>   enroll enroll_low enroll_upp
#> 1 584.61    530.969    638.251

Or to get the srvyr results in survey:

api_mean_2 <- svymean(~enroll, srs_design)
(svy_ci_2 <- confint(api_mean_2, df = degf(srs_design)))
#>           2.5 %   97.5 %
#> enroll 530.6408 638.5792

gergness avatar May 07 '21 21:05 gergness

Thank you for explaining @gergness. And indeed the df argument is present in the documentation of survey_mean() :+1:.

I think it would be good to have this aspect added in the srvyr-vs-survey vignette. The difference can be seen under heading Summary statistics, but it's not discussed.

> svy_ci_2[1,1] == srvyr_ci$enroll_low
[1] TRUE
> svy_ci_2[1,2] == srvyr_ci$enroll_upp
[1] TRUE
> 
> srvyr_ci$enroll_low == coef(api_mean) - qt(0.975, df = degf(srs_design)) * SE(api_mean)
       enroll
enroll   TRUE
> srvyr_ci$enroll_upp == coef(api_mean) + qt(0.975, df = degf(srs_design)) * SE(api_mean)
       enroll
enroll   TRUE
> 
> qt(0.975, df=Inf) == qnorm(0.975)
[1] TRUE

florisvdh avatar May 10 '21 07:05 florisvdh

yeah, makes sense

gergness avatar May 10 '21 20:05 gergness