gtsummary
gtsummary copied to clipboard
Improve error messaging in `tbl_svysummary()` when there are too few obs in a group to calculate statistic
See example https://stackoverflow.com/questions/74187687
Maybe something related to this part of the code: https://github.com/ddsjoberg/gtsummary/blob/89752920785c55250099134dca36af611fbb9b3f/R/tbl_svysummary.R#L730-L741
We are using oldsvyquantile()
here which cannot compute quantiles in such situation, resulting in NA
.
It seems that the new version svyquantile()
is able to handle it.
Mayben, we could force the use of survey
>= 4.1? One possible issue is that this version may not be available for old versions of R. The big question is therefore how much do we need to be compatible with old version of R? Could be related to the discussion in #1372
load(url("https://github.com/cesarpoggi/PRUEBA/blob/main/PRUEBA_STACKOVERFLOW.rda?raw=true"))
library(survey)
#> Le chargement a nécessité le package : grid
#> Le chargement a nécessité le package : Matrix
#> Le chargement a nécessité le package : survival
#>
#> Attachement du package : 'survey'
#> L'objet suivant est masqué depuis 'package:graphics':
#>
#> dotchart
library(gtsummary)
#Setting the survey object
dessin2<- svydesign(id = ~1,
data = PANxFAM,
weight = ~FACTOR07)
svyquantile(~I601B2, design = subset(dessin2, DOMINIO == "Costa Centro"), quantiles = c(.25, .5, .75))
#> $I601B2
#> quantile ci.2.5 ci.97.5 se
#> 0.25 27.22 NaN NaN NaN
#> 0.5 55.08 NaN NaN NaN
#> 0.75 86.40 NaN NaN NaN
#>
#> attr(,"hasci")
#> [1] TRUE
#> attr(,"class")
#> [1] "newsvyquantile"
svyquantile(~I601B2, design = subset(dessin2, DOMINIO == "Costa Centro"), quantiles = c(.25, .5, .75), ci = F)
#> $I601B2
#> 0.25 0.5 0.75
#> [1,] 27.22 55.08 86.4
#>
#> attr(,"hasci")
#> [1] FALSE
#> attr(,"class")
#> [1] "newsvyquantile"
oldsvyquantile(~I601B2, design = subset(dessin2, DOMINIO == "Costa Centro"), quantiles = c(.25, .5, .75))
#> 0.25 0.5 0.75
#> I601B2 NA NA NA
Created on 2022-10-25 with reprex v2.0.2
Before we restrict the R version, we can simply require a more up-to-date version of the survey package, and delete the legacy code that calculates the old quartiles. What do you think?
It seems good to me
@larmarange i tried to make this update today by simply requiring survey v4.1-1. But I kept getting strange errors.
I have to make another gtsummary release before the dplyr v1.1.0 release in January (ugh!), and was trying to get this into the next release. But it may need to wait unless you have a time to take a look. FYI
I tried updating the code to use the new quantile function. But I am getting incorrect results, and I am not sure why....
library(survey)
#> Loading required package: grid
#> Loading required package: Matrix
#> Loading required package: survival
#>
#> Attaching package: 'survey'
#> The following object is masked from 'package:graphics':
#>
#> dotchart
data(api)
# OLD svyquantile
svyby(
# svyby args
formula = ~api00,
by = ~both,
design = svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc),
FUN = oldsvyquantile,
# args being passed to svyquantile
na.rm = TRUE,
keep.var = FALSE,
quantiles = 0.5
)
#> both statistic
#> No No 631.0
#> Yes Yes 653.5
oldsvyquantile(
x = ~api00,
design = svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc),
quantiles = 0.5
)
#> 0.5
#> api00 652
# NEW svyquantile
svyby(
# svyby args
formula = ~api00,
by = ~both,
design = svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc),
FUN = svyquantile,
# args being passed to svyquantile
na.rm = TRUE,
keep.var = FALSE,
quantiles = 0.5
)
#> both statistic.statistic.api00.quantile statistic.statistic.api00.ci.2.5
#> No No 631 547
#> Yes Yes 631 547
#> statistic.statistic.api00.ci.97.5 statistic.statistic.api00.se
#> No 722 39.75493
#> Yes 722 39.75493
#> statistic.statistic.api00.quantile statistic.statistic.api00.ci.2.5
#> No 655 566
#> Yes 655 566
#> statistic.statistic.api00.ci.97.5 statistic.statistic.api00.se
#> No 717 34.94774
#> Yes 717 34.94774
svyquantile(
x = ~api00,
design = svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc),
quantiles = 0.5
)
#> $api00
#> quantile ci.2.5 ci.97.5 se
#> 0.5 652 561 714 35.66788
#>
#> attr(,"hasci")
#> [1] TRUE
#> attr(,"class")
#> [1] "newsvyquantile"
Created on 2022-12-25 with reprex v2.0.2
@larmarange FYI I posted to SO about this here https://stackoverflow.com/questions/74916773
Hi
I do not think that the keep.var
arg should be used with the new version. See:
library(survey)
#> Le chargement a nécessité le package : grid
#> Le chargement a nécessité le package : Matrix
#> Le chargement a nécessité le package : survival
#>
#> Attachement du package : 'survey'
#> L'objet suivant est masqué depuis 'package:graphics':
#>
#> dotchart
data(api)
d <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)
# old version
svyby(
# svyby args
formula = ~api00,
by = ~both,
design = d,
FUN = oldsvyquantile,
# args being passed to oldsvyquantile
na.rm = TRUE,
keep.var = FALSE,
quantiles = 0.5
)
#> both statistic
#> No No 631.0
#> Yes Yes 653.5
# new version
svyby(
# svyby args
formula = ~api00,
by = ~both,
design = d,
FUN = svyquantile,
# args being passed to svyquantile
na.rm = TRUE,
quantiles = 0.5
)
#> both api00 se.api00
#> No No 631 39.75493
#> Yes Yes 655 34.94774
Created on 2022-12-26 with reprex v2.0.2
Results are different as the new default rule for computing the quantiles is not the same.
Ah, ok. The survey pkg author did note the first error as a bug and submitted a fix to their base code. I wonder if there will be any changes to the format of the output with the change (not quite ready to be downloaded, it seems, based on his comments).
I am thinking this change should perhaps wait until survey makes its next release, so we don't risk breaking tbl_svysummary()
with the next release of survey.
Sure. No emergency here
HI @larmarange , I've updated the the backend to use the "new" svyquantile()
, and the bug has been addressed in the survey package. i've also bumped the min version requirement of survey to v4.2
Great thanks