gtsummary icon indicating copy to clipboard operation
gtsummary copied to clipboard

Improve error messaging in `tbl_svysummary()` when there are too few obs in a group to calculate statistic

Open ddsjoberg opened this issue 2 years ago • 10 comments

See example https://stackoverflow.com/questions/74187687

ddsjoberg avatar Oct 25 '22 00:10 ddsjoberg

Maybe something related to this part of the code: https://github.com/ddsjoberg/gtsummary/blob/89752920785c55250099134dca36af611fbb9b3f/R/tbl_svysummary.R#L730-L741

We are using oldsvyquantile() here which cannot compute quantiles in such situation, resulting in NA. It seems that the new version svyquantile() is able to handle it. Mayben, we could force the use of survey >= 4.1? One possible issue is that this version may not be available for old versions of R. The big question is therefore how much do we need to be compatible with old version of R? Could be related to the discussion in #1372

load(url("https://github.com/cesarpoggi/PRUEBA/blob/main/PRUEBA_STACKOVERFLOW.rda?raw=true"))
library(survey)
#> Le chargement a nécessité le package : grid
#> Le chargement a nécessité le package : Matrix
#> Le chargement a nécessité le package : survival
#> 
#> Attachement du package : 'survey'
#> L'objet suivant est masqué depuis 'package:graphics':
#> 
#>     dotchart
library(gtsummary)

#Setting the survey object
dessin2<- svydesign(id = ~1, 
                    data = PANxFAM,
                    weight = ~FACTOR07)

svyquantile(~I601B2, design = subset(dessin2, DOMINIO == "Costa Centro"), quantiles = c(.25, .5, .75))
#> $I601B2
#>      quantile ci.2.5 ci.97.5  se
#> 0.25    27.22    NaN     NaN NaN
#> 0.5     55.08    NaN     NaN NaN
#> 0.75    86.40    NaN     NaN NaN
#> 
#> attr(,"hasci")
#> [1] TRUE
#> attr(,"class")
#> [1] "newsvyquantile"

svyquantile(~I601B2, design = subset(dessin2, DOMINIO == "Costa Centro"), quantiles = c(.25, .5, .75), ci = F)
#> $I601B2
#>       0.25   0.5 0.75
#> [1,] 27.22 55.08 86.4
#> 
#> attr(,"hasci")
#> [1] FALSE
#> attr(,"class")
#> [1] "newsvyquantile"

oldsvyquantile(~I601B2, design = subset(dessin2, DOMINIO == "Costa Centro"), quantiles = c(.25, .5, .75))
#>        0.25 0.5 0.75
#> I601B2   NA  NA   NA

Created on 2022-10-25 with reprex v2.0.2

larmarange avatar Oct 25 '22 17:10 larmarange

Before we restrict the R version, we can simply require a more up-to-date version of the survey package, and delete the legacy code that calculates the old quartiles. What do you think?

ddsjoberg avatar Oct 26 '22 23:10 ddsjoberg

It seems good to me

larmarange avatar Oct 27 '22 10:10 larmarange

@larmarange i tried to make this update today by simply requiring survey v4.1-1. But I kept getting strange errors.

I have to make another gtsummary release before the dplyr v1.1.0 release in January (ugh!), and was trying to get this into the next release. But it may need to wait unless you have a time to take a look. FYI

ddsjoberg avatar Dec 24 '22 23:12 ddsjoberg

I tried updating the code to use the new quantile function. But I am getting incorrect results, and I am not sure why....

library(survey)
#> Loading required package: grid
#> Loading required package: Matrix
#> Loading required package: survival
#> 
#> Attaching package: 'survey'
#> The following object is masked from 'package:graphics':
#> 
#>     dotchart
data(api)

# OLD svyquantile
svyby(
  # svyby args
  formula = ~api00,
  by = ~both,
  design = svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc),
  FUN = oldsvyquantile,
  # args being passed to svyquantile
  na.rm = TRUE,
  keep.var = FALSE,
  quantiles = 0.5
)
#>     both statistic
#> No    No     631.0
#> Yes  Yes     653.5
oldsvyquantile(
  x = ~api00,
  design = svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc),
  quantiles = 0.5
)
#>       0.5
#> api00 652

# NEW svyquantile
svyby(
  # svyby args
  formula = ~api00,
  by = ~both,
  design = svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc),
  FUN = svyquantile,
  # args being passed to svyquantile
  na.rm = TRUE,
  keep.var = FALSE,
  quantiles = 0.5
)
#>     both statistic.statistic.api00.quantile statistic.statistic.api00.ci.2.5
#> No    No                                631                              547
#> Yes  Yes                                631                              547
#>     statistic.statistic.api00.ci.97.5 statistic.statistic.api00.se
#> No                                722                     39.75493
#> Yes                               722                     39.75493
#>     statistic.statistic.api00.quantile statistic.statistic.api00.ci.2.5
#> No                                 655                              566
#> Yes                                655                              566
#>     statistic.statistic.api00.ci.97.5 statistic.statistic.api00.se
#> No                                717                     34.94774
#> Yes                               717                     34.94774
svyquantile(
  x = ~api00,
  design = svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc),
  quantiles = 0.5
)
#> $api00
#>     quantile ci.2.5 ci.97.5       se
#> 0.5      652    561     714 35.66788
#> 
#> attr(,"hasci")
#> [1] TRUE
#> attr(,"class")
#> [1] "newsvyquantile"

Created on 2022-12-25 with reprex v2.0.2

ddsjoberg avatar Dec 25 '22 23:12 ddsjoberg

@larmarange FYI I posted to SO about this here https://stackoverflow.com/questions/74916773

ddsjoberg avatar Dec 26 '22 02:12 ddsjoberg

Hi

I do not think that the keep.var arg should be used with the new version. See:

library(survey)
#> Le chargement a nécessité le package : grid
#> Le chargement a nécessité le package : Matrix
#> Le chargement a nécessité le package : survival
#> 
#> Attachement du package : 'survey'
#> L'objet suivant est masqué depuis 'package:graphics':
#> 
#>     dotchart

data(api)

d <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# old version
svyby(
  # svyby args
  formula = ~api00,
  by = ~both,
  design = d,
  FUN = oldsvyquantile,
  # args being passed to oldsvyquantile
  na.rm = TRUE,
  keep.var = FALSE,
  quantiles = 0.5
)
#>     both statistic
#> No    No     631.0
#> Yes  Yes     653.5

# new version
svyby(
  # svyby args
  formula = ~api00,
  by = ~both,
  design = d,
  FUN = svyquantile,
  # args being passed to svyquantile
  na.rm = TRUE,
  quantiles = 0.5
)
#>     both api00 se.api00
#> No    No   631 39.75493
#> Yes  Yes   655 34.94774

Created on 2022-12-26 with reprex v2.0.2

larmarange avatar Dec 26 '22 10:12 larmarange

Results are different as the new default rule for computing the quantiles is not the same.

larmarange avatar Dec 26 '22 10:12 larmarange

Ah, ok. The survey pkg author did note the first error as a bug and submitted a fix to their base code. I wonder if there will be any changes to the format of the output with the change (not quite ready to be downloaded, it seems, based on his comments).

I am thinking this change should perhaps wait until survey makes its next release, so we don't risk breaking tbl_svysummary() with the next release of survey.

ddsjoberg avatar Dec 26 '22 14:12 ddsjoberg

Sure. No emergency here

larmarange avatar Dec 26 '22 15:12 larmarange

HI @larmarange , I've updated the the backend to use the "new" svyquantile(), and the bug has been addressed in the survey package. i've also bumped the min version requirement of survey to v4.2

ddsjoberg avatar Jul 01 '24 23:07 ddsjoberg

Great thanks

larmarange avatar Jul 03 '24 19:07 larmarange