gtsummary
Do not coerce to factor in `tbl_svysummary()`
@larmarange see this SO post: https://stackoverflow.com/questions/77957551
Is this something we should address? It seems that the survey method for `subset()` doesn't remove rows but sets their weights to 0, so users can't remove unobserved levels from `by` variables in `tbl_svysummary()`.
Because the default (t.test) is not implemented for tbl_svysummary(). You should use "smd", cf. https://www.danieldsjoberg.com/gtsummary/reference/tests.html#tbl-svysummary-add-difference-
Currently, `add_difference()` does not change the default tests when applied to a `tbl_svysummary()`.
I was thinking more about the `tbl_svysummary()` table itself. The unobserved columns appear in the table, even if we make the underlying column character.
library(gtsummary)
library(PNSIBGE)
pns <- get_pns(year = 2019, labels = TRUE)
pns.2 <- subset(pns, C009 %in% c("Branca", "Preta"))
pns.2$variables$C009 <- as.character(pns.2$variables$C009)
pns.2 |>
  gtsummary::tbl_svysummary(by = C009, include = c(C006)) |>
  gtsummary::as_kable()
Characteristic | Amarela, N = 0 | Branca, N = 91,037,722 | Ignorado, N = 0 | Indígena, N = 0 | Parda, N = 0 | Preta, N = 21,786,515 |
---|---|---|---|---|---|---|
C006 | ||||||
Homem | 0 (NA%) | 42,682,905 (47%) | 0 (NA%) | 0 (NA%) | 0 (NA%) | 10,691,164 (49%) |
Mulher | 0 (NA%) | 48,354,817 (53%) | 0 (NA%) | 0 (NA%) | 0 (NA%) | 11,095,351 (51%) |
But I just tried to tabulate directly with the survey package, and it still shows all levels, even when the column has previously been converted to a character.
So what they are dealing with is a non-standard situation, and they'd just need to write their own method in `add_stat()` for this, and hide the unobserved columns themselves.
Probably because somewhere the levels are still declared. `pns.2$variables$C009 <- as.character(pns.2$variables$C009)` did not change metadata stored within the survey object.
It is much safer to use `fct_drop()` through `srvyr::mutate()`.
But a question remains open: if this is a tbl_svysummary table, should we apply, by default, a relevant test?
Even after dropping the levels with srvyr, the unobserved levels still appear in the output of the survey function.
pns.2 <-
  srvyr::as_survey_design(pns) |>
  srvyr::filter(C009 %in% c("Branca", "Preta")) |>
  srvyr::mutate(C009 = as.character(C009))
survey::svytable(~C009, pns.2)
#> C009
#>  Amarela   Branca Ignorado Indígena    Parda    Preta
#>        0 91037722        0        0        0 21786515
But, yes, a better default is warranted!
If I remember correctly, `as.character()` keeps the levels attribute, while `forcats::fct_drop()` removes unobserved levels.
Same issue with `forcats::fct_drop()`, unfortunately.
Hi @larmarange, I am reading through this issue, and I am unclear what the next steps are.
FYI, I have added another difference method based on the survey t-test in the new version.
But as far as this issue is concerned, we don't remove stratifying levels with zero weights, and making that kind of change (if that is the suggestion?) would require a larger conversation about the approach.
Thanks @ddsjoberg for having added `svy.t.test`.
Regarding the second point, it seems that `fct_drop()` used through srvyr fails in that specific case, but this is maybe a question for the srvyr package.
If I force a new factor with just two levels, it works.
> pns.2 <-
+   srvyr::as_survey_design(pns) |>
+   srvyr::mutate(test = factor(C009, levels = c("Branca", "Preta"))) |>
+   srvyr::filter(C009 %in% c("Branca", "Preta"))
> survey::svytable(~ test, pns.2)
test
  Branca    Preta
91037722 21786515
I think it works when it's forced to a factor, because the unspecified levels are coerced to NA, and the "unobserved" levels are lost?
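For what it's worth, the coercion guessed at above can be checked on a plain vector. This is only a minimal base R sketch: `race` is made-up toy data standing in for `C009`, not the PNS survey, and it says nothing about what metadata a survey design object keeps internally.

```r
# Toy stand-in for the C009 column (hypothetical data, not the PNS survey)
race <- factor(c("Branca", "Preta", "Parda", "Branca", "Amarela"))

# Subsetting a factor keeps every declared level, observed or not
kept <- race[race %in% c("Branca", "Preta")]
levels(kept)     # still lists all four original levels
table(kept)      # unobserved levels are reported with a count of 0

# Rebuilding the factor with explicit levels: values outside `levels`
# become NA, and only the two requested levels remain declared
forced <- factor(race, levels = c("Branca", "Preta"))
levels(forced)
sum(is.na(forced))
table(forced)

# On a plain vector, droplevels() and as.character() both discard the
# unobserved levels; a character vector has no levels attribute at all
levels(droplevels(kept))
levels(as.character(kept))
```

On a plain factor, all three routes get rid of the unobserved levels, so the zero-count columns in the survey output presumably come from how the design object is subset rather than from base factor handling.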
Anyway, it seems that any perceived issue is unrelated to our implementation. I think we can close this one.