gtsummary: Do not coerce to factor in `tbl_svysummary()`

@larmarange see this SO post: https://stackoverflow.com/questions/77957551

Is this something we should address? It seems that the survey method for subset() doesn't remove rows but sets their weights to 0, so users can't remove unobserved levels from by variables in tbl_svysummary().
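A minimal sketch of the subset() behaviour described above. It uses the `api` data bundled with the survey package instead of the PNS data (an assumption for illustration). For a calibrated design, such as one built with `postStratify()`, subsetting keeps every row and gives the excluded rows zero weight. The PNS design is presumably calibrated in the same way, which would explain the result.

```r
library(survey)

# Bundled example data: a one-stage cluster sample of 183 schools
data(api)
dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# Calibrate (post-stratify) on school type, as in ?postStratify
pop.types <- data.frame(stype = c("E", "H", "M"), Freq = c(4421, 755, 1018))
dclus1p <- postStratify(dclus1, ~stype, pop.types)

# Keep elementary and high schools only
sub <- subset(dclus1p, stype %in% c("E", "H"))

# Every row is still in the design...
nrow(sub$variables) == nrow(apiclus1)

# ...but the excluded middle schools now have a total weight of zero
tapply(weights(sub), sub$variables$stype, sum)
```

With the uncalibrated `dclus1`, the same `subset()` call removes the rows instead. However, `stype` still keeps its unused `M` factor level.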

Feb 09 '24 16:02 ddsjoberg

This is because the default test (t.test) is not implemented for tbl_svysummary(). You should use "smd" instead; see https://www.danieldsjoberg.com/gtsummary/reference/tests.html#tbl-svysummary-add-difference-

Currently, add_difference() does not change the default tests when applied to a tbl_svysummary()
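For reference, here is a sketch that requests the standardized mean difference explicitly. It uses the survey package's `api` data (an assumption for illustration) and assumes the smd package is installed. The `by` variable, `sch.wide`, has exactly two levels, because `add_difference()` compares two groups.

```r
library(survey)
library(gtsummary)

data(api)
dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# sch.wide has two levels ("No", "Yes"), as add_difference() expects
tbl <- dclus1 |>
  tbl_svysummary(by = sch.wide, include = c(api00, api99)) |>
  add_difference(test = everything() ~ "smd")

tbl
```

Later in this thread, a survey t-test method (`"svy.t.test"`) is mentioned as added in a newer release. It can presumably replace `"smd"` in the same `test` argument.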

Feb 09 '24 17:02 larmarange

I was thinking more about the tbl_svysummary() table itself. The unobserved columns appear in the table, even if we convert the underlying column to character.

library(gtsummary)
library(PNSIBGE)

pns <- get_pns(year = 2019, labels = TRUE)
pns.2 <- subset(pns, C009 %in% c("Branca", "Preta"))
pns.2$variables$C009 <- as.character(pns.2$variables$C009)

pns.2 |> 
  gtsummary::tbl_svysummary(by = C009, include = c(C006)) |> 
  gtsummary::as_kable()

| Characteristic | Amarela, N = 0 | Branca, N = 91,037,722 | Ignorado, N = 0 | Indígena, N = 0 | Parda, N = 0 | Preta, N = 21,786,515 |
|:---|:---|:---|:---|:---|:---|:---|
| C006 | | | | | | |
| Homem | 0 (NA%) | 42,682,905 (47%) | 0 (NA%) | 0 (NA%) | 0 (NA%) | 10,691,164 (49%) |
| Mulher | 0 (NA%) | 48,354,817 (53%) | 0 (NA%) | 0 (NA%) | 0 (NA%) | 11,095,351 (51%) |

But I just tried tabulating directly with the survey package, and it still shows all levels, even when the column has previously been converted to character.

So what they are dealing with is a non-standard situation; they would need to write their own function for add_stat() and hide the unobserved columns themselves.

Feb 09 '24 18:02 ddsjoberg

Probably because somewhere the levels are still declared. pns.2$variables$C009 <- as.character(pns.2$variables$C009) did not change metadata stored within the survey object.

It is much safer to use fct_drop() through srvyr::mutate().

Feb 09 '24 18:02 larmarange

But a question remains open: if this is a tbl_svysummary table, should we apply, by default, a relevant test?

Feb 09 '24 18:02 larmarange

Even after dropping the levels with srvyr, the unobserved levels still appear in the survey function's output.

pns.2 <- 
  srvyr::as_survey_design(pns) |> 
  srvyr::filter(C009 %in% c("Branca", "Preta")) |> 
  srvyr::mutate(C009 = as.character(C009))

survey::svytable(~C009, pns.2)
#> C009
#>  Amarela   Branca Ignorado Indígena    Parda    Preta 
#>        0 91037722        0        0        0 21786515

Feb 09 '24 18:02 ddsjoberg

But, yes, a better default is warranted!

Feb 09 '24 18:02 ddsjoberg

If I remember correctly, as.character() keeps the levels attribute, while forcats::fct_drop() removes unobserved levels.

Feb 09 '24 18:02 larmarange

Same issue with forcats::fct_drop(), unfortunately.

Feb 09 '24 19:02 ddsjoberg

Hi @larmarange, I am reading through this issue, and I am unclear on what the next steps are.

FYI, I have added another difference method, based on the survey t-test, in the new version.

But as far as this issue is concerned, we don't remove stratifying levels with zero weights, and making that kind of change (if that is the suggestion?) would require a larger conversation about the approach.

Jul 01 '24 23:07 ddsjoberg

Thanks @ddsjoberg for having added svy.t.test.

Regarding the second point, it seems that forcats::fct_drop() (applied through srvyr::mutate()) fails in that specific case, but this may be a question for the srvyr package.

If I force a new factor with just two levels, it works.

pns.2 <-
  srvyr::as_survey_design(pns) |>
  srvyr::mutate(test = factor(C009, levels = c("Branca", "Preta"))) |>
  srvyr::filter(C009 %in% c("Branca", "Preta"))

survey::svytable(~ test, pns.2)
#> test
#>   Branca    Preta
#> 91037722 21786515

Jul 03 '24 19:07 larmarange

I think it works when it's forced to a factor, because the unspecified levels are coerced to NA, and the "unobserved" levels are lost?
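Here is a minimal sketch of that explanation. It uses the survey package's `api` data rather than PNS (an assumption for illustration). After a calibrated `subset()`, converting to character keeps the zero-weight `M` rows, so `svytable()` still lists `M`. Recoding with `factor(levels = ...)` turns those values into `NA`. `svytable()`, which tabulates via `xtabs()`, then omits them.

```r
library(survey)

data(api)
dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)
pop.types <- data.frame(stype = c("E", "H", "M"), Freq = c(4421, 755, 1018))
dclus1p <- postStratify(dclus1, ~stype, pop.types)
sub <- subset(dclus1p, stype %in% c("E", "H"))  # "M" rows kept with zero weight

# Character conversion: the zero-weight "M" rows are still tabulated
sub_chr <- sub
sub_chr$variables$stype <- as.character(sub_chr$variables$stype)
tab_chr <- svytable(~stype, sub_chr)

# Explicit levels: "M" becomes NA and drops out of the table
sub_fct <- sub
sub_fct$variables$stype <- factor(sub_fct$variables$stype, levels = c("E", "H"))
tab_fct <- svytable(~stype, sub_fct)

tab_chr
tab_fct
```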

Jul 03 '24 19:07 ddsjoberg

Anyway, it seems that any perceived issue is unrelated to our implementation. I think we can close this one.

Jul 03 '24 19:07 ddsjoberg
