tern
[Bug]: add_rowcounts doesn't work if layout begins with non-population variable (eg AVISIT)
What happened?
In the layout we first split by AVISIT (a variable that is not in the population dataset) and then by ARM. When trying to add the row counts (N=xx) from alt_counts_df = ADSL,
there is an error because ADSL does not include AVISIT.
A not very nice workaround is to add a dummy AVISIT to ADSL and repeat the dataset as many times as there are levels in AVISIT.
Noticed this when working on PKPT03.
library(rtables)
library(tern)
library(scda)
library(dplyr)
adsl <- synthetic_cdisc_dataset("latest", "adsl")
advs <- synthetic_cdisc_dataset("latest", "advs") %>%
  filter(AVISITN %in% c(0, 1)) %>%
  filter(PARAMCD %in% c("SYSBP", "DIABP"))
lyt <- basic_table() %>%
  split_cols_by_multivar(c("AVAL", "CHG")) %>%
  split_rows_by("AVISIT", split_fun = drop_split_levels) %>%
  split_rows_by("ARM", split_fun = drop_split_levels) %>%
  add_rowcounts(alt_counts = TRUE) %>%
  split_rows_by("PARAM", split_fun = drop_split_levels) %>%
  analyze_colvars(afun = mean, format = "xx.x")
# Error
build_table(lyt, advs, alt_counts_df = adsl)
Error
Error: Following error encountered in splitting alt_counts_df: variable(s) [AVISIT] not present in data. (VarLevelSplit)
# not pretty workaround: dummy adsl with visit
adsl_visit <- rbind(adsl, adsl) %>%
  select(ARM) %>%
  mutate(
    AVISIT = factor(rep.int(c("BASELINE", "WEEK 1 DAY 8"), c(nrow(adsl), nrow(adsl))))
  )
build_table(lyt, advs, alt_counts_df = adsl_visit)
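The same workaround can be written without hard-coding the two visit names. This is only a rough sketch on toy data: `adsl_toy` and `visits` are hypothetical stand-ins (in the example above the levels would presumably come from something like `levels(droplevels(advs$AVISIT))`), and it relies on base R `merge()` with `by = NULL`, which returns the Cartesian product of the two data frames.

```r
# Toy stand-in for ADSL (hypothetical data, not the scda dataset).
adsl_toy <- data.frame(
  USUBJID = paste0("S", 1:5),
  ARM = factor(c("A", "A", "B", "B", "B"))
)

# Hypothetical visit levels; in practice taken from the analysis dataset.
visits <- factor(c("BASELINE", "WEEK 1 DAY 8"))

# Cartesian product: every subject is repeated once per visit level.
adsl_by_visit <- merge(adsl_toy, data.frame(AVISIT = visits), by = NULL)

# Each visit carries the full population, so N per arm is the same in every visit.
table(adsl_by_visit$AVISIT, adsl_by_visit$ARM)
```

The resulting data frame would then be passed as `alt_counts_df`, in the same way as `adsl_visit` above.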
Desired output
BASELINE
  A: Drug X (N=134)
    Diastolic Blood Pressure
      mean    96.5    0.0
    Systolic Blood Pressure
      mean    151.7   0.0
  B: Placebo (N=134)
    Diastolic Blood Pressure
      mean    101.1   0.0
    Systolic Blood Pressure
      mean    149.5   0.0
  C: Combination (N=132)
    Diastolic Blood Pressure
      mean    102.8   0.0
    Systolic Blood Pressure
      mean    144.7   0.0
WEEK 1 DAY 8
  A: Drug X (N=134)
    Diastolic Blood Pressure
      mean    100.6   4.1
...
Example 2: With trim_levels_in_group split function
Set one specific combination of the split variables to NA; we want this combination to stay displayed in the table as missing.
advs_miss <- advs %>%
  mutate(
    # NA_real_ keeps the type strict if_else() happy on older dplyr versions
    AVAL = if_else(
      AVISIT == "BASELINE" & ARM == "A: Drug X" & PARAMCD == "DIABP",
      NA_real_, AVAL),
    CHG = if_else(
      AVISIT == "BASELINE" & ARM == "A: Drug X" & PARAMCD == "DIABP",
      NA_real_, CHG)
  )
lyt_trim <- basic_table() %>%
  split_cols_by_multivar(c("AVAL", "CHG")) %>%
  split_rows_by("AVISIT", split_fun = drop_split_levels) %>%
  split_rows_by("ARM", split_fun = trim_levels_in_group("PARAMCD")) %>% ## change split fun here <------
  add_rowcounts(alt_counts = TRUE) %>%
  split_rows_by("PARAMCD") %>%
  analyze_colvars(afun = mean, format = "xx.x")
build_table(lyt_trim, advs_miss, alt_counts_df = adsl_visit)
Gives an error because PARAMCD is not in alt_counts_df:
Error: Following error encountered in splitting alt_counts_df: Error applying custom split function: no applicable method for 'droplevels' applied to an object of class "NULL"
split: VarLevelSplit (ARM)
occured at path: AVISIT[BASELINE]
Now we add PARAMCD to alt_counts_df to avoid the error:
adsl_avisit_param <- adsl_visit %>%
  mutate(PARAMCD = factor(NA_character_, levels = levels(advs$PARAMCD)))
build_table(lyt_trim, advs_miss, alt_counts_df = adsl_avisit_param)
And get the desired table:
AVAL CHG
———————————————————————————————————————
BASELINE
  A: Drug X (N=134)
    DIABP
      mean    NA      NA
    SYSBP
      mean    151.7   0.0
  B: Placebo (N=134)
    DIABP
      mean    101.1   0.0
    SYSBP
      mean    149.5   0.0
...
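The PARAMCD step above could be generalized to any split variable that only has to exist in alt_counts_df. A minimal sketch, assuming an all-NA factor column is enough (as it was for PARAMCD here; it would not be enough for a variable like AVISIT, whose levels actually have to be populated). `add_na_split_vars()` is a hypothetical helper, not part of rtables or tern, and the data below is toy data.

```r
# Hypothetical helper: for every variable in `vars` that is missing from
# `alt_df`, add an all-NA factor column whose levels are taken from `df`.
add_na_split_vars <- function(alt_df, df, vars) {
  for (v in setdiff(vars, names(alt_df))) {
    lv <- if (is.factor(df[[v]])) {
      levels(df[[v]])
    } else {
      sort(unique(as.character(df[[v]]))) # sort() drops NA values
    }
    alt_df[[v]] <- factor(rep(NA_character_, nrow(alt_df)), levels = lv)
  }
  alt_df
}

# Toy data standing in for the alt counts dataset and the analysis dataset.
alt_toy <- data.frame(ARM = factor(c("A", "B")))
dat_toy <- data.frame(
  ARM = factor(c("A", "B")),
  PARAMCD = factor(c("DIABP", "SYSBP"))
)

alt_toy_full <- add_na_split_vars(alt_toy, dat_toy, c("ARM", "PARAMCD"))
str(alt_toy_full)
```

With the objects from this issue, the call would presumably look like `add_na_split_vars(adsl_visit, advs, "PARAMCD")`.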
@Melkiades I added a second example to the issue.
tl;dr: alt_counts_df (ADSL) needs to be pre-processed so that it contains potentially all variables used in the row splits of the layout (which ones depends on the split functions used).
From the rtables perspective this makes sense; it's just not very user-friendly.
Related to https://github.com/insightsengineering/tern/issues/535