tern icon indicating copy to clipboard operation
tern copied to clipboard

[Feature Request]: different factor levels for different split contexts

Open clarkliming opened this issue 2 years ago • 7 comments

Feature description

similar to https://github.com/insightsengineering/chevron/pull/343 where different precisions are needed, we have similar cases that, when splitting by "parameters", the levels need to be adjusted.

library(dplyr)

data <- expand.grid(
  USUBJID = c("1", "2", "3"),
  ARM = c("arm a", "arm b"),
  PARAMCD = c("test a", "test b", "test c"),
  AVISIT = c("VISIT 1", "VISIT 2", "VISIT 3")
) %>%
mutate(
  AVALCAT = case_when(
    PARAMCD == "test a" ~ sample(c("high", "low"), size = n(), replace = TRUE),
    PARAMCD == "test b" ~ sample(c("normal", "abnormal"), size = n(), replace = TRUE),
    PARAMCD == "test c" ~ sample(c("high", "middle", "low"), size = n(), replace = TRUE),
    TRUE ~ sample(c("high", "low"), size = n(), replace = TRUE),
  )
) %>%
df_explicit_na

basic_table() %>%
  split_cols_by("ARM") %>%
  split_rows_by("PARAMCD") %>%
  split_rows_by("AVISIT") %>%
  summarize_vars("AVALCAT") %>%
  build_table(data)

even if we there are no actual categories in that parameter, the levels are there; pruning is removing all zeros so maybe not optimal if we want to keep some 0 count rows. Best if it could be achieved in analysis part.

Looping @barnett11 for your comments. This arise up when I review a template of egt05_qtcat_1 where Tim asks if by parameter analysis is possible

Code of Conduct

  • [X] I agree to follow this project's Code of Conduct.

Contribution Guidelines

  • [X] I agree to follow this project's Contribution Guidelines.

Security Policy

  • [X] I agree to follow this project's Security Policy.

clarkliming avatar Feb 09 '23 15:02 clarkliming

Hi @clarkliming - yes this will definitely be needed once we get into further TA specific templates. Some examples include SSRST01, PROT03 and SLAMPT01

barnett11 avatar Feb 10 '23 08:02 barnett11

then @khatril please consider this in the next increment

clarkliming avatar Feb 10 '23 08:02 clarkliming

I have the feeling that this is very template specific. The solution would be a switcher (more or less complicated, depending on template) inside the analysis function based on the split_context. At this stage, also selected empty levels can be dropped with NULL. Eventually, they can be selectively trimmed afterward.

Melkiades avatar Feb 10 '23 08:02 Melkiades

will trim_levels_in_group or trim_levels_to_map work here?

clarkliming avatar Feb 10 '23 09:02 clarkliming

I would personally focus on the analysis function as @barnett11 mentioned that it is also important to have a specific order of the output which may change for different split contexts. Surely trimming beforehand could simplify the analysis function

Melkiades avatar Feb 10 '23 10:02 Melkiades

#trim_levels_to_map is specifically designed to do exactly this thing (including leaving desired 0s in while removing undesired level combinations, even when they are not unobserved)

The map also controls the ordering of the levels:

data <- data.frame(LBCAT = c(rep("a", 4), rep("b", 4)),
                   PARAM = c(rep("param1", 4), rep("param2", 4)),
                   VISIT = rep(c("V1", "V2"), 4),
                   ABN = rep(c("H", "L"), 4),
                   stringsAsFactors = TRUE)

map <- data.frame(LBCAT = c(rep("a", 4), rep("b", 4)),
                  PARAM = c(rep("param1", 4), rep("param2", 4)),
                  VISIT = rep(c("V1", "V1", "V2", "V2"), 2),
                  ABN = rep(c("H", "L"), 4),
                  stringsAsFactors = FALSE)



tbl <- basic_table() %>%
  split_rows_by("LBCAT", split_fun = trim_levels_to_map(map = map)) %>%
  split_rows_by("PARAM", split_fun = trim_levels_to_map(map = map)) %>%
  split_rows_by("VISIT", split_fun = trim_levels_to_map(map = map)) %>%
  analyze("ABN") %>%
    build_table(df = data)
> map
  LBCAT  PARAM VISIT ABN
1     a param1    V1   H
2     a param1    V1   L
3     a param1    V2   H
4     a param1    V2   L
5     b param2    V1   H
6     b param2    V1   L
7     b param2    V2   H
8     b param2    V2   L
> tbl
           all obs
——————————————————
a                 
  param1          
    V1            
      H       2   
      L       0   
    V2            
      H       0   
      L       2   
b                 
  param2          
    V1            
      H       2   
      L       0   
    V2            
      H       0   
      L       2   

Vs


map2 <- data.frame(LBCAT = c(rep("a", 4), rep("b", 4)),
                  PARAM = c(rep("param1", 4), rep("param2", 4)),
                  VISIT = rep(c("V2", "V2", "V1", "V1"), 2),
                  ABN = rep(c("H", "L", "L", "H"), 2),
                  stringsAsFactors = FALSE)

tbl2 <- basic_table() %>%
  split_rows_by("LBCAT", split_fun = trim_levels_to_map(map = map2)) %>%
  split_rows_by("PARAM", split_fun = trim_levels_to_map(map = map2)) %>%
  split_rows_by("VISIT", split_fun = trim_levels_to_map(map = map2)) %>%
  analyze("ABN") %>%
    build_table(df = data)
> map2
  LBCAT  PARAM VISIT ABN
1     a param1    V2   H
2     a param1    V2   L
3     a param1    V1   L
4     a param1    V1   H
5     b param2    V2   H
6     b param2    V2   L
7     b param2    V1   L
8     b param2    V1   H
> tbl2
           all obs
——————————————————
a                 
  param1          
    V2            
      H       0   
      L       2   
    V1            
      L       0   
      H       2   
b                 
  param2          
    V2            
      H       0   
      L       2   
    V1            
      L       0   
      H       2   

Note the different orders of H, and L for the V1 and V2 parent facets. I believe this solves the stated issue out of the box. Am I missing something? @barnett11 @Melkiades

gmbecker avatar Feb 17 '23 21:02 gmbecker

I think this is a viable option but it could be a bit cumbersome for longer tables. In my opinion, the best solution is to wrap prune_empty_level in a way that keeps selected lines. The problem is how I feed the selected lines (aka path) when the pruning function does not see the whole row path of the element that is analyzed. In other words, if I get to the DataRow element I can know its label but not the whole path that could be the one "to keep". I guess my question for @gmbecker is if there is a way to see always the row path in each pruning iteration of tt. Otherwise, is there a way to do what I intend here?

This is my tentative solution:

library(dplyr)
library(tern)
library(rtables)
data <- expand.grid(
    USUBJID = c("1", "2", "3"),
    ARM = c("arm a", "arm b"),
    PARAMCD = c("test a", "test b", "test c"),
    AVISIT = c("VISIT 1", "VISIT 2", "VISIT 3")
) %>%
    mutate(
        AVALCAT = case_when(
            PARAMCD == "test a" ~ sample(c("high", "low"), size = n(), replace = TRUE),
            PARAMCD == "test b" ~ sample(c("normal", "abnormal"), size = n(), replace = TRUE),
            PARAMCD == "test c" ~ sample(c("high", "middle", "low"), size = n(), replace = TRUE),
            TRUE ~ sample(c("high", "low"), size = n(), replace = TRUE),
        )
    ) %>%
    df_explicit_na
data


mysplitfun <- make_split_fun(pre = list(drop_facet_levels),
                             post = list(add_overall_facet("ALL", "All Arms")))

a_tbl <- basic_table() %>%
    split_cols_by("ARM") %>%
    split_rows_by("PARAMCD") %>%
    split_rows_by("AVISIT") %>%
    summarize_vars("AVALCAT") %>%
    build_table(data)
a_tbl

my_personal_pruning <- function(to_keep) {
    function (tt) {
        if(is(tt, "DataRow")) 
            browser()
        print(row_paths(tt)[1])
        if(row_paths(tt)[1] %in% to_keep)
            return(FALSE)
        rtables::prune_empty_level(tt)
    }
}
prune_table(a_tbl, prune_func = my_personal_pruning(list(c("PARAMCD", "test a"))))

Melkiades avatar Mar 01 '23 10:03 Melkiades