[Question]: reconsider the levels of analyze, summarize and split_labels

Open clarkliming opened this issue 1 year ago • 10 comments

What is your question?

The indentation of content rows is a complicated thing, I agree, but I think we should reconsider it if we have the bandwidth. Here is an example. Suppose an AE table is needed:

basic_table() %>%
  split_cols_by("ACTARM") %>%
  split_rows_by("AEBODSYS", child_labels = "visible") %>%
  summarize_num_patients("USUBJID",
      .stats = c("unique", "nonunique"),
      .labels = c("Total number of patients with at least one adverse event", "Total number of events")) %>%
  count_occurrences("AEDECOD") %>%
  build_table(ex_adae, alt_counts_df = ex_adsl)

we get

                                                             A: Drug X    B: Placebo   C: Combination
—————————————————————————————————————————————————————————————————————————————————————————————————————
cl A.1                                                                                               
  Total number of patients with at least one adverse event   78 (58.2%)   75 (56.0%)     89 (67.4%)  
  Total number of events                                        132          130            160      
    dcd A.1.1.1.1                                            50 (37.3%)   45 (33.6%)     63 (47.7%)  
    dcd A.1.1.1.2                                            48 (35.8%)   48 (35.8%)     50 (37.9%)  
cl B.1                                                                                               
  Total number of patients with at least one adverse event   47 (35.1%)   49 (36.6%)     43 (32.6%)  
  Total number of events                                         56           60             62      
    dcd B.1.1.1.1                                            47 (35.1%)   49 (36.6%)     43 (32.6%)  
cl B.2                                                                                               
  Total number of patients with at least one adverse event   79 (59.0%)   74 (55.2%)     85 (64.4%)  
  Total number of events                                        129          138            143      
    dcd B.2.1.2.1                                            49 (36.6%)   44 (32.8%)     52 (39.4%)  
    dcd B.2.2.3.1                                            48 (35.8%)   54 (40.3%)     51 (38.6%)  
cl C.1                                                                                               
  Total number of patients with at least one adverse event   43 (32.1%)   46 (34.3%)     43 (32.6%)  
  Total number of events                                         55           63             64      
    dcd C.1.1.1.3                                            43 (32.1%)   46 (34.3%)     43 (32.6%)  
cl C.2                                                                                               
  Total number of patients with at least one adverse event   35 (26.1%)   48 (35.8%)     55 (41.7%)  
  Total number of events                                         48           53             65      
    dcd C.2.1.2.1                                            35 (26.1%)   48 (35.8%)     55 (41.7%)  
cl D.1                                                                                               
  Total number of patients with at least one adverse event   79 (59.0%)   67 (50.0%)     80 (60.6%)  
  Total number of events                                        127          106            135      
    dcd D.1.1.1.1                                            50 (37.3%)   42 (31.3%)     51 (38.6%)  
    dcd D.1.1.4.2                                            48 (35.8%)   42 (31.3%)     50 (37.9%)  
cl D.2                                                                                               
  Total number of patients with at least one adverse event   47 (35.1%)   58 (43.3%)     57 (43.2%)  
  Total number of events                                         62           72             74      
    dcd D.2.1.5.3                                            47 (35.1%)   58 (43.3%)     57 (43.2%)  

This does not look good: the AEDECOD rows are indented one level deeper than the summary rows. We can adjust the indentation to make it look better:

basic_table() %>%
  split_cols_by("ACTARM") %>%
  split_rows_by("AEBODSYS", child_labels = "visible") %>%
  summarize_num_patients("USUBJID",
      .stats = c("unique", "nonunique"),
      .labels = c("Total number of patients with at least one adverse event", "Total number of events")) %>%
  count_occurrences("AEDECOD", .indent_mods = -1L) %>%
  build_table(ex_adae, alt_counts_df = ex_adsl)

It looks much nicer now (although at the cost of a not-so-nice -1L indent modifier in the layout definition):

                                                             A: Drug X    B: Placebo   C: Combination
—————————————————————————————————————————————————————————————————————————————————————————————————————
cl A.1                                                                                               
  Total number of patients with at least one adverse event   78 (58.2%)   75 (56.0%)     89 (67.4%)  
  Total number of events                                        132          130            160      
  dcd A.1.1.1.1                                              50 (37.3%)   45 (33.6%)     63 (47.7%)  
  dcd A.1.1.1.2                                              48 (35.8%)   48 (35.8%)     50 (37.9%)  
cl B.1                                                                                               
  Total number of patients with at least one adverse event   47 (35.1%)   49 (36.6%)     43 (32.6%)  
  Total number of events                                         56           60             62      
  dcd B.1.1.1.1                                              47 (35.1%)   49 (36.6%)     43 (32.6%)  
cl B.2                                                                                               
  Total number of patients with at least one adverse event   79 (59.0%)   74 (55.2%)     85 (64.4%)  
  Total number of events                                        129          138            143      
  dcd B.2.1.2.1                                              49 (36.6%)   44 (32.8%)     52 (39.4%)  
  dcd B.2.2.3.1                                              48 (35.8%)   54 (40.3%)     51 (38.6%)  
cl C.1                                                                                               
  Total number of patients with at least one adverse event   43 (32.1%)   46 (34.3%)     43 (32.6%)  
  Total number of events                                         55           63             64      
  dcd C.1.1.1.3                                              43 (32.1%)   46 (34.3%)     43 (32.6%)  
cl C.2                                                                                               
  Total number of patients with at least one adverse event   35 (26.1%)   48 (35.8%)     55 (41.7%)  
  Total number of events                                         48           53             65      
  dcd C.2.1.2.1                                              35 (26.1%)   48 (35.8%)     55 (41.7%)  
cl D.1                                                                                               
  Total number of patients with at least one adverse event   79 (59.0%)   67 (50.0%)     80 (60.6%)  
  Total number of events                                        127          106            135      
  dcd D.1.1.1.1                                              50 (37.3%)   42 (31.3%)     51 (38.6%)  
  dcd D.1.1.4.2                                              48 (35.8%)   42 (31.3%)     50 (37.9%)  
cl D.2                                                                                               
  Total number of patients with at least one adverse event   47 (35.1%)   58 (43.3%)     57 (43.2%)  
  Total number of events                                         62           72             74      
  dcd D.2.1.5.3                                              47 (35.1%)   58 (43.3%)     57 (43.2%)  

But later someone wants to use this table and keep only the rows where the difference between any two arms is larger than 5%. We then also need to remove the content rows, because their numbers become misleading (they are still totals over all AEs, not only over those with a difference larger than 5%).
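To make the 5% criterion concrete, here is a minimal base R sketch. It assumes that the rule amounts to "largest fraction minus smallest fraction across arms is at least 0.05", which is my reading of has_fractions_difference(atleast = 0.05) and not a copy of its implementation. The fractions are copied from four rows of the table above.

```r
# Fractions per arm (A: Drug X, B: Placebo, C: Combination), taken from the
# percentages printed in the table above.
fractions <- rbind(
  "dcd A.1.1.1.1" = c(0.373, 0.336, 0.477),
  "dcd A.1.1.1.2" = c(0.358, 0.358, 0.379),
  "dcd B.1.1.1.1" = c(0.351, 0.366, 0.326),
  "dcd B.2.2.3.1" = c(0.358, 0.403, 0.386)
)

# Assumed rule: keep a row when the spread between arms is at least 5 points.
keep <- apply(fractions, 1, function(p) diff(range(p)) >= 0.05)

kept_rows <- names(keep)[keep]
print(kept_rows)
```

Under this reading only dcd A.1.1.1.1 survives among these four rows, which agrees with the pruned table shown further down.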

so with the following code

criteria_fun <- function(tr) is(tr, "ContentRow")
row_condition <- has_fractions_difference(atleast = 0.05)

basic_table() %>%
  split_cols_by("ACTARM") %>%
  split_rows_by("AEBODSYS", child_labels = "visible") %>%
  summarize_num_patients("USUBJID",
      .stats = c("unique", "nonunique"),
      .labels = c("Total number of patients with at least one adverse event", "Total number of events")) %>%
  count_occurrences("AEDECOD", .indent_mods = -1L) %>%
  build_table(ex_adae, alt_counts_df = ex_adsl) %>%
  trim_rows(criteria = criteria_fun) %>%
  prune_table(keep_rows(row_condition))

and we get

                A: Drug X    B: Placebo   C: Combination
————————————————————————————————————————————————————————
cl A.1                                                  
dcd A.1.1.1.1   50 (37.3%)   45 (33.6%)     63 (47.7%)  
cl B.2                                                  
dcd B.2.1.2.1   49 (36.6%)   44 (32.8%)     52 (39.4%)  
cl C.2                                                  
dcd C.2.1.2.1   35 (26.1%)   48 (35.8%)     55 (41.7%)  
cl D.1                                                  
dcd D.1.1.1.1   50 (37.3%)   42 (31.3%)     51 (38.6%)  
dcd D.1.1.4.2   48 (35.8%)   42 (31.3%)     50 (37.9%)  
cl D.2                                                  
dcd D.2.1.5.3   47 (35.1%)   58 (43.3%)     57 (43.2%)

What happened? Why is the indentation gone?

The content rows were removed, so the child rows now hang directly under the label row, while the -1L indent modifier still takes effect.
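The arithmetic behind this can be sketched with a toy model. This is an illustration only, not rtables internals: child_indent is a hypothetical helper, and the rule "one level for a visible label row, one level for content rows, plus the relative modifier" is an assumption chosen because it reproduces the tables printed in this thread.

```r
# Toy model (hypothetical, not the rtables implementation): indent level of
# the analysis rows below a single row split.
child_indent <- function(label_visible, has_content, indent_mod = 0L) {
  base <- as.integer(label_visible) + as.integer(has_content)
  # a relative modifier shifts the level but cannot go below zero
  max(0L, base + as.integer(indent_mod))
}

original   <- child_indent(TRUE, TRUE)         # label row + content rows
with_mod   <- child_indent(TRUE, TRUE, -1L)    # same layout with -1L
after_trim <- child_indent(TRUE, FALSE, -1L)   # content rows trimmed, -1L kept

print(c(original = original, with_mod = with_mod, after_trim = after_trim))
```

The modifier is relative to whatever structure is left, so once the content rows disappear the same -1L pulls the child rows all the way back to the label row's level.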

You may think that a new layout is needed to achieve this, and I agree that it can be achieved with one.

But I still find it unsatisfactory that something that could have been achieved through post-processing now requires a new layout.

clarkliming avatar Jun 15 '23 13:06 clarkliming

Hi @clarkliming,

The issue is what happens if we have absolute indents, such as (this won't work, but suppose it did, as a sort of pseudocode):


basic_table() %>%
  split_rows_by("AEBODSYS") %>%
  summarize_row_groups(...) %>%
  analyze("AEDECOD", mean, abs_indent = 1) %>%
  build_table(DM)

to get your table: the layout code then becomes really brittle. If you want to add nesting (e.g., you want to do this for each country), normally you'd do

basic_table() %>%
  split_rows_by("COUNTRY") %>%
  split_rows_by("AEBODSYS") %>%
  summarize_row_groups(...) %>%
  analyze("AEDECOD", ..., abs_indent = 1) %>%
  build_table(DM)

Which is a very straightforward extension of the table. But now the indent is wrong.

The larger issue is that that's not really a reasonable post-processing activity, in my opinion. It's much easier and more straightforward to write the layout for the table you want, rather than making a table with a huge amount more complexity and then forcefully stripping out structure from it.

The layout here would just be

basic_table() %>%
  split_rows_by("AEBODSYS") %>%
  analyze("AEDECOD", ...) %>%
  build_table(DM)

If you're in the realm of automation, it's much easier to conditionally add the summaries only if you need them, and control the indent mod at the same time:

myfun <- function(data, ..., important_arg) {
  # keep the indent modifier in sync with whether content rows are added
  a_imod <- if (important_arg) -1L else 0L
  basic_table() %>%
    split_rows_by("AEBODSYS") %>%
    (\(lyt) if (important_arg) summarize_row_groups(lyt, ...) else lyt) %>%
    analyze("AEDECOD", ..., indent_mod = a_imod) %>%
    build_table(data)
}

The above is easier and safer than the trim -> prune approach, and works within the existing indent_mod framework.
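For reference, a runnable variant of the same idea might look like the sketch below. It assumes rtables is installed and uses the ex_adae example data that is available once rtables is attached; build_ae_table and with_summary are hypothetical names introduced here, and the default summarize_row_groups() content function stands in for the real summary.

```r
library(rtables)

# Hypothetical helper: `with_summary` switches the content rows and the
# matching indent modifier on or off together.
build_ae_table <- function(data, with_summary = TRUE) {
  lyt <- basic_table() %>%
    split_cols_by("ARM") %>%
    split_rows_by("AEBODSYS", child_labels = "visible")
  if (with_summary) {
    lyt <- summarize_row_groups(lyt)
  }
  lyt <- analyze(
    lyt, "AEDECOD",
    afun = function(x, ...) simple_analysis(droplevels(x)),
    indent_mod = if (with_summary) -1L else 0L
  )
  build_table(lyt, data)
}

tbl_full <- build_ae_table(ex_adae, with_summary = TRUE)
tbl_lean <- build_ae_table(ex_adae, with_summary = FALSE)

# make_row_df() describes every printed row, including its class and indent.
rows_full <- make_row_df(tbl_full)
rows_lean <- make_row_df(tbl_lean)
```

Because the modifier is chosen in the same place as the summary, the lean table never carries a stale -1L, so nothing has to be trimmed afterwards.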

gmbecker avatar Jun 15 '23 17:06 gmbecker

I am asking because users are using https://docs.roche.com/#/tlg-catalog/devel/tables/adverse-events/aet02.html to add some "post-processing" to create tables like that. The template is not defined on our side, but removing the content rows is needed (the table needs to be ordered by SOC first, and if no content rows are available you cannot get that result). So we must have these "summarize num patients" calls in order to sort, and we need to remove them later. Since they are in the same template, I cannot know how users will use this table.

I totally understand that we can use some other argument in the layout to do this. The proposed workaround has already been adopted on our side. However, it still does not solve the issue.

I am not saying that we should have "absolute indentation". I agree that "relative indentation" is the correct way of handling this. But my question is: do we really need that many indent modifiers?

Just look at tlg-catalog: in 28 files we have 129 indent modifiers!

clarkliming avatar Jun 16 '23 02:06 clarkliming

So here is what I propose: add a join tree/table function. Consider a labeled Newick form of the tree structures:

T1: ((aaa)int1, (ccc)int2)root;
T2: ((a, b)int1, (c, d)int2)root;

After joining T1 and T2, the new tree becomes ((aaa, a, b)int1, (ccc, c, d)int2)root;

Child nodes at the same level will then have the same level of indentation.
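A minimal base R sketch of such a join, assuming two-level trees stored as named lists (one element per internal node, holding its leaves). join_trees and to_newick are hypothetical helpers for illustration; nothing like them is claimed to exist in rtables.

```r
# Each tree: internal node name -> character vector of leaf labels.
t1 <- list(int1 = "aaa", int2 = "ccc")
t2 <- list(int1 = c("a", "b"), int2 = c("c", "d"))

# Join by internal node name: leaves of matching nodes are concatenated.
join_trees <- function(x, y) {
  nodes <- union(names(x), names(y))
  setNames(lapply(nodes, function(n) c(x[[n]], y[[n]])), nodes)
}

# Render a two-level tree in labeled Newick form.
to_newick <- function(tree, root = "root") {
  inner <- vapply(
    names(tree),
    function(n) sprintf("(%s)%s", paste(tree[[n]], collapse = ", "), n),
    character(1)
  )
  sprintf("(%s)%s;", paste(inner, collapse = ", "), root)
}

joined <- join_trees(t1, t2)
print(to_newick(joined))
```

All leaves under int1 end up as siblings, so any indentation derived from depth would be the same for aaa, a and b.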

shajoezhu avatar Jun 16 '23 08:06 shajoezhu

Ah, I see the issue.

> I am asking because users are using https://docs.roche.com/#/tlg-catalog/devel/tables/adverse-events/aet02.html to add some "post-processing" to create tables like that. The template is not defined on our side, but removing the content rows is needed (the table needs to be ordered by SOC first, and if no content rows are available you cannot get that result). So we must have these "summarize num patients" calls in order to sort, and we need to remove them later. Since they are in the same template, I cannot know how users will use this table.

This isn't correct. We can (and should) sort the levels within the "post-processing" portion of a custom split function without creating content rows that we do not want.

You don't specify exactly what you want to sort on, so I'm going to assume the total number of AEs, but this is easily generalizable to any criterion that can be calculated from the data subsets of the individual panels:

level_score_fun <- nrow

order_facets <- function(score_fun = nrow) {
    function(ret, spl, .spl_context, fulldf) {
        scores <- vapply(ret$datasplit, score_fun, 1)
        o <- order(scores, decreasing = TRUE)
        make_split_result(values = ret$values[o],
                          datasplit = ret$datasplit[o],
                          labels = ret$labels[o])
    }
}


sorted_facet_splf <- make_split_fun(post = list(order_facets(level_score_fun)))

lyt <- basic_table() %>%
    split_cols_by("ARM") %>%
    split_rows_by("AEBODSYS", split_fun = sorted_facet_splf) %>%
    summarize_row_groups() %>%
    analyze("AEDECOD", afun = function(x, ...) {x <- droplevels(x); simple_analysis(x)})

gives us

> build_table(lyt, ex_adae)
                   A: Drug X    B: Placebo    C: Combination
————————————————————————————————————————————————————————————
cl A.1            132 (21.7%)   130 (20.9%)    160 (22.8%)  
  dcd A.1.1.1.1       64            62              88      
  dcd A.1.1.1.2       68            68              72      
cl B.2            129 (21.2%)   138 (22.2%)    143 (20.3%)  
  dcd B.2.1.2.1       65            62              66      
  dcd B.2.2.3.1       64            76              77      
cl D.1            127 (20.9%)   106 (17.0%)    135 (19.2%)  
  dcd D.1.1.1.1       61            51              71      
  dcd D.1.1.4.2       66            55              64      
cl D.2            62 (10.2%)    72 (11.6%)      74 (10.5%)  
  dcd D.2.1.5.3       62            72              74      
cl C.1             55 (9.0%)    63 (10.1%)      64 (9.1%)   
  dcd C.1.1.1.3       55            63              64      
cl B.1             56 (9.2%)     60 (9.6%)      62 (8.8%)   
  dcd B.1.1.1.1       56            60              62      
cl C.2             48 (7.9%)     53 (8.5%)      65 (9.2%)   
  dcd C.2.1.2.1       48            53              65      

No post-process sorting required.

Now, I didn't bother tracking down the unique-patient counting logic here, as you can see, but you can also see that order_facets (which I will likely add to rtables in response to this issue) is completely general and takes a score function.

Also note, I included the content function simply so you could see that the ordering was working. The actual layout you'd want, then, would be analogous to:

lyt <- basic_table() %>%
    split_cols_by("ARM") %>%
    split_rows_by("AEBODSYS", split_fun = sorted_facet_splf) %>%
    analyze("AEDECOD", afun = function(x, ...) {x <- droplevels(x); simple_analysis(x)})

Which gives

> build_table(lyt, ex_adae)
                  A: Drug X   B: Placebo   C: Combination
—————————————————————————————————————————————————————————
cl A.1                                                   
  dcd A.1.1.1.1      64           62             88      
  dcd A.1.1.1.2      68           68             72      
cl B.2                                                   
  dcd B.2.1.2.1      65           62             66      
  dcd B.2.2.3.1      64           76             77      
cl D.1                                                   
  dcd D.1.1.1.1      61           51             71      
  dcd D.1.1.4.2      66           55             64      
cl D.2                                                   
  dcd D.2.1.5.3      62           72             74      
cl C.1                                                   
  dcd C.1.1.1.3      55           63             64      
cl B.1                                                   
  dcd B.1.1.1.1      56           60             62      
cl C.2                                                   
  dcd C.2.1.2.1      48           53             65      

As desired.

Another extremely lo-fi solution here is to relevel the AEBODSYS factor within the data by occurrence before calling build_table, but I think the above solution is cleaner and more general.
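The releveling approach can be sketched in base R as follows. The data frame is a made-up toy stand-in for ex_adae, and the sketch relies on the assumption that split_rows_by creates its facets in factor level order by default.

```r
# Toy stand-in for the AE data: one row per adverse event.
ae <- data.frame(
  AEBODSYS = factor(c("cl A", "cl B", "cl B", "cl C", "cl C", "cl C"))
)

# Count events per level and sort from most to least frequent.
counts <- sort(table(ae$AEBODSYS), decreasing = TRUE)

# Rebuild the factor so that its levels follow that order.
ae$AEBODSYS <- factor(ae$AEBODSYS, levels = names(counts))

print(levels(ae$AEBODSYS))
```

The values themselves are unchanged, only the level order is, so the data can be passed to build_table as before.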

gmbecker avatar Jun 19 '23 17:06 gmbecker

@shajoezhu please have a look at the solution provided by Gabe and help evaluate the correct way to create AET02 with the difference table.

clarkliming avatar Jun 20 '23 01:06 clarkliming

Copied from the chat: I checked the original question again, and I think that if you decide to lose the summary (total number of patients/events) in the last table, it is enough to drop summarize_row_groups and the indentation modifier, and it is done. This is not related to the filtering per se, right? I think the latter should not cause any problem with the indentation. If this is a viable solution for you, then smart post-processing pruning of the structure starts to seem a bit unnecessary to me.

Maybe it is still possible to adapt the pruning for this case, in which one node is lost without updating the indentation. Still, if you keep the node, it would be hard to do further post-processing meaningfully; if you update the indentation, I am quite sure there are cases in which you would not want that.

While writing this I thought about a solution that would just get rid of the summarize call and use another row for this. That would solve the indentation and the content-row-related issues (it could be a problem if you want this to be repeated, but that is further down the line). Keeping you posted ;)

I honestly think this is more a problem to be solved in tern rather than in rtables. Opening an issue there.

Melkiades avatar Jul 04 '23 08:07 Melkiades

@clarkliming @shajoezhu @Melkiades can this issue be closed?

gmbecker avatar Aug 11 '23 23:08 gmbecker

> @clarkliming @shajoezhu @Melkiades can this issue be closed?

I think so. This is a problem for tern. We need to get rid of leaves with summarize_row_groups

Melkiades avatar Aug 14 '23 14:08 Melkiades

Hi @Teninq, can you please check whether you could use Gabe's suggestion and implement it in your table? Thanks!

https://github.com/insightsengineering/rtables/issues/663#issuecomment-1597529482

shajoezhu avatar Aug 14 '23 15:08 shajoezhu

> Hi @Teninq, can you please check whether you could use Gabe's suggestion and implement it in your table? Thanks!
>
> #663 (comment)

I think the solution should be on the tern side (https://github.com/insightsengineering/rtables/issues/679), and it still needs to be completed.

Melkiades avatar Aug 14 '23 17:08 Melkiades