rtables
[Question]: reconsider the levels of analyze, summarize and split_labels
What is your question?
The indentation of content rows is complicated, I agree, but I think we should reconsider it if we have the bandwidth. Here is an example where an AE table is needed:
basic_table() %>%
split_cols_by("ACTARM") %>%
split_rows_by("AEBODSYS", child_labels = "visible") %>%
summarize_num_patients("USUBJID",
.stats = c("unique", "nonunique"),
.labels = c("Total number of patients with at least one adverse event", "Total number of events")) %>%
count_occurrences("AEDECOD") %>%
build_table(ex_adae, alt_counts_df = ex_adsl)
we get
                                                             A: Drug X    B: Placebo   C: Combination
—————————————————————————————————————————————————————————————————————————————————————————————————————
cl A.1
  Total number of patients with at least one adverse event   78 (58.2%)   75 (56.0%)     89 (67.4%)
  Total number of events                                        132          130            160
    dcd A.1.1.1.1                                            50 (37.3%)   45 (33.6%)     63 (47.7%)
    dcd A.1.1.1.2                                            48 (35.8%)   48 (35.8%)     50 (37.9%)
cl B.1
  Total number of patients with at least one adverse event   47 (35.1%)   49 (36.6%)     43 (32.6%)
  Total number of events                                         56           60             62
    dcd B.1.1.1.1                                            47 (35.1%)   49 (36.6%)     43 (32.6%)
cl B.2
  Total number of patients with at least one adverse event   79 (59.0%)   74 (55.2%)     85 (64.4%)
  Total number of events                                        129          138            143
    dcd B.2.1.2.1                                            49 (36.6%)   44 (32.8%)     52 (39.4%)
    dcd B.2.2.3.1                                            48 (35.8%)   54 (40.3%)     51 (38.6%)
cl C.1
  Total number of patients with at least one adverse event   43 (32.1%)   46 (34.3%)     43 (32.6%)
  Total number of events                                         55           63             64
    dcd C.1.1.1.3                                            43 (32.1%)   46 (34.3%)     43 (32.6%)
cl C.2
  Total number of patients with at least one adverse event   35 (26.1%)   48 (35.8%)     55 (41.7%)
  Total number of events                                         48           53             65
    dcd C.2.1.2.1                                            35 (26.1%)   48 (35.8%)     55 (41.7%)
cl D.1
  Total number of patients with at least one adverse event   79 (59.0%)   67 (50.0%)     80 (60.6%)
  Total number of events                                        127          106            135
    dcd D.1.1.1.1                                            50 (37.3%)   42 (31.3%)     51 (38.6%)
    dcd D.1.1.4.2                                            48 (35.8%)   42 (31.3%)     50 (37.9%)
cl D.2
  Total number of patients with at least one adverse event   47 (35.1%)   58 (43.3%)     57 (43.2%)
  Total number of events                                         62           72             74
    dcd D.2.1.5.3                                            47 (35.1%)   58 (43.3%)     57 (43.2%)
This does not look good, so we can add an indent modifier to make it look better:
basic_table() %>%
split_cols_by("ACTARM") %>%
split_rows_by("AEBODSYS", child_labels = "visible") %>%
summarize_num_patients("USUBJID",
.stats = c("unique", "nonunique"),
.labels = c("Total number of patients with at least one adverse event", "Total number of events")) %>%
count_occurrences("AEDECOD", .indent_mods = -1L) %>%
build_table(ex_adae, alt_counts_df = ex_adsl)
It looks much nicer now (although the -1L indent modifier in the layout definition is not so nice):
                                                             A: Drug X    B: Placebo   C: Combination
—————————————————————————————————————————————————————————————————————————————————————————————————————
cl A.1
  Total number of patients with at least one adverse event   78 (58.2%)   75 (56.0%)     89 (67.4%)
  Total number of events                                        132          130            160
  dcd A.1.1.1.1                                              50 (37.3%)   45 (33.6%)     63 (47.7%)
  dcd A.1.1.1.2                                              48 (35.8%)   48 (35.8%)     50 (37.9%)
cl B.1
  Total number of patients with at least one adverse event   47 (35.1%)   49 (36.6%)     43 (32.6%)
  Total number of events                                         56           60             62
  dcd B.1.1.1.1                                              47 (35.1%)   49 (36.6%)     43 (32.6%)
cl B.2
  Total number of patients with at least one adverse event   79 (59.0%)   74 (55.2%)     85 (64.4%)
  Total number of events                                        129          138            143
  dcd B.2.1.2.1                                              49 (36.6%)   44 (32.8%)     52 (39.4%)
  dcd B.2.2.3.1                                              48 (35.8%)   54 (40.3%)     51 (38.6%)
cl C.1
  Total number of patients with at least one adverse event   43 (32.1%)   46 (34.3%)     43 (32.6%)
  Total number of events                                         55           63             64
  dcd C.1.1.1.3                                              43 (32.1%)   46 (34.3%)     43 (32.6%)
cl C.2
  Total number of patients with at least one adverse event   35 (26.1%)   48 (35.8%)     55 (41.7%)
  Total number of events                                         48           53             65
  dcd C.2.1.2.1                                              35 (26.1%)   48 (35.8%)     55 (41.7%)
cl D.1
  Total number of patients with at least one adverse event   79 (59.0%)   67 (50.0%)     80 (60.6%)
  Total number of events                                        127          106            135
  dcd D.1.1.1.1                                              50 (37.3%)   42 (31.3%)     51 (38.6%)
  dcd D.1.1.4.2                                              48 (35.8%)   42 (31.3%)     50 (37.9%)
cl D.2
  Total number of patients with at least one adverse event   47 (35.1%)   58 (43.3%)     57 (43.2%)
  Total number of events                                         62           72             74
  dcd D.2.1.5.3                                              47 (35.1%)   58 (43.3%)     57 (43.2%)
But later someone wants to filter this table, keeping only the rows where the difference between any two arms is larger than 5%. We then also need to remove the content rows, because their numbers become misleading: they are still the totals over all AEs, not only over those with a difference larger than 5%.
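To make the filtering criterion concrete, here is a minimal base R sketch of the row condition. It is a simplified model, not tern's implementation of has_fractions_difference; the helper name has_fraction_gap is hypothetical, and the fractions are copied from the table above.

```r
# Simplified model of the row condition (an assumption, not tern's code):
# keep a row when the largest and smallest arm fractions differ by at
# least `atleast`.
has_fraction_gap <- function(fractions, atleast = 0.05) {
  diff(range(fractions)) >= atleast
}

# Fractions per arm, taken from the table above
keep_a1 <- has_fraction_gap(c(0.373, 0.336, 0.477)) # dcd A.1.1.1.1
keep_a2 <- has_fraction_gap(c(0.358, 0.358, 0.379)) # dcd A.1.1.1.2
```

Under this model dcd A.1.1.1.1 is kept and dcd A.1.1.1.2 is dropped.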
so with the following code
criteria_fun <- function(tr) is(tr, "ContentRow")
row_condition <- has_fractions_difference(atleast = 0.05)
basic_table() %>%
split_cols_by("ACTARM") %>%
split_rows_by("AEBODSYS", child_labels = "visible") %>%
summarize_num_patients("USUBJID",
.stats = c("unique", "nonunique"),
.labels = c("Total number of patients with at least one adverse event", "Total number of events")) %>%
count_occurrences("AEDECOD", .indent_mods = -1L) %>%
build_table(ex_adae, alt_counts_df = ex_adsl) %>%
trim_rows(criteria = criteria_fun) %>%
prune_table(keep_rows(row_condition))
and we get
                A: Drug X    B: Placebo   C: Combination
————————————————————————————————————————————————————————
cl A.1
dcd A.1.1.1.1   50 (37.3%)   45 (33.6%)     63 (47.7%)
cl B.2
dcd B.2.1.2.1   49 (36.6%)   44 (32.8%)     52 (39.4%)
cl C.2
dcd C.2.1.2.1   35 (26.1%)   48 (35.8%)     55 (41.7%)
cl D.1
dcd D.1.1.1.1   50 (37.3%)   42 (31.3%)     51 (38.6%)
dcd D.1.1.4.2   48 (35.8%)   42 (31.3%)     50 (37.9%)
cl D.2
dcd D.2.1.5.3   47 (35.1%)   58 (43.3%)     57 (43.2%)
What happened? Why is the indentation all gone?
The content rows have been removed, so the child rows now hang directly under the label row, but the -1L indent modifier still takes effect.
You may think that a new layout is needed to achieve this, and I agree that it can be achieved through a new layout.
But I find it unsatisfactory that something that could have been achieved through post-processing now requires a new layout.
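The arithmetic behind this can be sketched with a toy model. This is an illustration under a stated assumption (displayed indent is roughly the number of visible levels above a row plus the relative indent modifier, floored at zero), not rtables' internal logic; row_indent is a hypothetical helper.

```r
# Toy model (an assumption, not rtables internals): the displayed indent is
# the number of visible levels above a row plus its relative modifier.
row_indent <- function(levels_above, indent_mod = 0L) {
  max(0L, levels_above + indent_mod)
}

# Label row and content rows above the dcd rows: two levels, shifted back by one
with_content <- row_indent(2L, indent_mod = -1L)

# Content rows trimmed: only the label row remains above, but -1L still applies
after_trim <- row_indent(1L, indent_mod = -1L)
```

In this model the dcd rows sit at indent 1 before trimming and drop to indent 0 afterwards, which matches the flattened output above.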
Hi @clarkliming,
The issue is that if we have absolute indents, such as (this won't work, but suppose it did, as a sort of pseudocode)
basic_table() %>%
split_rows_by("AEBODSYS") %>%
summarize_row_groups(...) %>%
analyze("AEDECOD", mean, abs_indent = 1) %>%
build_table(DM)
to get your table, then the layout code becomes really brittle. If you want to add nesting (e.g., you want to do this for each country), normally you'd do
basic_table() %>%
split_rows_by("COUNTRY") %>%
split_rows_by("AEBODSYS") %>%
summarize_row_groups(...) %>%
analyze("AEDECOD", ..., abs_indent = 1) %>%
build_table(DM)
This is a very straightforward extension of the table, but now the indent is wrong.
The larger issue is that that's not really a reasonable post-processing activity, in my opinion. It's much easier and more straightforward to write the layout for the table you want than to make a table with far more complexity and then forcefully strip structure out of it.
The layout here would just be
basic_table() %>%
split_rows_by("AEBODSYS") %>%
analyze("AEDECOD", ...) %>%
build_table(DM)
If you're in the realm of automation, it's much easier to conditionally add the summaries only if you need them, and control the indent mod at the same time:
myfun <- function(data, ..., important_arg) {
a_imod <- if(important_arg) -1L else 0L
basic_table() %>%
split_rows_by("AEBODSYS") %>%
(\(lyt) if(important_arg) summarize_row_groups(...) else lyt) %>%
analyze("AEDECOD", ..., indent_mod = a_imod) %>%
build_table(data)
}
The above is easier and safer than the trim -> prune approach, and works within the existing indent_mod framework
I am asking because users are using https://docs.roche.com/#/tlg-catalog/devel/tables/adverse-events/aet02.html and adding some "post-processing" to create tables like that. The template is not defined on our side, but removing the content rows is needed (the table needs to be ordered by SOC first, but if no content row is available you cannot get the result). So we must have these "summarize num patients" calls to sort, and we need to remove them later. Since they are in the same template, I am not aware of how users would use this table.
I totally understand that we can use some other argument in the layout to do this. The proposed workaround is already adopted on our side. However, this still does not solve the issue.
I am not saying that we should have some "absolute indentation". I agree that "relative indentation" is the correct way of handling this issue. But my question is: do we really need that many indent modifiers?
Just look at tlg-catalog:
in 28 files we have 129 indent modifiers!
So here is what I propose: add a join tree/table function.
Consider a labeled Newick form of the tree structures:
T1: ((aaa)int1, (ccc)int2)root; T2: ((a, b)int1, (c, d)int2)root;
After joining T1 and T2, the new tree becomes ((aaa, a, b)int1, (ccc, c, d)int2)root;
Child nodes at the same level will have the same level of indentation.
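A minimal base R sketch of that join, to make the idea concrete. It is an illustration only: the trees are modeled as named lists of leaf labels keyed by internal node, and join_trees and to_newick are hypothetical helpers, not existing rtables functions.

```r
# Hypothetical helper: merge two trees whose internal nodes share labels.
# Each tree is a named list: internal node label -> character vector of leaves.
join_trees <- function(t1, t2) {
  labels <- union(names(t1), names(t2))
  joined <- lapply(labels, function(lbl) c(t1[[lbl]], t2[[lbl]]))
  names(joined) <- labels
  joined
}

# Hypothetical helper: render such a tree as a labeled Newick string.
to_newick <- function(tree, root = "root") {
  groups <- vapply(names(tree), function(lbl) {
    paste0("(", paste(tree[[lbl]], collapse = ", "), ")", lbl)
  }, character(1))
  paste0("(", paste(groups, collapse = ", "), ")", root, ";")
}

t1 <- list(int1 = "aaa", int2 = "ccc")
t2 <- list(int1 = c("a", "b"), int2 = c("c", "d"))
joined <- join_trees(t1, t2)
newick <- to_newick(joined) # "((aaa, a, b)int1, (ccc, c, d)int2)root;"
```

All leaves under the same internal node end up as siblings, so a renderer could give them one shared indent level without per-call modifiers.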
Ah, I see the issue.
> I am asking because users are using https://docs.roche.com/#/tlg-catalog/devel/tables/adverse-events/aet02.html and adding some "post-processing" to create tables like that. The template is not defined on our side, but removing the content rows is needed (the table needs to be ordered by SOC first, but if no content row is available you cannot get the result). So we must have these "summarize num patients" calls to sort, and we need to remove them later. Since they are in the same template, I am not aware of how users would use this table.
This isn't correct. We can (and should) sort the levels within the "post-processing" portion of a custom split function without creating content rows that we do not want.
You don't specify exactly what you want to sort on, so I'm going to assume the total number of AEs, but this is easily generalizable to any criterion that can be calculated from the data subsets of the individual panels:
level_score_fun <- nrow
order_facets <- function(score_fun = nrow) {
function(ret, spl, .spl_context, fulldf) {
scores <- vapply(ret$datasplit, score_fun, 1)
o <- order(scores, decreasing = TRUE)
make_split_result(values = ret$values[o],
datasplit = ret$datasplit[o],
labels = ret$labels[o])
}
}
sorted_facet_splf <- make_split_fun(post = list(order_facets(level_score_fun)))
lyt <- basic_table() %>%
split_cols_by("ARM") %>%
split_rows_by("AEBODSYS", split_fun = sorted_facet_splf) %>%
summarize_row_groups() %>%
analyze("AEDECOD", afun = function(x, ...) {x <- droplevels(x); simple_analysis(x)})
gives us
> build_table(lyt, ex_adae)
                   A: Drug X    B: Placebo    C: Combination
————————————————————————————————————————————————————————————
cl A.1            132 (21.7%)   130 (20.9%)    160 (22.8%)
  dcd A.1.1.1.1       64            62              88
  dcd A.1.1.1.2       68            68              72
cl B.2            129 (21.2%)   138 (22.2%)    143 (20.3%)
  dcd B.2.1.2.1       65            62              66
  dcd B.2.2.3.1       64            76              77
cl D.1            127 (20.9%)   106 (17.0%)    135 (19.2%)
  dcd D.1.1.1.1       61            51              71
  dcd D.1.1.4.2       66            55              64
cl D.2            62 (10.2%)    72 (11.6%)      74 (10.5%)
  dcd D.2.1.5.3       62            72              74
cl C.1             55 (9.0%)    63 (10.1%)      64 (9.1%)
  dcd C.1.1.1.3       55            63              64
cl B.1             56 (9.2%)     60 (9.6%)      62 (8.8%)
  dcd B.1.1.1.1       56            60              62
cl C.2             48 (7.9%)     53 (8.5%)      65 (9.2%)
  dcd C.2.1.2.1       48            53              65
No post-process sorting required.
Now, as you can see, I didn't bother tracking down the logic for counting unique patients here, but you can also see that order_facets (which I will likely add to rtables in response to this issue) is completely general and takes a score function.
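As a hedged illustration of swapping in another score function, the base R sketch below scores each facet's data subset by unique patients instead of row count. The data frame is a made-up toy, not ex_adae, and n_unique_patients is a hypothetical helper; the scoring and ordering lines mirror what order_facets above does with ret$datasplit.

```r
# Hypothetical score function: number of unique patients in a data subset
n_unique_patients <- function(df) length(unique(df$USUBJID))

# Made-up toy data: cl A has more events, cl B has more patients
toy <- data.frame(
  USUBJID  = c("p1", "p1", "p1", "p2", "p3", "p4", "p5"),
  AEBODSYS = c(rep("cl A", 4), rep("cl B", 3))
)
facets <- split(toy, toy$AEBODSYS)

# Same scoring and ordering steps as in order_facets
event_scores   <- vapply(facets, nrow, 1)
patient_scores <- vapply(facets, n_unique_patients, 1)

by_events   <- names(facets)[order(event_scores, decreasing = TRUE)]
by_patients <- names(facets)[order(patient_scores, decreasing = TRUE)]
```

With the layout code above, such a function would presumably be passed as order_facets(n_unique_patients) in place of level_score_fun.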
Also note, I included the content function simply so you could see that the ordering was working. The actual layout you'd want, then, would be analogous to:
lyt <- basic_table() %>%
split_cols_by("ARM") %>%
split_rows_by("AEBODSYS", split_fun = sorted_facet_splf) %>%
analyze("AEDECOD", afun = function(x, ...) {x <- droplevels(x); simple_analysis(x)})
Which gives
> build_table(lyt, ex_adae)
                  A: Drug X   B: Placebo   C: Combination
—————————————————————————————————————————————————————————
cl A.1
  dcd A.1.1.1.1      64           62             88
  dcd A.1.1.1.2      68           68             72
cl B.2
  dcd B.2.1.2.1      65           62             66
  dcd B.2.2.3.1      64           76             77
cl D.1
  dcd D.1.1.1.1      61           51             71
  dcd D.1.1.4.2      66           55             64
cl D.2
  dcd D.2.1.5.3      62           72             74
cl C.1
  dcd C.1.1.1.3      55           63             64
cl B.1
  dcd B.1.1.1.1      56           60             62
cl C.2
  dcd C.2.1.2.1      48           53             65
As desired.
Another extremely lo-fi solution here is to relevel the AEBODSYS factor within the data by occurrence before calling build_table, but I think the above solution is cleaner and more general.
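For completeness, a minimal base R sketch of that releveling, using a made-up toy data frame in place of ex_adae. It assumes the row split follows the factor's level order when no custom split function reorders the facets.

```r
# Made-up toy stand-in for the AE data (not ex_adae)
adae <- data.frame(
  AEBODSYS = factor(c("cl A", "cl B", "cl B", "cl C", "cl C", "cl C"))
)

# Count occurrences per level, then rebuild the factor with the most
# frequent level first
counts <- table(adae$AEBODSYS)
adae$AEBODSYS <- factor(
  adae$AEBODSYS,
  levels = names(sort(counts, decreasing = TRUE))
)

new_levels <- levels(adae$AEBODSYS)
```

The releveled data frame would then be passed to build_table as usual; the values themselves are untouched, only the level order changes.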
@shajoezhu please have a look at the provided solution by Gabe and help evaluate the correct way to create AET02 with difference table
Copied from the chat: I checked the original question again, and I think that if you decide to drop the summary (total number of patients/events) from the last table, it is enough to drop summarize_row_groups and the indentation modifier, and it is done. This is not related to the filtering per se, right? I think the latter should not cause any problem with the indentation. If this is a viable solution for you, then smart post-processing pruning of the structure seems a bit unnecessary to me.
Maybe it is still possible to adapt the pruning for this case, in which one node is lost without updating the indentation. Still, if you keep the node, it would be hard to do further post-processing meaningfully; if you update the indentation, I am quite sure there are cases in which you would not want that.
While writing this I thought of a solution that would just get rid of the summarize call and use another row for this. That would solve both the indentation and the content-row issues (if you want this to be repeated it could be a problem, but that is further down the line). Keeping you posted ;)
I honestly think this is more a problem to be solved in tern rather than in rtables. Opening an issue there.
@clarkliming @shajoezhu @Melkiades can this issue be closed?
I think so. This is a problem for tern. We need to get rid of leaves with summarize_row_groups
Hi @Teninq, can you please check whether you could use Gabe's suggestion and make the implementation in your table? Thanks!
https://github.com/insightsengineering/rtables/issues/663#issuecomment-1597529482
I think the solution should be on the tern side (https://github.com/insightsengineering/rtables/issues/679) and it still needs to be completed.