Tplyr
Complex Filtering
Hi guys,
I'm attempting to use Tplyr to compute a group_count layer that I'm not sure how to specify. To give some background, I've simulated a partial adae table below that has USUBJID, ARM and AETOXGRN. AETOXGRN is a toxicity grading used frequently within Oncology and ranges from 1 to 5.
What I'm interested in counting is each subject's worst (i.e. highest) toxicity grade. I'm interested in distinct counts, so for example, if subject X had two AEs, one graded AETOXGRN = 1 and another graded AETOXGRN = 4, I'd like this subject to be counted in the "4" category only.
I can achieve this in dplyr, and also achieve this in Tplyr with some up-front filtering. However, I'm wondering if I can specify something like this directly in Tplyr.
Here is some code for my exploration.
library(dplyr)
library(Tplyr)
adae <- tibble::tribble(
  ~USUBJID, ~ARM,        ~AETOXGRN,
  1L,       "Treatment", 3L,
  1L,       "Treatment", 1L,
  1L,       "Treatment", 2L,
  1L,       "Treatment", 3L,
  1L,       "Treatment", 1L,
  2L,       "Placebo",   3L,
  2L,       "Placebo",   3L,
  2L,       "Placebo",   4L,
  2L,       "Placebo",   5L,
  2L,       "Placebo",   4L,
  2L,       "Placebo",   2L,
  3L,       "Treatment", 1L,
  4L,       "Placebo",   1L,
  5L,       "Treatment", 1L,
  5L,       "Treatment", 1L,
  5L,       "Treatment", 5L,
  5L,       "Treatment", 3L,
  5L,       "Treatment", 2L,
  5L,       "Treatment", 4L,
  5L,       "Treatment", 1L
)
# using dplyr
adae %>%
  group_by(USUBJID) %>%
  arrange(desc(AETOXGRN)) %>%
  slice(1) %>%
  ungroup() %>%
  count(ARM, AETOXGRN)
# dplyr output
# A tibble: 5 x 3
# ARM       AETOXGRN     n
# <chr>        <int> <int>
# Placebo          1     1
# Placebo          5     1
# Treatment        1     1
# Treatment        3     1
# Treatment        5     1
# Using Tplyr
t <- tplyr_table(adae, ARM) %>%
  add_layer(
    group_count(AETOXGRN, where = AETOXGRN == max(AETOXGRN)) %>%
      set_distinct_by(USUBJID)
  )
t %>% build()
# Tplyr output
# A tibble: 1 x 5
# row_label1 var1_Placebo var1_Treatment ord_layer_index ord_layer_1
# <chr>      <chr>        <chr>                    <int>       <dbl>
# 5          1 (100.0%)   1 (100.0%)                   1           5
I can see that Tplyr only outputs the result for the max(AETOXGRN) grade, 5, which looks correct for the filter as I wrote it. So it seems my filter is acting at the data set level rather than at the per-USUBJID level. Is there a good way to specify a where filter of this nature, or have I maybe missed other options in Tplyr?
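To make the distinction concrete, here is a small sketch in plain dplyr of the two behaviours. The `toy` data and the names `dataset_level` and `subject_level` are made up purely for illustration and are not part of the adae example above.

```r
library(dplyr)

# Hypothetical stand-in data: two subjects whose worst grades are 3 and 5
toy <- tibble::tibble(
  USUBJID  = c(1L, 1L, 2L, 2L),
  AETOXGRN = c(1L, 3L, 2L, 5L)
)

# Ungrouped: max() is evaluated over the whole data set,
# so only the single grade-5 record survives
dataset_level <- toy %>%
  filter(AETOXGRN == max(AETOXGRN))

# Grouped: max() is evaluated within each subject,
# so every subject keeps its own worst grade
subject_level <- toy %>%
  group_by(USUBJID) %>%
  filter(AETOXGRN == max(AETOXGRN)) %>%
  ungroup()
```

The first result is what my where filter appears to be doing; the second is what I'm after.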
Curious to hear any thoughts!
Thanks! Matt
@mattkumar thanks for submitting this!
Currently this isn't possible, because we don't have a clean way to pass grouping down to the point where the where filter conditions are applied. We didn't plan for that, so it would take a good bit of thought to do it elegantly. I'm almost thinking it would be safer to pre-derive a flag and filter on that flag, which is how ADaM datasets would typically set things up, because grouping and ungrouping here is a bit tricky.
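As a rough sketch of the flag idea, assuming the adae data from the original post: the flag column name `WORSTFL` and the object names `adae_flagged` and `worst_grade_tbl` are hypothetical, chosen only for this example. The flag is derived per subject with dplyr up front, and the Tplyr layer then only needs a plain, ungrouped where condition.

```r
library(dplyr)
library(Tplyr)

# Same records as the adae example in the original post, built compactly
adae <- tibble::tibble(
  USUBJID  = c(rep(1L, 5), rep(2L, 6), 3L, 4L, rep(5L, 7)),
  ARM      = c(rep("Treatment", 5), rep("Placebo", 6),
               "Treatment", "Placebo", rep("Treatment", 7)),
  AETOXGRN = c(3L, 1L, 2L, 3L, 1L,
               3L, 3L, 4L, 5L, 4L, 2L,
               1L,
               1L,
               1L, 1L, 5L, 3L, 2L, 4L, 1L)
)

# Pre-derive a per-subject flag (WORSTFL is a hypothetical name).
# row_number(desc(...)) ranks records within each subject and breaks ties,
# so exactly one record per subject is flagged even when the worst grade repeats.
adae_flagged <- adae %>%
  group_by(USUBJID) %>%
  mutate(WORSTFL = if_else(row_number(desc(AETOXGRN)) == 1L, "Y", "N")) %>%
  ungroup()

# The layer filter is now an ordinary row-level condition on the flag
worst_grade_tbl <- tplyr_table(adae_flagged, ARM) %>%
  add_layer(
    group_count(AETOXGRN, where = WORSTFL == "Y") %>%
      set_distinct_by(USUBJID)
  ) %>%
  build()
```

Denominators are left at the layer defaults in this sketch, so the percentages are worth checking against what the table is supposed to show.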
What do you think?