Feature Request: Add the possibility to keep initial sorting order to `restrict_derivation`
Feature Idea
restrict_derivation should allow the user to keep the initial sorting order of the initial dataset.
For now this is not the case:
library(dplyr)
library(rlang)
library(admiral)
library(admiraldev)
adlb <- tribble(
~USUBJID, ~AVISITN, ~AVAL, ~ABLFL,
"1", -1, 113, NA_character_,
"1", 0, 113, "Y",
"1", 3, 117, NA_character_,
"2", 0, 95, "Y",
"3", 0, 111, "Y",
"3", 1, 101, NA_character_,
"3", 2, 123, NA_character_
)
result_1 <-
restrict_derivation(
adlb,
derivation = derive_var_base,
args = params(by_vars = exprs(USUBJID)),
filter = AVISITN >= 0
)
adlb
result_1
Relevant Input
Add an argument to restrict_derivation, for instance keep_order.
Relevant Output
Using above example, with an argument to keep the initial sorting order, the result should look like this:
# A tibble: 7 × 5
USUBJID AVISITN AVAL ABLFL BASE
<chr> <dbl> <dbl> <chr> <dbl>
1 1 -1 113 NA NA
2 1 0 113 Y 113
3 1 3 117 NA 113
4 2 0 95 Y 95
5 3 0 111 Y 111
6 3 1 101 NA 111
7 3 2 123 NA 111
Reproducible Example/Pseudo Code
restrict_derivation_sorted <- function(dataset,
derivation,
args = NULL,
filter,
keep_order = TRUE) {
# Check input
assert_data_frame(dataset)
assert_s3_class(args, "params", optional = TRUE)
assert_function(derivation, names(args))
filter <- assert_filter_cond(enexpr(filter))
# Create row_index if to sort after bind_rows
if (keep_order == TRUE) {
dataset <- dataset %>% mutate(ln = row_number())
}
# Split input dataset
data_ignore <- dataset %>%
filter(!(!!filter) | [is.na](https://url.de.m.mimecastprotect.com/s/4xnkC99JMYtZ527glSo5Ta2?domain=is.na)(!!filter))
data_derive <- dataset %>%
filter(!!filter)
# Call derivation on subset
call <- as.call(c(substitute(derivation), c(quote(data_derive), args)))
data_derive <-
eval(
call,
envir = list(data_derive = data_derive),
enclos = parent.frame()
)
# Put datasets together again
dataset_final <- bind_rows(data_derive, data_ignore)
# Arrange by row number if requested
if (keep_order == TRUE) {
dataset_final <- dataset_final %>%
arrange(ln) %>%
select(-ln)
}
return(dataset_final)
}
result_2 <-
restrict_derivation_sorted(
adlb,
derivation = derive_var_base,
args = params(by_vars = exprs(USUBJID)),
filter = AVISITN >= 0,
keep_order = TRUE
)
The admiral functions don't attempt to preserve the sort order. Thus restrict_derivation() is not the only function where the sorting of the input and output may differ. I would not like to add the keep_order argument to all admiral functions because this is a lot of work for a feature which is most likely not often used. The results of the admiral function is independent of the sort order of the input dataset. So usually there is no need to preserve the sort order (and as sorting is an expensive operation, I would avoid it).
I wonder if we should add two helper functions instead of adding the keep_order argument to all admiral functions. Something like:
sort_order <- get_sort_order(adlb)
adlb <- adlb %>%
derive_*() %>%
...
adlb <- restore_sort_order(adlb, sort_order)
@pharmaverse/admiral , what do you think?
Thanks for the issue @aassuied-ps. I agree with @bundfussr that we make it a point to not sort output datasets, especially because our code is generally ran within long pipe chains where these sorting steps would be superfluous.
I'm not sure about adding the helper functions. I feel like they may bloat the package, so I would only add if there is real need for them.
Understood, thank you for the answer : )
The admiral functions don't attempt to preserve the sort order. Thus
restrict_derivation()is not the only function where the sorting of the input and output may differ. I would not like to add thekeep_orderargument to all admiral functions because this is a lot of work for a feature which is most likely not often used. The results of the admiral function is independent of the sort order of the input dataset. So usually there is no need to preserve the sort order (and as sorting is an expensive operation, I would avoid it).I wonder if we should add two helper functions instead of adding the
keep_orderargument to all admiral functions. Something like:sort_order <- get_sort_order(adlb) adlb <- adlb %>% derive_*() %>% ... adlb <- restore_sort_order(adlb, sort_order)@pharmaverse/admiral , what do you think?
@bundfussr Are you proposing a function that figures out how the data is sorted, stores that information and then a user can use it later on?
If a user knows the preferred sort order, why can't they just use arrange()...or is the assumption here that the user does not know the sort order beforehand or is not able to determine it?
Apologies, just trying to understand the example and use case.
@bundfussr Are you proposing a function that figures out how the data is sorted, stores that information and then a user can use it later on?
Yes, if we want to provide such functionality, I would implement it as separate helper functions. However, like Edoardo I'm not sure if there is a need for it.
If a user knows the preferred sort order, why can't they just use
arrange()...or is the assumption here that the user does not know the sort order beforehand or is not able to determine it?
Yes, if the users know the preferred sort order, they can just use arrange(). The helper functions would be convenient in cases where the sort order is unknown or where the sort order should be forced in several places.
@bundfussr Are you proposing a function that figures out how the data is sorted, stores that information and then a user can use it later on?
Yes, if we want to provide such functionality, I would implement it as separate helper functions. However, like Edoardo I'm not sure if there is a need for it.
If a user knows the preferred sort order, why can't they just use
arrange()...or is the assumption here that the user does not know the sort order beforehand or is not able to determine it?Yes, if the users know the preferred sort order, they can just use
arrange(). The helper functions would be convenient in cases where the sort order is unknown or where the sort order should be forced in several places.
I think that the arrange() function should be sufficient. Typically, programmers have a specific order in mind.
We could ask the community if this is a feature people need, but I suggest we close this issue.
As an aside: Is there some sort of feature voting for admiral somewhere, where the community can vote on requested features?
@bundfussr Are you proposing a function that figures out how the data is sorted, stores that information and then a user can use it later on?
Yes, if we want to provide such functionality, I would implement it as separate helper functions. However, like Edoardo I'm not sure if there is a need for it.
If a user knows the preferred sort order, why can't they just use
arrange()...or is the assumption here that the user does not know the sort order beforehand or is not able to determine it?Yes, if the users know the preferred sort order, they can just use
arrange(). The helper functions would be convenient in cases where the sort order is unknown or where the sort order should be forced in several places.
ooohhh I would actually love this!! Be nice to pull this from raw data and check with what is being said is the sort order in the specs. Can see other use cases!
Shall we close this and make a more generalized function request?
I would recommend not to add anything to admiral for this. we shouldn't add things already covered by tidyverse and also {metatools} provides https://pharmaverse.github.io/metatools/reference/sort_by_key.html which is a nice final sorting step based on your metadata primary keys.
I would recommend not to add anything to admiral for this. we shouldn't add things already covered by tidyverse and also {metatools} provides https://pharmaverse.github.io/metatools/reference/sort_by_key.html which is a nice final sorting step based on your metadata primary keys.
My understanding is the new function is meant to pull out the how the data is actually sorted versus a function that applies a sort. I'm keen on this - perhaps we could develop here and push to metatools if we think it doesn't belong.
I would recommend not to add anything to admiral for this. we shouldn't add things already covered by tidyverse and also {metatools} provides https://pharmaverse.github.io/metatools/reference/sort_by_key.html which is a nice final sorting step based on your metadata primary keys.
My understanding is the new function is meant to pull out the how the data is actually sorted versus a function that applies a sort. I'm keen on this - perhaps we could develop here and push to metatools if we think it doesn't belong.
The proposed functions don't use any metadata. Thus I think they shouldn't be in metatools.
ah ok, thanks! my misunderstanding then. i had read it as we were trying to make our own sorting function.