admiral Feature Request: Add the possibility to keep initial sorting order to `restrict

Feature Idea

restrict_derivation should allow the user to keep the initial sorting order of the initial dataset.

For now this is not the case:

library(dplyr)
library(rlang)
library(admiral)
library(admiraldev)

adlb <- tribble(
  ~USUBJID, ~AVISITN, ~AVAL, ~ABLFL,
  "1",            -1,   113, NA_character_,
  "1",             0,   113, "Y",
  "1",             3,   117, NA_character_,
  "2",             0,    95, "Y",
  "3",             0,   111, "Y",
  "3",             1,   101, NA_character_,
  "3",             2,   123, NA_character_
)

result_1 <-
  restrict_derivation(
    adlb,
    derivation = derive_var_base,
    args = params(by_vars = exprs(USUBJID)),
    filter = AVISITN >= 0
  )

adlb
result_1

Relevant Input

Add an argument to restrict_derivation, for instance keep_order.

Relevant Output

Using above example, with an argument to keep the initial sorting order, the result should look like this:

# A tibble: 7 × 5
  USUBJID AVISITN  AVAL ABLFL  BASE
  <chr>     <dbl> <dbl> <chr> <dbl>
1 1            -1   113 NA       NA
2 1             0   113 Y       113
3 1             3   117 NA      113
4 2             0    95 Y        95
5 3             0   111 Y       111
6 3             1   101 NA      111
7 3             2   123 NA      111

Reproducible Example/Pseudo Code

restrict_derivation_sorted <- function(dataset,
                                       derivation,
                                       args = NULL,
                                       filter,
                                       keep_order = TRUE) {
  # Check input
  assert_data_frame(dataset)
  assert_s3_class(args, "params", optional = TRUE)
  assert_function(derivation, names(args))
  filter <- assert_filter_cond(enexpr(filter))

  # Create row_index if to sort after bind_rows
  if (keep_order == TRUE) {
    dataset <- dataset %>% mutate(ln = row_number())
  }

  # Split input dataset
  data_ignore <- dataset %>%
    filter(!(!!filter) | [is.na](https://url.de.m.mimecastprotect.com/s/4xnkC99JMYtZ527glSo5Ta2?domain=is.na)(!!filter))
  data_derive <- dataset %>%
    filter(!!filter)

  # Call derivation on subset
  call <- as.call(c(substitute(derivation), c(quote(data_derive), args)))
  data_derive <-
    eval(
      call,
      envir = list(data_derive = data_derive),
      enclos = parent.frame()
    )

  # Put datasets together again
  dataset_final <- bind_rows(data_derive, data_ignore)

  # Arrange by row number if requested
  if (keep_order == TRUE) {
    dataset_final <- dataset_final %>% 
      arrange(ln) %>% 
      select(-ln)
  }

  return(dataset_final)
}

result_2 <-
  restrict_derivation_sorted(
    adlb,
    derivation = derive_var_base,
    args = params(by_vars = exprs(USUBJID)),
    filter = AVISITN >= 0,
    keep_order = TRUE
  )

Jul 22 '24 05:07 aassuied-ps

The admiral functions don't attempt to preserve the sort order. Thus restrict_derivation() is not the only function where the sorting of the input and output may differ. I would not like to add the keep_order argument to all admiral functions because this is a lot of work for a feature which is most likely not often used. The results of the admiral function is independent of the sort order of the input dataset. So usually there is no need to preserve the sort order (and as sorting is an expensive operation, I would avoid it).

I wonder if we should add two helper functions instead of adding the keep_order argument to all admiral functions. Something like:

sort_order <- get_sort_order(adlb)

adlb <- adlb %>%
  derive_*() %>%
  ...
  
adlb <- restore_sort_order(adlb, sort_order)

@pharmaverse/admiral , what do you think?

Jul 23 '24 14:07 bundfussr

Thanks for the issue @aassuied-ps. I agree with @bundfussr that we make it a point to not sort output datasets, especially because our code is generally ran within long pipe chains where these sorting steps would be superfluous.

I'm not sure about adding the helper functions. I feel like they may bloat the package, so I would only add if there is real need for them.

Jul 23 '24 14:07 manciniedoardo

Understood, thank you for the answer : )

Jul 25 '24 15:07 aassuied-ps

The admiral functions don't attempt to preserve the sort order. Thus restrict_derivation() is not the only function where the sorting of the input and output may differ. I would not like to add the keep_order argument to all admiral functions because this is a lot of work for a feature which is most likely not often used. The results of the admiral function is independent of the sort order of the input dataset. So usually there is no need to preserve the sort order (and as sorting is an expensive operation, I would avoid it).

I wonder if we should add two helper functions instead of adding the keep_order argument to all admiral functions. Something like:
sort_order <- get_sort_order(adlb)

adlb <- adlb %>%
  derive_*() %>%
  ...
  
adlb <- restore_sort_order(adlb, sort_order)
@pharmaverse/admiral , what do you think?

@bundfussr Are you proposing a function that figures out how the data is sorted, stores that information and then a user can use it later on?

If a user knows the preferred sort order, why can't they just use arrange()...or is the assumption here that the user does not know the sort order beforehand or is not able to determine it?

Apologies, just trying to understand the example and use case.

Jul 28 '24 00:07 bms63

@bundfussr Are you proposing a function that figures out how the data is sorted, stores that information and then a user can use it later on?

Yes, if we want to provide such functionality, I would implement it as separate helper functions. However, like Edoardo I'm not sure if there is a need for it.

If a user knows the preferred sort order, why can't they just use arrange()...or is the assumption here that the user does not know the sort order beforehand or is not able to determine it?

Yes, if the users know the preferred sort order, they can just use arrange(). The helper functions would be convenient in cases where the sort order is unknown or where the sort order should be forced in several places.

Aug 13 '24 13:08 bundfussr

@bundfussr Are you proposing a function that figures out how the data is sorted, stores that information and then a user can use it later on?

Yes, if we want to provide such functionality, I would implement it as separate helper functions. However, like Edoardo I'm not sure if there is a need for it.

If a user knows the preferred sort order, why can't they just use arrange()...or is the assumption here that the user does not know the sort order beforehand or is not able to determine it?

Yes, if the users know the preferred sort order, they can just use arrange(). The helper functions would be convenient in cases where the sort order is unknown or where the sort order should be forced in several places.

I think that the arrange() function should be sufficient. Typically, programmers have a specific order in mind. We could ask the community if this is a feature people need, but I suggest we close this issue.

As an aside: Is there some sort of feature voting for admiral somewhere, where the community can vote on requested features?

Aug 13 '24 13:08 StefanThoma

@bundfussr Are you proposing a function that figures out how the data is sorted, stores that information and then a user can use it later on?

Yes, if we want to provide such functionality, I would implement it as separate helper functions. However, like Edoardo I'm not sure if there is a need for it.

If a user knows the preferred sort order, why can't they just use arrange()...or is the assumption here that the user does not know the sort order beforehand or is not able to determine it?

Yes, if the users know the preferred sort order, they can just use arrange(). The helper functions would be convenient in cases where the sort order is unknown or where the sort order should be forced in several places.

ooohhh I would actually love this!! Be nice to pull this from raw data and check with what is being said is the sort order in the specs. Can see other use cases!

Shall we close this and make a more generalized function request?

Aug 19 '24 00:08 bms63

I would recommend not to add anything to admiral for this. we shouldn't add things already covered by tidyverse and also {metatools} provides https://pharmaverse.github.io/metatools/reference/sort_by_key.html which is a nice final sorting step based on your metadata primary keys.

Aug 20 '24 07:08 rossfarrugia

I would recommend not to add anything to admiral for this. we shouldn't add things already covered by tidyverse and also {metatools} provides https://pharmaverse.github.io/metatools/reference/sort_by_key.html which is a nice final sorting step based on your metadata primary keys.

My understanding is the new function is meant to pull out the how the data is actually sorted versus a function that applies a sort. I'm keen on this - perhaps we could develop here and push to metatools if we think it doesn't belong.

Aug 20 '24 11:08 bms63

I would recommend not to add anything to admiral for this. we shouldn't add things already covered by tidyverse and also {metatools} provides https://pharmaverse.github.io/metatools/reference/sort_by_key.html which is a nice final sorting step based on your metadata primary keys.

My understanding is the new function is meant to pull out the how the data is actually sorted versus a function that applies a sort. I'm keen on this - perhaps we could develop here and push to metatools if we think it doesn't belong.

The proposed functions don't use any metadata. Thus I think they shouldn't be in metatools.

Aug 20 '24 12:08 bundfussr

ah ok, thanks! my misunderstanding then. i had read it as we were trying to make our own sorting function.

Aug 20 '24 12:08 rossfarrugia

Feature Request: Add the possibility to keep initial sorting order to `restrict_derivation`

Feature Idea

Relevant Input

Relevant Output

Reproducible Example/Pseudo Code