Tplyr icon indicating copy to clipboard operation
Tplyr copied to clipboard

`apply_row_masks()` doesn't respect descending sorting of row break variables

Open mstackhouse opened this issue 3 years ago • 1 comments
trafficstars

Prerequisites

For more information, see the CONTRIBUTING guide.

Description

apply_row_masks() is intended to be used post-sorting, as it blanks row labels out assuming that the repetitive values are already next to each other. Currently, if using additional break variables to insert row breaks, this fails to consider if one of the break variables has been sorted in descending order, leading to an unexpected result.

Steps to Reproduce (Bug Report Only)

adae <- haven::read_xpt(url("https://github.com/phuse-org/TestDataFactory/raw/main/Updated/TDF_ADaM/adae.xpt")) %>% 
  filter(AEBODSYS %in% c("IMMUNE SYSTEM DISORDERS", "CONGENITAL, FAMILIAL AND GENETIC DISORDERS"))

# Create the Tplyr table object - this is like a specification for the table
t <- tplyr_table(adae, TRTA) %>% 
  add_layer(
    group_count(vars(AEBODSYS, AEDECOD)) %>% 
      set_distinct_by(USUBJID) %>% 
      set_order_count_method("bycount", break_ties='desc') %>%
      set_ordering_cols("Xanomeline High Dose") %>%
      set_result_order_var(distinct_n)
  )

# Now build the table - this is where the number crunching happens
ae1 <- t %>% 
  build() %>% 
  arrange(desc(ord_layer_1), desc(ord_layer_2)) %>% 
  apply_row_masks(row_breaks=TRUE, ord_layer_1)

Expected behavior: [What you expected to happen]

CONGENITAL, FAMILIAL AND GENETIC DISORDERS should sort first, and IMMUNE SYSTEM DISORDERS should sort second.

Actual behavior: [What actually happened]

IMMUNE SYSTEM DISORDERS sorts first and CONGENITAL, FAMILIAL AND GENETIC DISORDERS sorts second. Ascending sorting is reimplemented for ord_layer_1.

** Workaround **

The desired sort order can be achieved by re-sorting the data with the newly added ord_break variable.

ae1 %>% 
  arrange(desc(ord_layer_1), desc(ord_layer_2)) %>% 
  apply_row_masks(row_breaks = TRUE, ord_layer_index, ord_layer_1) %>% 
  arrange(desc(ord_layer_1), ord_break, desc(ord_layer_2))

This is far from desirable, but it works.

mstackhouse avatar May 16 '22 20:05 mstackhouse

One way to resolve this would be to capture desc(ord_layer_1) in the call to apply_row_masks() and use that when the row breaks are inserted:

apply_row_masks(row_breaks = TRUE, ord_layer_index, desc(ord_layer_1))

apply_row_masks() has to re-sort the data after the blanked out rows are inserted, as there are new rows injected into the data that need to be ordered properly. So if the ordering is provided, this could be respected.

If there's a stacking method that could respect variable grouping, this could be avoided - but I'm not seeing that in the tidyverse.

mstackhouse avatar May 16 '22 21:05 mstackhouse