xportr icon indicating copy to clipboard operation
xportr copied to clipboard

Closes #270 closes #297 warning the user of grouped data

Open sadchla-codes opened this issue 5 months ago • 7 comments

Thank you for your Pull Request!

We have developed a Pull Request template to aid you and our reviewers. Completing the below tasks helps to ensure our reviewers can maximize their time on your code as well as making sure the xportr codebase remains robust and consistent.

The scope of {xportr}

{xportr}'s scope is to enable R users to write out submission compliant xpt files that can be delivered to a Health Authority or to downstream validation software programs. We see labels, lengths, types, ordering and formats from a dataset specification object (SDTM and ADaM) as being our primary focus. We also see messaging and warnings to users around applying information from the specification file as a primary focus. Please make sure your Pull Request meets this scope of {xportr}. If your Pull Request moves beyond this scope, please get in touch with the {xportr} team on slack or create an issue to discuss.

Please check off each task box as an acknowledgment that you completed the task. This checklist is part of the Github Action workflows and the Pull Request will not be merged into the main branch until you have checked off each task.

Changes Description

(descriptions of changes)

Task List

  • [x] The spirit of xportr is met in your Pull Request
  • [x] Place Closes #<insert_issue_number> into the beginning of your Pull Request Title (Use Edit button in top-right if you need to update)
  • [x] Summary of changes filled out in the above Changes Description. Can be removed or left blank if changes are minor/self-explanatory.
  • [x] Code is formatted according to the tidyverse style guide. Use styler package and functions to style files accordingly.
  • [x] New functions or arguments follow established convention found in the Wiki.
  • [x] Updated relevant unit tests or have written new unit tests. See our Wiki for conventions used in this package.
  • [x] Creation/updated relevant roxygen headers and examples. See our Wiki for conventions used in this package.
  • [x] Run devtools::document() so all .Rd files in the man folder and the NAMESPACE file in the project root are updated appropriately
  • [x] Run pkgdown::build_site() and check that all affected examples are displayed correctly and that all new/updated functions occur on the "Reference" page.
  • [x] Update NEWS.md if the changes pertain to a user-facing function (i.e. it has an @export tag) or documentation aimed at users (rather than developers)
  • [x] The NEWS.md entry should go under the # xportr development version section. Don't worry about updating the version because it will be auto-updated using the vbump.yaml CI.
  • [x] Address any updates needed for vignettes and/or templates.
  • [x] Link the issue Development Panel so that it closes after successful merging.
  • [x] The developer is responsible for fixing merge conflicts not the Reviewer.
  • [x] Pat yourself on the back for a job well done! Much love to your accomplishment!

sadchla-codes avatar Nov 22 '25 05:11 sadchla-codes

Code Coverage

Package Line Rate Health
xportr 99%
Summary 99% (903 / 908)

github-actions[bot] avatar Nov 22 '25 05:11 github-actions[bot]

HI @bms63 The function group_data_check checks whether the data is group and the type of return will depends on the verbose. However, regardless of the verbose, the data still gets ungrouped. image

sadchla-codes avatar Nov 22 '25 05:11 sadchla-codes

I think xportr_write() needs to be looked at more

image

Wondering if we can improve this error message as im clueless. When the data isn't group then it fires fine.

From pharmaverse example site using the safety_specs.xlsx fround in the metadata folder

----r setup, message=FALSE, warning=FALSE, results='hold'--------------------

library(metacore) library(metatools) library(pharmaversesdtm) library(admiral) library(xportr) library(dplyr) library(tidyr) library(lubridate) library(stringr)

Read in input SDTM data

dm <- pharmaversesdtm::dm ds <- pharmaversesdtm::ds ex <- pharmaversesdtm::ex ae <- pharmaversesdtm::ae vs <- pharmaversesdtm::vs suppdm <- pharmaversesdtm::suppdm

When SAS datasets are imported into R using haven::read_sas(), missing

character values from SAS appear as "" characters in R, instead of appearing

as NA values. Further details can be obtained via the following link:

https://pharmaverse.github.io/admiral/articles/admiral.html#handling-of-missing-values

dm <- convert_blanks_to_na(dm) ds <- convert_blanks_to_na(ds) ex <- convert_blanks_to_na(ex) ae <- convert_blanks_to_na(ae) vs <- convert_blanks_to_na(vs) suppdm <- convert_blanks_to_na(suppdm)

----r combine, message=FALSE, warning=FALSE, results='hold'------------------

Combine Parent and Supp - very handy! ----

dm_suppdm <- combine_supp(dm, suppdm)

----r metacore, warning=FALSE, results='hold'--------------------------------

Read in metacore object

metacore <- spec_to_metacore( path = "safety_specs.xlsx",

All datasets are described in the same sheet

where_sep_sheet = FALSE ) %>% select_dataset("ADSL")

----r demographics-----------------------------------------------------------

adsl_preds <- build_from_derived(metacore, ds_list = list("dm" = dm_suppdm, "suppdm" = dm_suppdm), predecessor_only = FALSE, keep = FALSE )

----r grouping_option_1------------------------------------------------------

agegr1_lookup <- exprs( ~condition, ~AGEGR1, ~AGEGR1N, is.na(AGE), "Missing", 4, AGE < 18, "<18", 1, between(AGE, 18, 64), "18-64", 2, !is.na(AGE), ">64", 3 )

adsl_cat <- derive_vars_cat( dataset = adsl_preds, definition = agegr1_lookup )

----r------------------------------------------------------------------------

get_control_term(metacore, variable = AGEGR1)

----r grouping_option_2------------------------------------------------------

adsl_ct <- adsl_preds %>% create_cat_var(metacore, ref_var = AGE, grp_var = AGEGR1, num_grp_var = AGEGR1N )

----r grouping_option_3------------------------------------------------------

format_agegr1 <- function(age) { case_when( age < 18 ~ "<18", between(age, 18, 64) ~ "18-64", age > 64 ~ ">64", TRUE ~ "Missing" ) }

format_agegr1n <- function(age) { case_when( age < 18 ~ 1, between(age, 18, 64) ~ 2, age > 64 ~ 3, TRUE ~ 4 ) }

adsl_cust <- adsl_preds %>% mutate( AGEGR1 = format_agegr1(AGE), AGEGR1N = format_agegr1n(AGE) )

----r codelist---------------------------------------------------------------

adsl_ct <- adsl_ct %>% create_var_from_codelist( metacore = metacore, input_var = RACE, out_var = RACEN )

----r exposure---------------------------------------------------------------

ex_ext <- ex %>% derive_vars_dtm( dtc = EXSTDTC, new_vars_prefix = "EXST" ) %>% derive_vars_dtm( dtc = EXENDTC, new_vars_prefix = "EXEN", time_imputation = "last" )

adsl_raw <- adsl_ct %>%

Treatment Start Datetime

derive_vars_merged( dataset_add = ex_ext, filter_add = (EXDOSE > 0 | (EXDOSE == 0 & str_detect(EXTRT, "PLACEBO"))) & !is.na(EXSTDTM), new_vars = exprs(TRTSDTM = EXSTDTM, TRTSTMF = EXSTTMF), order = exprs(EXSTDTM, EXSEQ), mode = "first", by_vars = exprs(STUDYID, USUBJID) ) %>%

Treatment End Datetime

derive_vars_merged( dataset_add = ex_ext, filter_add = (EXDOSE > 0 | (EXDOSE == 0 & str_detect(EXTRT, "PLACEBO"))) & !is.na(EXENDTM), new_vars = exprs(TRTEDTM = EXENDTM, TRTETMF = EXENTMF), order = exprs(EXENDTM, EXSEQ), mode = "last", by_vars = exprs(STUDYID, USUBJID) ) %>%

Treatment Start and End Date

derive_vars_dtm_to_dt(source_vars = exprs(TRTSDTM, TRTEDTM)) %>% # Convert Datetime variables to date

Treatment Start Time

derive_vars_dtm_to_tm(source_vars = exprs(TRTSDTM)) %>%

Treatment Duration

derive_var_trtdurd() %>%

Safety Population Flag

derive_var_merged_exist_flag( dataset_add = ex, by_vars = exprs(STUDYID, USUBJID), new_var = SAFFL, false_value = "N", missing_value = "N", condition = (EXDOSE > 0 | (EXDOSE == 0 & str_detect(EXTRT, "PLACEBO"))) )

----r treatment_char, eval=TRUE----------------------------------------------

adsl <- adsl_raw %>% mutate( TRT01P = if_else(ARM %in% c("Screen Failure", "Not Assigned", "Not Treated"), "No Treatment", ARM), TRT01A = if_else(ACTARM %in% c("Screen Failure", "Not Assigned", "Not Treated"), "No Treatment", ACTARM) )

----r treatment_num, eval=TRUE-----------------------------------------------

adsl <- adsl %>% create_var_from_codelist(metacore, input_var = TRT01P, out_var = TRT01PN) %>% create_var_from_codelist(metacore, input_var = TRT01A, out_var = TRT01AN)

----r disposition, eval=TRUE-------------------------------------------------

Convert character date to numeric date without imputation

ds_ext <- derive_vars_dt( ds, dtc = DSSTDTC, new_vars_prefix = "DSST" )

adsl <- adsl %>% derive_vars_merged( dataset_add = ds_ext, by_vars = exprs(STUDYID, USUBJID), new_vars = exprs(EOSDT = DSSTDT), filter_add = DSCAT == "DISPOSITION EVENT" & DSDECOD != "SCREEN FAILURE" )

----r eval=TRUE--------------------------------------------------------------

format_eosstt <- function(x) { case_when( x %in% c("COMPLETED") ~ "COMPLETED", x %in% c("SCREEN FAILURE") ~ NA_character_, TRUE ~ "DISCONTINUED" ) }

----r eval=TRUE--------------------------------------------------------------

adsl <- adsl %>% derive_vars_merged( dataset_add = ds, by_vars = exprs(STUDYID, USUBJID), filter_add = DSCAT == "DISPOSITION EVENT", new_vars = exprs(EOSSTT = format_eosstt(DSDECOD)), missing_values = exprs(EOSSTT = "ONGOING") )

----r eval=TRUE--------------------------------------------------------------

adsl <- adsl %>% derive_vars_dt( new_vars_prefix = "DTH", dtc = DTHDTC, highest_imputation = "M", date_imputation = "first" )

----r eval=TRUE--------------------------------------------------------------

adsl <- adsl %>% derive_vars_merged( dataset_add = ds_ext, by_vars = exprs(STUDYID, USUBJID), new_vars = exprs(RANDDT = DSSTDT), filter_add = DSDECOD == "RANDOMIZED", ) %>% derive_vars_merged( dataset_add = ds_ext, by_vars = exprs(STUDYID, USUBJID), new_vars = exprs(SCRFDT = DSSTDT), filter_add = DSCAT == "DISPOSITION EVENT" & DSDECOD == "SCREEN FAILURE" ) %>% derive_vars_merged( dataset_add = ds_ext, by_vars = exprs(STUDYID, USUBJID), new_vars = exprs(FRVDT = DSSTDT), filter_add = DSCAT == "OTHER EVENT" & DSDECOD == "FINAL RETRIEVAL VISIT" )

----r eval=TRUE--------------------------------------------------------------

adsl <- adsl %>% derive_vars_duration( new_var = DTHADY, start_date = TRTSDT, end_date = DTHDT ) %>% derive_vars_duration( new_var = LDDTHELD, start_date = TRTEDT, end_date = DTHDT, add_one = FALSE )

----r eval=TRUE--------------------------------------------------------------

assign_randfl <- function(x) { if_else(!is.na(x), "Y", NA_character_) }

adsl <- adsl %>% mutate( RANDFL = assign_randfl(RANDDT) )

----r death, eval=TRUE-------------------------------------------------------

adsl <- adsl %>% derive_vars_extreme_event( by_vars = exprs(STUDYID, USUBJID), events = list( event( dataset_name = "ae", condition = AEOUT == "FATAL", set_values_to = exprs(DTHCAUS = AEDECOD, DTHDOM = "AE"), ), event( dataset_name = "ds", condition = DSDECOD == "DEATH" & grepl("DEATH DUE TO", DSTERM), set_values_to = exprs(DTHCAUS = DSTERM, DTHDOM = "DS"), ) ), source_datasets = list(ae = ae, ds = ds), tmp_event_nr_var = event_nr, order = exprs(event_nr), mode = "first", new_vars = exprs(DTHCAUS, DTHDOM) )

----r grouping, eval=TRUE----------------------------------------------------

region1_lookup <- exprs( ~condition, ~REGION1, ~REGION1N, COUNTRY %in% c("CAN", "USA"), "North America", 1, !is.na(COUNTRY), "Rest of the World", 2, is.na(COUNTRY), "Missing", 3 )

racegr1_lookup <- exprs( ~condition, ~RACEGR1, ~RACEGR1N, RACE %in% c("WHITE"), "White", 1, RACE != "WHITE", "Non-white", 2, is.na(RACE), "Missing", 3 )

dthcgr1_lookup <- exprs( ~condition, ~DTHCGR1, ~DTHCGR1N, DTHDOM == "AE", "ADVERSE EVENT", 1, !is.na(DTHDOM) & str_detect(DTHCAUS, "(PROGRESSIVE DISEASE|DISEASE RELAPSE)"), "PROGRESSIVE DISEASE", 2, !is.na(DTHDOM) & !is.na(DTHCAUS), "OTHER", 3, is.na(DTHDOM), NA_character_, NA )

adsl <- adsl %>% derive_vars_cat( definition = region1_lookup ) %>% derive_vars_cat( definition = racegr1_lookup ) %>% derive_vars_cat( definition = dthcgr1_lookup )

----r checks, warning=FALSE, message=FALSE-----------------------------------

dir <- tempdir() # Specify the directory for saving the XPT file

adsl_grp <- adsl %>% group_by(USUBJID, TRTSDT, TRT01P)

adsl_grp %>% check_variables(metacore) %>% # Check all variables specified are present and no more check_ct_data(metacore, na_acceptable = TRUE) %>% # Checks all variables with CT only contain values within the CT order_cols(metacore) %>% # Orders the columns according to the spec sort_by_key(metacore) %>% # Sorts the rows by the sort keys #xportr_type(metacore, domain = "ADSL") %>% # Coerce variable type to match spec #xportr_length(metacore) %>% # Assigns SAS length from a variable level metadata #xportr_label(metacore) %>% # Assigns variable label from metacore specifications #xportr_df_label(metacore) %>% # Assigns dataset label from metacore specifications xportr_write(file.path(dir, "adsl.xpt"), metadata = metacore, domain = "ADSL")

bms63 avatar Dec 02 '25 21:12 bms63

when i run the xportr_type alone the grouping warning is detected, but it run it xportr_length() I no longer receive the grouping warning

image

bms63 avatar Dec 02 '25 21:12 bms63

@bms63 It should work for xportr_legnth() now

when i run the xportr_type alone the grouping warning is detected, but it run it xportr_length() I no longer receive the grouping warning

image

sadchla-codes avatar Dec 10 '25 19:12 sadchla-codes

@copilot can you create an additional test that tests how xportr_write() works with grouped data. Please branch off this feature branch and create a new Pull Request. Please follow conventions within the test-write.R file, e.g. follow pattern of data creation and naming of the test that subject.

bms63 avatar Dec 21 '25 18:12 bms63

@bms63 I've opened a new pull request, #321, to work on those changes. Once the pull request is ready, I'll request review from you.

Copilot avatar Dec 21 '25 18:12 Copilot

Thanks again @sadchla-codes going to merge this in.

I think in the next round we should pretty up the message with cli and glue :)

bms63 avatar Jan 06 '26 22:01 bms63