Feature Request: Check whether the dataset is missing variables that are defined in the variable metadata

Open ynsec37 opened this issue 1 year ago • 3 comments

Feature Idea

In the case below, STUDYID has been deleted from ADSL, so the output xpt file contains only 50 variables while the metadata defines 51.

It would be better to emit a message that .df is missing some variables that are defined in the metadata.

e.g. STUDYID is in the metadata but not found in the dataset
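
To make the request concrete, here is a minimal base R sketch of the comparison being asked for. It is not xportr code: the toy `adsl` and `var_spec` objects are made up for illustration, and the metadata column names `dataset` and `variable` are assumed to match the lower-cased spec columns used in the reprex below.

```r
# Toy stand-ins for the dataset and the variable-level metadata (illustrative only).
adsl <- data.frame(USUBJID = c("01-001", "01-002"), AGE = c(34, 51))
var_spec <- data.frame(
  dataset  = c("ADSL", "ADSL", "ADSL", "ADAE"),
  variable = c("STUDYID", "USUBJID", "AGE", "AETERM")
)

# Variables the metadata defines for this domain.
expected <- var_spec$variable[var_spec$dataset == "ADSL"]

# Variables defined in the metadata but absent from the dataset.
missing_vars <- setdiff(expected, names(adsl))

if (length(missing_vars) > 0) {
  message(
    "Variables in metadata but not found in dataset: ",
    paste(missing_vars, collapse = ", ")
  )
}
```

The existing check goes in the other direction (dataset variables not found in the metadata), so it cannot catch a dropped STUDYID; the `setdiff()` above simply swaps the two arguments.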

Relevant Input

No response

Relevant Output

No response

Reproducible Example/Pseudo Code

library(dplyr, warn.conflicts = FALSE)
library(xportr)

data("adsl_xportr")
ADSL <- adsl_xportr

spec_path <- system.file(file.path("specs", "ADaM_spec.xlsx"), package = "xportr")
var_spec <- readxl::read_xlsx(spec_path, sheet = "Variables") %>%
  dplyr::rename(type = "Data Type") %>%
  dplyr::rename_with(tolower)
dataset_spec <- readxl::read_xlsx(spec_path, sheet = "Datasets") %>%
  dplyr::rename(label = "Description") %>%
  dplyr::rename_with(tolower)

ADSL %>% 
  select(-STUDYID) %>%
  xportr_metadata(var_spec, "ADSL", verbose = "warn") %>%
  xportr_type() %>%
  xportr_length() %>%
  xportr_label() %>%
  xportr_order() %>%
  xportr_format() %>%
  xportr_df_label(dataset_spec) %>%
  xportr_write("adsl.xpt")
#> 
#> ── All variables in dataset are found in `metadata` ──
#> 
#> ── 50 reordered in dataset ──
#> 
#> Warning: Variable reordered in `.df`: `SITEID`, `USUBJID`, `SUBJID`, `COUNTRY`,
#> `AGE`, `AGEU`, `AGEGR1`, `SEX`, `RACE`, `RACEGR1`, `ETHNIC`, `RFSTDTC`,
#> `RFENDTC`, `RFXSTDTC`, `RFXENDTC`, `RFICDTC`, `RFPENDTC`, `DMDTC`, `DMDY`,
#> `SAFFL`, `ARM`, `ARMCD`, `ACTARM`, `ACTARMCD`, `TRT01P`, `TRT01A`, `TRTSDTM`,
#> `TRTSTMF`, `TRTEDTM`, `TRTETMF`, `TRTSDT`, `TRTEDT`, `DTHFL`, `DTHDTC`,
#> `DTHDT`, `DTHDTF`, `DTHADY`, `REGION1`, `TRTDURD`, `LDDTHELD`, `LSTALVDT`,
#> `LDDTHGR1`, `DTH30FL`, `DTHA30FL`, `DTHB30FL`, `FRVDT`, `RANDDT`, `SCRFDT`,
#> `EOSDT`, and `EOSSTT`
#> Warning: (xportr::xportr_format) `LSTALVDT` is expected to have a format but
#> does not.


nrow(var_spec)
#> [1] 51

ncol(haven::read_xpt("adsl.xpt"))
#> [1] 50

Created on 2025-03-06 with reprex v2.1.1

ynsec37 avatar Mar 05 '25 16:03 ynsec37

@bms63 Should this be implemented as a check in xportr_write()? Something like screenshot 1, which is then called at the beginning of xportr_write(), as in screenshot 2.

[Image: screenshot 1] [Image: screenshot 2]
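
The screenshots are not reproduced in this text, so the following is an independent sketch of the idea rather than the code they showed. `check_missing_vars()` and `write_with_check()` are hypothetical names, not part of xportr; the `verbose` levels are assumed to mirror the "none", "message", "warn", "stop" options that xportr functions accept, and the metadata columns are assumed to be named `dataset` and `variable`.

```r
# Hypothetical helper: report variables that the metadata defines for `domain`
# but that are absent from `.df`. Not part of xportr's API.
check_missing_vars <- function(.df, metadata, domain,
                               verbose = c("warn", "none", "message", "stop")) {
  verbose <- match.arg(verbose)
  expected <- metadata$variable[metadata$dataset == domain]
  missing_vars <- setdiff(expected, names(.df))

  if (length(missing_vars) > 0 && verbose != "none") {
    msg <- paste0(
      "Variables in metadata but not found in `.df`: ",
      paste(missing_vars, collapse = ", ")
    )
    switch(verbose,
      message = message(msg),
      warn = warning(msg, call. = FALSE),
      stop = stop(msg, call. = FALSE)
    )
  }
  invisible(missing_vars)
}

# Hypothetical wrapper showing where the call could sit: the check runs first,
# and the real writing step would follow it.
write_with_check <- function(.df, metadata, domain, verbose = "warn") {
  check_missing_vars(.df, metadata, domain, verbose = verbose)
  # ... the actual write to an .xpt file would happen here ...
  invisible(.df)
}

# Toy data (illustrative only): STUDYID is defined in the metadata but not in the data.
adsl <- data.frame(USUBJID = c("01-001", "01-002"), AGE = c(34, 51))
var_spec <- data.frame(
  dataset  = c("ADSL", "ADSL", "ADSL"),
  variable = c("STUDYID", "USUBJID", "AGE")
)

found <- check_missing_vars(adsl, var_spec, "ADSL", verbose = "message")
write_with_check(adsl, var_spec, "ADSL", verbose = "message")
```

Keeping the comparison in one helper means the same call could be reused at the start of other functions if the check should not live only in the write step.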

sadchla-codes avatar Nov 22 '25 06:11 sadchla-codes

@schmetti-kim this would be a helpful check to alert users that something is missing in their dataset.

I am wondering if the code that checks whether all variables in the dataset are found in the metadata could be adapted to also alert users if a variable is missing from the dataset.

This would have to be implemented for every xportr function.

bms63 avatar Dec 20 '25 20:12 bms63

This seems like a very helpful feature to implement. I will take a closer look at both issues within the next 48 hours. :)

schmetti-kim avatar Dec 20 '25 20:12 schmetti-kim