Feature Request: Check whether the dataset is missing variables that are defined in the variable metadata
Feature Idea
In the case below, STUDYID has been deleted from ADSL, so the output xpt file contains only 50 variables while the metadata defines 51.
It would be better to throw a message saying that `.df` is missing some variables that are defined in the metadata,
e.g. "STUDYID is in the metadata but was not found in the dataset".
Relevant Input
No response
Relevant Output
No response
Reproducible Example/Pseudo Code
library(dplyr, warn.conflicts = FALSE)
library(xportr)
data("adsl_xportr")
ADSL <- adsl_xportr
spec_path <- system.file(file.path("specs", "ADaM_spec.xlsx"), package = "xportr")
var_spec <- readxl::read_xlsx(spec_path, sheet = "Variables") %>%
  dplyr::rename(type = "Data Type") %>%
  dplyr::rename_with(tolower)
dataset_spec <- readxl::read_xlsx(spec_path, sheet = "Datasets") %>%
  dplyr::rename(label = "Description") %>%
  dplyr::rename_with(tolower)
ADSL %>%
  select(-STUDYID) %>%
  xportr_metadata(var_spec, "ADSL", verbose = "warn") %>%
  xportr_type() %>%
  xportr_length() %>%
  xportr_label() %>%
  xportr_order() %>%
  xportr_format() %>%
  xportr_df_label(dataset_spec) %>%
  xportr_write("adsl.xpt")
#>
#> ── All variables in dataset are found in `metadata` ──
#>
#> ── 50 reordered in dataset ──
#>
#> Warning: Variable reordered in `.df`: `SITEID`, `USUBJID`, `SUBJID`, `COUNTRY`,
#> `AGE`, `AGEU`, `AGEGR1`, `SEX`, `RACE`, `RACEGR1`, `ETHNIC`, `RFSTDTC`,
#> `RFENDTC`, `RFXSTDTC`, `RFXENDTC`, `RFICDTC`, `RFPENDTC`, `DMDTC`, `DMDY`,
#> `SAFFL`, `ARM`, `ARMCD`, `ACTARM`, `ACTARMCD`, `TRT01P`, `TRT01A`, `TRTSDTM`,
#> `TRTSTMF`, `TRTEDTM`, `TRTETMF`, `TRTSDT`, `TRTEDT`, `DTHFL`, `DTHDTC`,
#> `DTHDT`, `DTHDTF`, `DTHADY`, `REGION1`, `TRTDURD`, `LDDTHELD`, `LSTALVDT`,
#> `LDDTHGR1`, `DTH30FL`, `DTHA30FL`, `DTHB30FL`, `FRVDT`, `RANDDT`, `SCRFDT`,
#> `EOSDT`, and `EOSSTT`
#> Warning: (xportr::xportr_format) `LSTALVDT` is expected to have a format but
#> does not.
nrow(var_spec)
#> [1] 51
ncol(haven::read_xpt("adsl.xpt"))
#> [1] 50
Created on 2025-03-06 with reprex v2.1.1
@bms63 Should this be implemented as a check in xportr_write(), something like screenshot 1, and then called at the beginning of xportr_write() as in screenshot 2?
@schmetti-kim this would be a helpful check to alert users that something is missing in their dataset.
I am wondering if the code that checks whether all variables in the dataset are found in the metadata could be adapted to also alert users when a variable defined in the metadata is missing from the dataset.
This would have to be implemented for every xportr function.
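To make the idea concrete, here is a minimal sketch of such a check in base R. The helper name `check_vars_in_data()` is hypothetical and not part of xportr. It assumes the spec has lower-cased `dataset` and `variable` columns, as in the reprex above, and it reuses the `verbose` levels ("warn", "message", "stop", "none") that the xportr functions already accept. The toy data stands in for ADSL and its spec.

```r
# Hypothetical helper (not part of xportr): report variables that are defined
# in the metadata for a domain but are absent from the dataset.
check_vars_in_data <- function(.df, metadata, domain,
                               verbose = c("warn", "message", "stop", "none")) {
  verbose <- match.arg(verbose)

  # Assumption: the spec has lower-cased `dataset` and `variable` columns.
  spec_vars <- metadata$variable[metadata$dataset == domain]

  # Variables the metadata expects that the data frame does not contain.
  missing_vars <- setdiff(spec_vars, names(.df))

  if (length(missing_vars) > 0 && verbose != "none") {
    msg <- sprintf(
      "Variable(s) in metadata but not found in `.df`: %s",
      paste(missing_vars, collapse = ", ")
    )
    switch(verbose,
      warn = warning(msg, call. = FALSE),
      message = message(msg),
      stop = stop(msg, call. = FALSE)
    )
  }

  # Return the missing names invisibly so callers can act on them.
  invisible(missing_vars)
}

# Toy data standing in for ADSL (STUDYID dropped) and its variable spec.
adsl <- data.frame(USUBJID = c("01", "02"), AGE = c(34, 51))
spec <- data.frame(
  dataset = "ADSL",
  variable = c("STUDYID", "USUBJID", "AGE")
)

missing_vars <- check_vars_in_data(adsl, spec, "ADSL", verbose = "message")
```

With the toy data, the call should emit a message naming STUDYID and return it invisibly. Whether a check like this belongs in one shared helper or in each xportr function is the open design question raised above.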
This seems like a very helpful feature to implement. I will take a closer look at both issues within the next 48 hours. :)