admiral icon indicating copy to clipboard operation
admiral copied to clipboard

Documentation: create_single_dose_dataset() example creates duplicates

Open tuiaj opened this issue 1 month ago • 9 comments

Please select a category the issue is focused on?

Other

Let us know where something needs a refresh or put your idea here!

Was trying out the ADPPK Template but found that the create_single_dose_dataset() under the "Expand Dosing Records" section was creating duplicates within the variable NFRLT.

ex_exp |> # dataset derived from create_single_dose_dataset in the above linked example
  filter(USUBJID == "01-701-1028")  |> # filters for the first subject as an example
  count(NFRLT, sort = T)  #shows  nominal times which are repeated

which gives: image

It seems the duplicates happen for the day of individual visits. Like during the process of unnesting each visit to a dose day/unit, it calculates the visit day twice rather than moving on.

image

Unsure if this is an actual coding/workflow issue or just specific to the example. In the previous step, "Get Dosing Information" - it creates the original NFRLT variable by NFRLT = 24 * (VISITDY - 1). This NFRLT is then used in the create_single_dose_dataset() to create the nominal time where the duplicates occur. In the other examples, I couldn't find this occuring.

However, when I reformatted the ex data before the create_single_dose_dataset() to have the first visit VISITDY be equal to 0 without altering any of the other visits, it allowed the logic of the function to proceed as intended.

reformatted <- ex_dates |> 
  filter(USUBJID == "01-701-1028")  |> #sets subject as an example
   mutate(VISITDY = if_else(VISITDY == 1, 0, VISITDY),
          NFRLT =VISITDY * 24) 

#shows the duplicates have decreased
reformatted |> 
  create_single_dose_dataset(dose_freq = EXDOSFRQ,
    start_date = ASTDT,
    start_datetime = ASTDTM,
    end_date = AENDT,
    end_datetime = AENDTM,
    nominal_time = NFRLT,
    lookup_table = dose_freq_lookup,
    lookup_column = CDISC_VALUE,
    keep_source_vars = exprs(
      STUDYID, USUBJID, EVID, EXDOSFRQ, EXDOSFRM,
      NFRLT, EXDOSE, EXDOSU, EXTRT, ASTDT, ASTDTM, AENDT, AENDTM,
      VISIT, VISITNUM, VISITDY,
      TRT01A, TRT01P, DOMAIN, EXSEQ, !!!adsl_vars)) |> 
  count(NFRLT, sort = T)

There is still duplicates in this reformatted example due to the misalignment of the VISITDY and the actual days between visits. I have the derived days between the first visit 2013-07-19 and last visit 2014-01-07 as 172 days ( 24.57 weeks) not the 168 listed. See attached excel for my visual breakdown: example_doseSchedule.xlsx

Overall, unsure how specific this problem is to the example data set (or my handling of it), or if it is something else. I am happy to help out wherever I can, but I am very new (this is my first issue on Github!)

tuiaj avatar May 06 '24 20:05 tuiaj