PatientLevelPrediction

List of changes required to handle covariate data with 64 bit

Open lhjohn opened this issue 1 year ago • 1 comments

To get the latest arrow running with 64-bit fields, we had to make some changes throughout the PLP package. I will keep track of all required changes here in case we want to implement them at some point:

  • applyTidyCovariateData: needs the following addition:
      covariateData$maxes <- covariateData$maxes %>%
        dplyr::mutate(covariateId = bit64::as.integer64(covariateId))
  • CovariateSummary: needs to convert integer64 counts to regular integers:
      for (i in seq_along(covariateSummariesPerStrata)) {
        covariateSummariesPerStrata[[i]] <- covariateSummariesPerStrata[[i]] %>%
          dplyr::mutate(CovariateCount = as.integer(CovariateCount))
      }
  • FeatureExtraction::tidyCovariateData: the line
      deleteCovariateIds <- c(deleteCovariateIds, toDelete$covariateId)
    should become
      deleteCovariateIds <- c(toDelete$covariateId, deleteCovariateIds)

  • Ensure that options(arrow.int64_downcast = FALSE) is set, including inside parallel loops.
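The point about parallel loops can be sketched as follows. This is only an illustration using the base `parallel` package, which is an assumption here; the parallel code in PLP may use a different mechanism. Socket-based workers are fresh R sessions, so an option set in the main session is not visible to them unless it is set again on each worker.

```r
library(parallel)

# Set the option in the main session.
options(arrow.int64_downcast = FALSE)

# Start two socket workers; these are fresh R sessions and do not
# inherit options from the main session.
cl <- makeCluster(2)

# Set the same option on every worker before any work is dispatched.
invisible(clusterEvalQ(cl, options(arrow.int64_downcast = FALSE)))

# Illustrative check: each worker reports the option value it sees.
workerValues <- parLapply(cl, 1:2, function(i) {
  getOption("arrow.int64_downcast")
})

stopCluster(cl)
```

Without the `clusterEvalQ()` call, `getOption("arrow.int64_downcast")` would return `NULL` on the workers, and arrow would fall back to its default behaviour there.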

options("repos" = "https://cran.r-project.org"), then renv::restore(repos = getOption("repos"))
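A likely reason the argument order in the tidyCovariateData change matters is that `c()` dispatches on the class of its first argument only. The sketch below uses a toy S3 class named `tagged` (hypothetical, standing in for integer64 so that the example needs nothing beyond base R) to show the effect; the variable names mirror the issue but the data are made up.

```r
# Toy S3 class standing in for bit64's integer64 (hypothetical, base R only).
asTagged <- function(x) structure(x, class = "tagged")

# c() method for the toy class: combines the values and keeps the class.
c.tagged <- function(...) {
  values <- unlist(lapply(list(...), unclass))
  structure(values, class = "tagged")
}

toDeleteIds <- asTagged(c(1002, 2003))  # plays the role of toDelete$covariateId
deleteCovariateIds <- c()               # assumed to start as an empty plain vector

# Classed vector first: c.tagged is dispatched and the class survives.
classedFirst <- c(toDeleteIds, deleteCovariateIds)

# Plain vector first: the default c() runs and drops the class attribute.
plainFirst <- c(deleteCovariateIds, toDeleteIds)

class(classedFirst)
class(plainFirst)
```

With real integer64 values the second form is worse than a lost class: the default `c()` returns the underlying storage as ordinary doubles, so the IDs no longer print or compare as the intended integers.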

lhjohn avatar Apr 21 '23 18:04 lhjohn

Do we need 64-bit fields? The covariateIds are the only values that can be large enough to require them, since we already convert subject_id into row_id. We could just map covariateIds to a covariate_row_id (similar to what is done when we convert to a sparse matrix), and then we would have no requirement for 64-bit fields.
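A minimal sketch of that mapping, in base R, might look like the following. The data frame, the example IDs, and the column name `covariateRowId` are made up for illustration; the IDs are held as doubles here, and this is not how PLP currently stores them.

```r
# Hypothetical covariate rows; the second ID is too large for a 32-bit integer.
covariates <- data.frame(
  rowId = c(1, 1, 2, 3),
  covariateId = c(8507001, 21600960413, 8507001, 21600960413),
  covariateValue = c(1, 1, 1, 1)
)

# Build a lookup from each distinct covariateId to a small contiguous index.
covariateMap <- data.frame(covariateId = sort(unique(covariates$covariateId)))
covariateMap$covariateRowId <- seq_len(nrow(covariateMap))

# Replace the large IDs with the index; covariateMap keeps the way back.
covariates$covariateRowId <- covariateMap$covariateRowId[
  match(covariates$covariateId, covariateMap$covariateId)
]
covariates$covariateId <- NULL
```

After this step the covariate table only holds ordinary 32-bit integers, and the original IDs are needed only in the small lookup table.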

Interested in people's thoughts.

jreps avatar Apr 25 '23 13:04 jreps