PatientLevelPrediction
List of changes required to handle covariate data with 64 bit
To get the latest arrow release running with 64-bit fields, we had to make changes throughout the PLP package. I will keep track of all the required changes here in case we want to implement them at some point:
- applyTidyCovariateData: needs the following addition:
    covariateData$maxes <- covariateData$maxes %>%
      dplyr::mutate(covariateId = bit64::as.integer64(covariateId))
- CovariateSummary: needs to convert integer64 to a base R integer:
    for (i in seq_along(covariateSummariesPerStrata)) {
      covariateSummariesPerStrata[[i]] <- covariateSummariesPerStrata[[i]] %>%
        dplyr::mutate(CovariateCount = as.integer(CovariateCount))
    }
- FeatureExtraction::tidyCovariateData:
    deleteCovariateIds <- c(deleteCovariateIds, toDelete$covariateId)
  should become
    deleteCovariateIds <- c(toDelete$covariateId, deleteCovariateIds)
- ensure that options(arrow.int64_downcast = FALSE) is set, also inside parallel loops
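To illustrate the last two items, here is a rough sketch rather than the actual package code. The covariate IDs, the variable names `toDeleteIds`, `safe`, `risky` and `workerOpts`, and the two-worker cluster are all made up for the example. My assumption for the first part is that `c()` chooses its method from its first argument, so the integer64 vector has to come first to keep its class. The second part relies on options being set per R process, so workers need the option set explicitly.

```r
library(bit64)     # assumed to be installed
library(parallel)  # ships with base R

# 1. Argument order in c(): put the integer64 vector first.
toDeleteIds <- as.integer64(c(1002, 2003))  # made-up covariate IDs
deleteCovariateIds <- c(3004, 4005)         # plain doubles collected earlier

# c() dispatches on its first argument, so this keeps the integer64 class.
safe <- c(toDeleteIds, deleteCovariateIds)
print(class(safe))

# With the double first, base c() may be used instead and the integer64
# class can be lost; inspect the result rather than relying on it.
risky <- c(deleteCovariateIds, toDeleteIds)
print(class(risky))

# 2. Options are per process: set the arrow option on every worker too.
options(arrow.int64_downcast = FALSE)
cl <- makeCluster(2)  # hypothetical two-worker cluster
clusterEvalQ(cl, options(arrow.int64_downcast = FALSE))
workerOpts <- parLapply(cl, 1:2, function(i) getOption("arrow.int64_downcast"))
stopCluster(cl)
print(unlist(workerOpts))
```

If the `clusterEvalQ()` call is left out, the workers would report `NULL` for the option, because a fresh worker process does not inherit options set in the main session.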
options("repos" = "https://cran.r-project.org"), then renv::restore(repos = getOption("repos"))
Do we need 64-bit fields at all? The covariateIds are the only values that can be large enough to require them, since we already convert subject_id into row_id. We could map covariateIds to a covariate_row_id (similar to what is done when we convert to a sparse matrix), and then there would be no requirement for 64 bit.
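A minimal sketch of what that mapping could look like, in base R. The data frame, the column name `covariateRowId` and the lookup table `covariateMap` are hypothetical, and the sketch assumes the IDs are still exactly representable as doubles at the point where the mapping is built:

```r
# Made-up covariate rows; the IDs exceed the 32-bit integer range.
covariates <- data.frame(
  rowId = c(1L, 1L, 2L, 3L),
  covariateId = c(8532001012, 1002, 8532001012, 4182210802),
  covariateValue = c(1, 1, 1, 1)
)

# Lookup table: one small, dense index per distinct covariateId.
covariateMap <- data.frame(covariateId = sort(unique(covariates$covariateId)))
covariateMap$covariateRowId <- seq_len(nrow(covariateMap))

# Attach the dense index; downstream code would use it instead of covariateId.
idx <- match(covariates$covariateId, covariateMap$covariateId)
covariates$covariateRowId <- covariateMap$covariateRowId[idx]

print(covariates)
```

The original covariateId only needs to live in the lookup table, so it could be kept there as a string or integer64 for reporting while everything else works with ordinary 32-bit integers.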
Interested in people's thoughts.