RMINC

Possible memory leak in mincWriteVolume

Status: Open • gdevenyi opened this issue 3 years ago • 9 comments

We're running the following code on Niagara:

```r
#!/usr/bin/env Rscript
args = commandArgs(trailingOnly = TRUE)
# Load packages (not all are necessary)
library(lme4)
library(lmerTest)
library(RMINC)
library(tidyverse)

# Set working directory
setwd("/scratch/m/mchakrav/paulbest/genfi_df5/dbm/dbm_4/linear_model")

# Load data
data <- read_csv("pls_nona_nia.csv")

# Voxel-wise mixed-effects model: fixed effect of time, random intercept and slope per Id
model1 <- mincLmer(jacobians ~ time_month + (time_month | Id),
                   data = data,
                   mask = "secondlevel_otsumask.mnc",
                   summary_type = "ranef",
                   parallel = c("local", 20),
                   control = lmerControl(optimizer = "Nelder_Mead"))

# Make sure the output directory exists before saving into it
dir.create("output_lmer", showWarnings = FALSE)
save(model1, file = "output_lmer/model1.RData")
save.image(file = "complete_model.RData")

# Write the "beta-time_month-Id<id>" column of model1 as one volume per subject
for (id in unique(data$Id)) {
  column <- paste0("beta-time_month-Id", id)
  dir.create(file.path("output_lmer", id), showWarnings = FALSE, recursive = TRUE)
  output_minc_file <- file.path("output_lmer", id, paste0(id, "_time_month_Id_beta.mnc"))
  mincWriteVolume(model1, output.filename = output_minc_file,
                  like.filename = "secondlevel_template0.mnc", column = column)
}
```

And we're seeing the following memory usage: [screenshot from 2022-01-19 at 6:18:06 PM; image not preserved]

During this time, the loop saves ~29 files before the node runs out of memory and R is killed.

We're not creating any new memory-holding objects, as far as I understand, but R's memory consumption keeps rising until it fills the node.

gdevenyi • Jan 25 '22 19:01

Is VOLUME_CACHE_THRESHOLD set?

bcdarwin • Jan 25 '22 19:01

The minc-toolkit default environment sets it as:

VOLUME_CACHE_THRESHOLD=-1

gdevenyi • Jan 25 '22 20:01

Hmm, it's quite possibly a memory leak in the C bindings or similar.

Is this memory measurement for the whole machine or just your R process? If the latter, do you know whether the dip at 3:30 pm corresponds to the start of file writing?

Do I have access to your modules/files on Niagara? If so, I could try debugging by running under Valgrind, but I'm fairly unfamiliar with RMINC internals, so it might be challenging.

bcdarwin • Jan 25 '22 20:01
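One way to answer the whole-machine vs. R-process question from inside the script is to log two numbers after every write: the memory held by R's own allocator (from `gc()`) and the resident set size of the R process. This is a minimal sketch assuming Linux (it reads `/proc/self/status`); the helper names `process_rss_mb()`, `r_heap_mb()` and `log_memory()` are hypothetical and not part of RMINC.

```r
# Hypothetical helpers for telling R-heap growth apart from native (C-level) growth.

# Resident set size of the current R process in MB.
# Linux only: parses the VmRSS line of /proc/self/status, which is reported in kB.
process_rss_mb <- function() {
  status <- readLines("/proc/self/status")
  vmrss <- grep("^VmRSS:", status, value = TRUE)
  as.numeric(gsub("[^0-9]", "", vmrss)) / 1024
}

# Memory currently used by R's own allocator (Ncells + Vcells) in MB.
# gc() runs a full collection first, so garbage awaiting collection is excluded.
r_heap_mb <- function() {
  sum(gc()[, 2])  # column 2 is "(Mb)" for the "used" counts
}

# Print both numbers with a label, e.g. once per loop iteration.
log_memory <- function(label) {
  message(sprintf("%s: R heap %.1f MB, process RSS %.1f MB",
                  label, r_heap_mb(), process_rss_mb()))
}
```

Calling `log_memory(id)` right after each `mincWriteVolume()` call in the loop would give one line per file. If RSS climbs by a similar amount per file while the R heap stays flat, the growth is happening outside R's allocator, which would be consistent with a leak in the C bindings. A single high RSS reading is less conclusive, since the C allocator does not always return freed pages to the OS.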

> Is this memory measurement the whole machine or your R process?

This is the Niagara readout from the Slurm whole-machine statistics.

> If the latter, do you know if the dip at 3.30pm corresponds to the beginning of writing files?

That's a really good question. That dip is pretty big, and all the allocations should be done by that point. I'm not sure.

> Do I have access to your modules/files on Niagara?

Yes

```bash
export QUARANTINE_PATH=/project/m/mchakrav/quarantine
module use ${QUARANTINE_PATH}/modules
module load cobralab
```

For now, we're working around this by randomizing the order of the files to write and rerunning the job, so that eventually all of them get written.

gdevenyi • Jan 25 '22 20:01
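The randomize-and-rerun workaround can be made resumable by skipping subjects whose output file already exists, so each rerun only writes what is still missing. A minimal sketch, assuming the output layout used by the loop in the original script; `beta_output_path()` and `pending_ids()` are hypothetical helpers. One caveat: a file that was only partly written when R was killed will exist on disk and be skipped, so it is worth deleting the most recent output (or checking file sizes) before rerunning.

```r
# Hypothetical helpers for a resumable version of the randomize-and-rerun workaround.

# Output path the original loop uses for one subject (vectorised over ids).
beta_output_path <- function(id, out_dir = "output_lmer") {
  file.path(out_dir, id, paste0(id, "_time_month_Id_beta.mnc"))
}

# Ids whose output file is not on disk yet, shuffled so that repeated runs
# don't always start on the same subjects.
pending_ids <- function(ids, out_dir = "output_lmer") {
  todo <- ids[!file.exists(beta_output_path(ids, out_dir))]
  # sample.int() avoids sample()'s surprise when given a single number
  todo[sample.int(length(todo))]
}

# Usage (assumes `data` from the original script):
# for (id in pending_ids(unique(data$Id))) { ...same loop body as before... }
```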

Just a quick note: a better workaround than randomizing would probably be to run mincWriteVolume in short-lived subprocesses, e.g. using batchMap from batchtools with a local multiprocessing backend.

bcdarwin • Jan 25 '22 20:01
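The short-lived-subprocess idea can be sketched with base R's `parallel::mclapply()` in place of batchtools (a swapped-in technique, chosen only because it needs no extra package). With `mc.preschedule = FALSE` it forks a fresh child for every id, and whatever memory the child accumulates is released when that child exits. Forked children see `model1` through copy-on-write, so the model is not re-serialized per job, although R's garbage collector can still force some pages to be copied, so a modest `cores` value is safer. Forking works on Linux/macOS only, and `run_in_subprocesses()` and `write_subject_beta()` are hypothetical names.

```r
library(parallel)

# Run fun(id, ...) for each id in its own forked child process (one fork per id).
run_in_subprocesses <- function(ids, fun, cores = 2, ...) {
  # mclapply() with mc.cores = 1 falls back to lapply() in the current process
  stopifnot(cores >= 2)
  results <- mclapply(ids, fun, ..., mc.cores = cores, mc.preschedule = FALSE)
  # A child that errors returns a "try-error"; one that is killed returns NULL
  failed <- vapply(results,
                   function(r) is.null(r) || inherits(r, "try-error"),
                   logical(1))
  if (any(failed)) {
    warning("Failed ids: ", paste(ids[failed], collapse = ", "))
  }
  results
}

# Write one subject's beta volume; mirrors the body of the original loop.
write_subject_beta <- function(id, model,
                               like_file = "secondlevel_template0.mnc",
                               out_dir = "output_lmer") {
  column <- paste0("beta-time_month-Id", id)
  subject_dir <- file.path(out_dir, id)
  dir.create(subject_dir, showWarnings = FALSE, recursive = TRUE)
  out_file <- file.path(subject_dir, paste0(id, "_time_month_Id_beta.mnc"))
  RMINC::mincWriteVolume(model, output.filename = out_file,
                         like.filename = like_file, column = column)
  out_file  # return only the (small) path to the parent process
}

# Usage (assumes `data` and `model1` from the original script are in the session):
# run_in_subprocesses(unique(data$Id), write_subject_beta, cores = 4, model = model1)
```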

Thanks. Is there any chance you could give me read access (via extended ACLs, say) to the data directory as well?

bcdarwin • Jan 25 '22 21:01

I have a special share for that:

/scratch/m/mchakrav/share

You have read-write access there. The data is still copying; I suggest waiting ~30 minutes.

gdevenyi • Jan 25 '22 21:01

Are the Jacobians being copied as well?

bcdarwin • Jan 25 '22 22:01

> Are the Jacobians being copied as well?

Yes. Warning: the file paths are all absolute paths. The student will be talked to.

gdevenyi • Jan 25 '22 23:01