
t2smap OOM

bpinsard opened this issue • 4 comments

What happened?

When processing a session with 6 multi-echo runs, the jobs get killed by SLURM, despite requesting SLURM memory with a larger buffer than the memory given to fMRIPrep. tedana's t2smap is the node that crashes, so the nipype-set requirements seem not to properly estimate the memory needed for these nodes.

I know that this problem has been reported before, but it seems to still be present in 23.1.4. Each echo .nii.gz file is approximately 0.4 GB. The current heuristic is mem_gb=2.5 * mem_gb * len(echo_times), so it estimates the memory requirement at ~3 GB, but the core dumps from the OOM kills are 8 GB, which matches what a basic top shows as well.
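For concreteness, here is the arithmetic behind the mismatch as a small sketch. The 0.4 GB file size and 8 GB core dump are taken from the report above; the echo count of 3 is an assumption chosen to reproduce the quoted ~3 GB estimate.

```python
# Reproducing the reporter's arithmetic (a sketch; echo count is assumed).
echo_file_gb = 0.4   # approximate size of one echo .nii.gz (from the report)
n_echoes = 3         # hypothetical echo count giving the ~3 GB estimate

# Heuristic quoted in the report: mem_gb = 2.5 * mem_gb * len(echo_times)
estimated_gb = 2.5 * echo_file_gb * n_echoes
print(f"nipype estimate: {estimated_gb:.1f} GB")  # ~3 GB

observed_gb = 8.0    # size of the OOM core dumps reported above
print(f"underestimate factor: {observed_gb / estimated_gb:.1f}x")
```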

I will try to run memory profiling of t2smap alone on our data to figure out a better heuristic.
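One stdlib-only way to get a first profiling number on Linux is to run the command as a child process and read the children's peak RSS afterwards. This is a minimal sketch, not the reporter's actual setup; the workload below is a stand-in, and you would substitute the real t2smap invocation.

```python
# Sketch: measure peak memory of a child process on Linux using only the
# standard library. Replace `cmd` with the actual t2smap command line.
import resource
import subprocess
import sys

cmd = [sys.executable, "-c", "x = bytearray(50_000_000)"]  # stand-in workload
subprocess.run(cmd, check=True)

# ru_maxrss is reported in kilobytes on Linux (bytes on macOS).
peak_kb = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
print(f"peak RSS of children: {peak_kb / 1024**2:.3f} GB")
```

Tools like memory_profiler or /usr/bin/time -v give finer-grained numbers, but this is enough to compare the node's actual peak against the heuristic's estimate.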

What command did you use?

containers-run -m 'fMRIPrep_sub-01/ses-001' -n bids-fmriprep --input sourcedata/templateflow/tpl-MNI152NLin2009cAsym/ --input sourcedata/templateflow/tpl-OASIS30ANTs/ --input sourcedata/templateflow/tpl-fsLR/ --input sourcedata/templateflow/tpl-fsaverage/ --input sourcedata/templateflow/tpl-MNI152NLin6Asym/ --output . --input 'sourcedata/cneuromod.emotion-videos/sub-01/ses-001/fmap/' --input 'sourcedata/cneuromod.emotion-videos/sub-01/ses-001/func/' --input 'sourcedata/cneuromod.anat.smriprep.longitudinal/sub-01/anat/' --input sourcedata/cneuromod.anat.smriprep.longitudinal/sourcedata/cneuromod.anat.freesurfer_longitudinal/sub-01/ -- -w ./workdir --participant-label 01 --anat-derivatives sourcedata/cneuromod.anat.smriprep.longitudinal --fs-subjects-dir sourcedata/cneuromod.anat.smriprep.longitudinal/sourcedata/cneuromod.anat.freesurfer_longitudinal --bids-filter-file code/fmriprep_study-cneuromod.emotion-videos_sub-01_ses-001_bids_filters.json --output-layout bids --ignore slicetiming --use-syn-sdc --output-spaces MNI152NLin2009cAsym T1w:res-iso2mm --cifti-output 91k --notrack --write-graph --skip_bids_validation --omp-nthreads 8 --nprocs 8 --mem_mb 45000 --fs-license-file code/freesurfer.license --me-output-echos --resource-monitor sourcedata/cneuromod.emotion-videos ./ participant

What version of fMRIPrep are you running?

23.1.4

How are you running fMRIPrep?

Singularity

Is your data BIDS valid?

Yes

Are you reusing any previously computed results?

Anatomical derivatives

Please copy and paste any relevant log output.

No response

Additional information / screenshots

No response

bpinsard avatar Nov 01 '23 15:11 bpinsard

Just linking relevant issues/PRs:

  • https://github.com/nipreps/fmriprep/issues/2728
  • https://github.com/nipreps/fmriprep/pull/2898

effigies avatar Nov 01 '23 15:11 effigies

I realized that I get this warning in the logs:

231103-02:09:47,698 nipype.workflow WARNING:
         Some nodes exceed the total amount of memory available (45.00GB).

I cannot imagine which operation would require that amount of memory for 5-minute runs of 2 mm iso fMRI. I'm looking for a way to get the node mem_gb values; apparently they're not in the exported graph (--write-graph).
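As a stopgap, the figures in warnings like the one above can be scraped from the log text with a regex. This is just an illustrative sketch; the sample line is copied from the warning quoted earlier, and the exact log format may vary between nipype versions.

```python
# Sketch: extract the memory figure from a nipype "nodes exceed" warning.
import re

log = (
    "231103-02:09:47,698 nipype.workflow WARNING:\n"
    "\t Some nodes exceed the total amount of memory available (45.00GB)."
)

match = re.search(r"memory available \(([\d.]+)GB\)", log)
if match:
    print(f"reported memory budget: {float(match.group(1))} GB")
```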

Looking at the code, the likely cause is that nodes set with mem_gb = mem_gb * 3 * omp_nthreads (perhaps using the resampled mem_gb estimate) could reach that much requested memory, because I used omp_nthreads=8.
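A quick sketch of how that rule can blow past the 45 GB budget. The 2 GB resampled-series size below is a hypothetical value for illustration, not measured on the actual dataset.

```python
# Sketch: the mem_gb * 3 * omp_nthreads rule with the flags used above.
resampled_gb = 2.0   # hypothetical resampled BOLD series size
omp_nthreads = 8     # from --omp-nthreads 8

node_req_gb = resampled_gb * 3 * omp_nthreads
budget_gb = 45.0     # from --mem_mb 45000
print(f"node request: {node_req_gb:.0f} GB vs budget {budget_gb:.0f} GB")
```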

However, t2smap is likely not the node with the largest mem_gb requirement set in that workflow.

bpinsard avatar Nov 03 '23 14:11 bpinsard

Here's what we calculate:

https://github.com/nipreps/fmriprep/blob/61a7d9835c6af84d586916537bff304cacfa5d3c/fmriprep/workflows/bold/base.py#L261-L269

https://github.com/nipreps/fmriprep/blob/61a7d9835c6af84d586916537bff304cacfa5d3c/fmriprep/workflows/bold/base.py#L1273-L1285

I don't really remember the logic behind the largemem one, but you should be able to see the estimates in your logs.

effigies avatar Nov 03 '23 14:11 effigies

Let's go ahead and link https://github.com/ME-ICA/tedana/issues/856.

effigies avatar Nov 07 '23 15:11 effigies