Issue preprocessing multiple multi-echo / multi-band datasets after fMRIPrep update
What happened?
We updated fMRIPrep to version 21.0.0. We have been using fMRIPrep for a few years now and have preprocessed multiple datasets with this pipeline. After making the upgrade, we wanted to re-preprocess two of the datasets. We were able to re-preprocess the first with no problem (NOT multi-echo data). We went to preprocess the second dataset (multiband/multi-echo) and we can't preprocess more than one participant at a time. As I mentioned, these data have been preprocessed with fMRIPrep in the past with no problem; I could run 10 at a time. (All work directories have been deleted, and the 'old' data have been moved out of the fMRIPrep path.) I am testing with 2-3 participants at a time, and have tried the following flags: `--low-mem`, `--mem-mb 55000`, and `--use-plugin plugin.yml` (with a corresponding file containing `plugin: LegacyMultiProc` and `plugin_args: {maxtasksperchild: 1, memory_gb: 60, n_procs: 10, raise_insufficient: false}`).
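For reference, the plugin file described above, laid out as a YAML file (field names and values copied from the description; this is the standard nipype plugin-config layout):

```yaml
# plugin.yml -- nipype execution plugin configuration
plugin: LegacyMultiProc
plugin_args:
  maxtasksperchild: 1
  memory_gb: 60
  n_procs: 10
  raise_insufficient: false
```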
Without the flags, and with the `--low-mem` flag, I get a `BrokenProcessPool` error and fMRIPrep doesn't finish. With the other two flags, I get a node error: `fmriprep_wf.single_subject_3156_wf.func_preproc_ses_1_task...._echo_1_wf.bold_t2smap_wf.t2smap_node` (full error below). I get this node error for each participant, at a different number of nodes. If I run the participants separately, they run with no problem.
What command did you use?
singularity run --cleanenv -B /data/MNA/data /data/MNA/fmriprep_21.0.0.simg /data/MNA/data/Nifti /data/MNA/data/Preprocessed participant --participant_label sub-3158 3159 3162 --ignore slicetiming sbref --fs-license-file /home/nicoler/license.txt
('original' command used. I also tried the various flags I outlined above)
What version of fMRIPrep are you running?
21.0.0
How are you running fMRIPrep?
Singularity
Is your data BIDS valid?
Yes
Are you reusing any previously computed results?
No
Please copy and paste any relevant log output.
When fMRIPrep doesn't finish (just hangs), the error is as follows:
File "/opt/conda/lib/python3.8/concurrent/futures/_base.py", line 437, in result
return self.__get_result()
File "/opt/conda/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
raise self._exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
When it runs, an example of the node error:
File: /data/MNA/data/Preprocessed/sub-3158/log/20220301-105737_4b639652-12b4-452d-9125-1d5c84795b5b/crash-20220301-135236-nicoler-t2smap_node-473aa1f0-7183-40bb-b5f9-29a4a3e11102.txt
Working Directory: /data/MNA/data/work/fmriprep_wf/single_subject_3158_wf/func_preproc_ses_1_task_sop_run_001_echo_1_wf/bold_t2smap_wf/t2smap_node
Inputs:
args:
echo_times: [0.0132, 0.03119, 0.04918]
environ: {}
fittype: curvefit
in_files:
mask_file:
Traceback (most recent call last):
File "/opt/conda/lib/python3.8/site-packages/nipype/pipeline/plugins/legacymultiproc.py", line 67, in run_node
result["result"] = node.run(updatehash=updatehash)
File "/opt/conda/lib/python3.8/site-packages/nipype/pipeline/engine/nodes.py", line 516, in run
result = self._run_interface(execute=True)
File "/opt/conda/lib/python3.8/site-packages/nipype/pipeline/engine/nodes.py", line 635, in _run_interface
return self._run_command(execute)
File "/opt/conda/lib/python3.8/site-packages/nipype/pipeline/engine/nodes.py", line 741, in _run_command
result = self._interface.run(cwd=outdir)
File "/opt/conda/lib/python3.8/site-packages/nipype/interfaces/base/core.py", line 428, in run
runtime = self._run_interface(runtime)
File "/opt/conda/lib/python3.8/site-packages/nipype/interfaces/base/core.py", line 822, in _run_interface
self.raise_exception(runtime)
File "/opt/conda/lib/python3.8/site-packages/nipype/interfaces/base/core.py", line 749, in raise_exception
raise RuntimeError(
RuntimeError: Command:
t2smap -d /data/MNA/data/work/fmriprep_wf/single_subject_3158_wf/func_preproc_ses_1_task_sop_run_001_echo_1_wf/unwarp_wf/_echoidx_0/merge/vol0000_unwarped_merged.nii.gz /data/MNA/data/work/fmriprep_wf/single_subject_3158_wf/func_preproc_ses_1_task_sop_run_001_echo_1_wf/unwarp_wf/_echoidx_1/merge/vol0000_unwarped_merged.nii.gz /data/MNA/data/work/fmriprep_wf/single_subject_3158_wf/func_preproc_ses_1_task_sop_run_001_echo_1_wf/unwarp_wf/_echoidx_2/merge/vol0000_unwarped_merged.nii.gz -e 13.2 31.189999999999998 49.18 --mask /data/MNA/data/work/fmriprep_wf/single_subject_3158_wf/func_preproc_ses_1_task_sop_run_001_echo_1_wf/unwarp_wf/brainextraction_wf/_echoidx_0/masker/clipped_mask.nii.gz --fittype curvefit
Standard output:
Standard error:
INFO t2smap:t2smap_workflow:229 Using output directory: /data/MNA/data/work/fmriprep_wf/single_subject_3158_wf/func_preproc_ses_1_task_sop_run_001_echo_1_wf/bold_t2smap_wf/t2smap_node
INFO t2smap:t2smap_workflow:239 Loading input data: ['/data/MNA/data/work/fmriprep_wf/single_subject_3158_wf/func_preproc_ses_1_task_sop_run_001_echo_1_wf/unwarp_wf/_echoidx_0/merge/vol0000_unwarped_merged.nii.gz', '/data/MNA/data/work/fmriprep_wf/single_subject_3158_wf/func_preproc_ses_1_task_sop_run_001_echo_1_wf/unwarp_wf/_echoidx_1/merge/vol0000_unwarped_merged.nii.gz', '/data/MNA/data/work/fmriprep_wf/single_subject_3158_wf/func_preproc_ses_1_task_sop_run_001_echo_1_wf/unwarp_wf/_echoidx_2/merge/vol0000_unwarped_merged.nii.gz']
INFO t2smap:t2smap_workflow:255 Using user-defined mask
INFO t2smap:t2smap_workflow:258 Computing adaptive T2* map
Killed
Return code: 137
Additional information / screenshots
No response
This could be an issue from using a more recent version of tedana. @tsalo could I bug you to take a look?
It may also be that we need unwarp_wf to output uncompressed NIfTI files when --low-mem is passed. It's possible that in previous versions the inputs to t2smap were uncompressed.
It looks like tedana's memory usage is the problem here. tedana should be able to handle compressed or uncompressed NIfTI files equally well, since it just loads them into numpy arrays with nibabel. tedana does have a --lowmem option that could be passed from fMRIPrep when --low-mem is used, although I don't think it would have an effect on the problematic step here.
Two options I could see are:
- Within tedana, we could see if we can improve the memory usage of the `curvefit` option when `--lowmem` is used.
- Within fMRIPrep, set `--fittype` to `loglin` instead of `curvefit` when `--low-mem` is used. The log-linear T2* estimation method should be less accurate, but faster and less memory intensive.
Is there anything I can do to help test, next steps, etc.?
Thanks so much for taking a look at this issue - obviously we don't want to have to preprocess an entire dataset one participant at a time!
@njr175 I've opened an issue in the tedana repo (https://github.com/ME-ICA/tedana/issues/856), but it will probably be a while before we can propagate any changes to an fMRIPrep release, since we'll need to (1) fix the problem on tedana's side, (2) make a new release, (3) pin the new version in fMRIPrep's requirements, and (4) make a new fMRIPrep release.
If you're able to build the Docker image from a fork, you could manually switch from nonlinear T2* estimation to the log-linear version. Basically, you would just need to switch the order here:
https://github.com/nipreps/fmriprep/blob/32f7b06f148fc6020c5ecedb77fd8dee10a8813f/fmriprep/interfaces/multiecho.py#L64-L73
Instead of `'curvefit', 'loglin'`, which makes `curvefit` (the nonlinear approach) the default, you would have `'loglin', 'curvefit'`, so that the log-linear method is the default.
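To illustrate why swapping the order matters: in a `traits.Enum`, the first value listed becomes the default. A minimal sketch of that behavior in plain Python (the `default_fittype` helper is illustrative only, not fMRIPrep's actual code; see the linked file for the real trait definition):

```python
# In fMRIPrep's input spec, fittype is defined roughly like:
#     fittype = traits.Enum("curvefit", "loglin", ...)   # curvefit is default
# and the proposed change swaps the order:
#     fittype = traits.Enum("loglin", "curvefit", ...)   # loglin is default

def default_fittype(allowed):
    """Mimic traits.Enum default selection: the first listed value wins."""
    return allowed[0]

print(default_fittype(("curvefit", "loglin")))  # prints "curvefit"
print(default_fittype(("loglin", "curvefit")))  # prints "loglin"
```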
Thanks for offering a solution while we wait for the fix, @tsalo. Do I need to include one of the flags I was using before (`--low-mem`, `--mem-mb 55000`, `--use-plugin plugin.yml`)?
I switched the order as you indicated, and tried to re-run fMRIPrep (without any of the above flags), and I got the broken process pool error again. I was testing with 3 individuals.
ERROR File "/opt/conda/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
raise self._exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
Just wanted to send an update: we still have not been able to get the new version of fMRIPrep to work on more than one participant with multi-echo data. With the fix you mentioned, I tried with and without the three flags; it still doesn't work. To double-check that something didn't happen with our Ubuntu server, I ran a handful of participants through an old version of fMRIPrep and they ran without error.
I'm sorry for the delay in responding. If you look at the logs, do they show the t2smap command used?
Yesterday I tried fMRIPrep 22.0.0rc3 on one subject (one session) and got the following error. It seems relevant to me, but if it's not, please let me know and I'll delete my comment.
Error message:
Node: fmriprep_22_0_wf.single_subject_s003_wf.func_preproc_ses_3_task_vw_acq_mb4mesl56_run_2_echo_1_wf.bold_t2smap_wf.t2smap_node
Working directory: /work/fmriprep_22_0_wf/single_subject_s003_wf/func_preproc_ses_3_task_vw_acq_mb4mesl56_run_2_echo_1_wf/bold_t2smap_wf/t2smap_node
Node inputs:
args = <undefined>
echo_times = [0.0124, 0.03013, 0.04786, 0.06559]
environ = {}
fittype = curvefit
in_files = <undefined>
mask_file = <undefined>
Traceback (most recent call last):
File "/opt/conda/lib/python3.9/site-packages/nipype/pipeline/plugins/multiproc.py", line 67, in run_node
result["result"] = node.run(updatehash=updatehash)
File "/opt/conda/lib/python3.9/site-packages/nipype/pipeline/engine/nodes.py", line 524, in run
result = self._run_interface(execute=True)
File "/opt/conda/lib/python3.9/site-packages/nipype/pipeline/engine/nodes.py", line 642, in _run_interface
return self._run_command(execute)
File "/opt/conda/lib/python3.9/site-packages/nipype/pipeline/engine/nodes.py", line 750, in _run_command
raise NodeExecutionError(
nipype.pipeline.engine.nodes.NodeExecutionError: Exception raised while executing Node t2smap_node.
Traceback (most recent call last):
File "/opt/conda/lib/python3.9/site-packages/nipype/interfaces/base/core.py", line 454, in aggregate_outputs
setattr(outputs, key, val)
File "/opt/conda/lib/python3.9/site-packages/nipype/interfaces/base/traits_extension.py", line 330, in validate
value = super(File, self).validate(objekt, name, value, return_pathlike=True)
File "/opt/conda/lib/python3.9/site-packages/nipype/interfaces/base/traits_extension.py", line 135, in validate
self.error(objekt, name, str(value))
File "/opt/conda/lib/python3.9/site-packages/traits/base_trait_handler.py", line 74, in error
raise TraitError(
traits.trait_errors.TraitError: The 't2star_map' trait of a T2SMapOutputSpec instance must be a pathlike object or string representing an existing file, but a value of '/work/fmriprep_22_0_wf/single_subject_s003_wf/func_preproc_ses_3_task_vw_acq_mb4mesl56_run_2_echo_1_wf/bold_t2smap_wf/t2smap_node/T2starmap.nii.gz' <class 'str'> was specified.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/conda/lib/python3.9/site-packages/nipype/interfaces/base/core.py", line 401, in run
outputs = self.aggregate_outputs(runtime)
File "/opt/conda/lib/python3.9/site-packages/nipype/interfaces/base/core.py", line 461, in aggregate_outputs
raise FileNotFoundError(msg)
FileNotFoundError: No such file or directory '/work/fmriprep_22_0_wf/single_subject_s003_wf/func_preproc_ses_3_task_vw_acq_mb4mesl56_run_2_echo_1_wf/bold_t2smap_wf/t2smap_node/T2starmap.nii.gz' for output 't2star_map' of a T2SMap interface
Command I executed:
singularity run --cleanenv --bind /mnt/dbgroup-share/mpawlik/data-ccns/anclang:/data --bind /mnt/dbgroup-share/mpawlik/scratch:/work --bind /home/mpawlik/bids/freesurfer_license.txt:/opt/freesurfer/license.txt fmriprep_22.0.0rc3 /data /data/derivatives/fmriprep participant --nprocs 16 --me-output-echos --fs-no-reconall --notrack --skip_bids_validation -v -w /work
Update: fMRIPrep 21.0.2 threw `numpy.core._exceptions.MemoryError`. This was most probably due to too many parallel jobs requesting too much memory.
Here's the error:
Node: fmriprep_wf.single_subject_s003_wf.func_preproc_ses_3_task_vw_acq_mb4mesl56_run_1_echo_1_wf.bold_t2smap_wf.t2smap_node
Working directory: /work/fmriprep_wf/single_subject_s003_wf/func_preproc_ses_3_task_vw_acq_mb4mesl56_run_1_echo_1_wf/bold_t2smap_wf/t2smap_node
Node inputs:
args = <undefined>
echo_times = [0.0124, 0.03013, 0.04786, 0.06559]
environ = {}
fittype = curvefit
in_files = <undefined>
mask_file = <undefined>
Traceback (most recent call last):
File "/opt/conda/lib/python3.8/site-packages/nipype/pipeline/plugins/multiproc.py", line 67, in run_node
result["result"] = node.run(updatehash=updatehash)
File "/opt/conda/lib/python3.8/site-packages/nipype/pipeline/engine/nodes.py", line 516, in run
result = self._run_interface(execute=True)
File "/opt/conda/lib/python3.8/site-packages/nipype/pipeline/engine/nodes.py", line 635, in _run_interface
return self._run_command(execute)
File "/opt/conda/lib/python3.8/site-packages/nipype/pipeline/engine/nodes.py", line 741, in _run_command
result = self._interface.run(cwd=outdir)
File "/opt/conda/lib/python3.8/site-packages/nipype/interfaces/base/core.py", line 428, in run
runtime = self._run_interface(runtime)
File "/opt/conda/lib/python3.8/site-packages/nipype/interfaces/base/core.py", line 822, in _run_interface
self.raise_exception(runtime)
File "/opt/conda/lib/python3.8/site-packages/nipype/interfaces/base/core.py", line 749, in raise_exception
raise RuntimeError(
RuntimeError: Command:
t2smap -d /work/fmriprep_wf/single_subject_s003_wf/func_preproc_ses_3_task_vw_acq_mb4mesl56_run_1_echo_1_wf/unwarp_wf/_echoidx_0/merge/vol0000_unwarped_merged.nii.gz /work/fmriprep_wf/single_subject_s003_wf/func_preproc_ses_3_task_vw_acq_mb4mesl56_run_1_echo_1_wf/unwarp_wf/_echoidx_1/merge/vol0000_unwarped_merged.nii.gz /work/fmriprep_wf/single_subject_s003_wf/func_preproc_ses_3_task_vw_acq_mb4mesl56_run_1_echo_1_wf/unwarp_wf/_echoidx_2/merge/vol0000_unwarped_merged.nii.gz /work/fmriprep_wf/single_subject_s003_wf/func_preproc_ses_3_task_vw_acq_mb4mesl56_run_1_echo_1_wf/unwarp_wf/_echoidx_3/merge/vol0000_unwarped_merged.nii.gz -e 12.4 30.13 47.86 65.58999999999999 --mask /work/fmriprep_wf/single_subject_s003_wf/func_preproc_ses_3_task_vw_acq_mb4mesl56_run_1_echo_1_wf/unwarp_wf/brainextraction_wf/_echoidx_0/masker/clipped_mask.nii.gz --fittype curvefit
Standard output:
Standard error:
INFO t2smap:t2smap_workflow:241 Using output directory: /work/fmriprep_wf/single_subject_s003_wf/func_preproc_ses_3_task_vw_acq_mb4mesl56_run_1_echo_1_wf/bold_t2smap_wf/t2smap_node
INFO t2smap:t2smap_workflow:251 Loading input data: ['/work/fmriprep_wf/single_subject_s003_wf/func_preproc_ses_3_task_vw_acq_mb4mesl56_run_1_echo_1_wf/unwarp_wf/_echoidx_0/merge/vol0000_unwarped_merged.nii.gz', '/work/fmriprep_wf/single_subject_s003_wf/func_preproc_ses_3_task_vw_acq_mb4mesl56_run_1_echo_1_wf/unwarp_wf/_echoidx_1/merge/vol0000_unwarped_merged.nii.gz', '/work/fmriprep_wf/single_subject_s003_wf/func_preproc_ses_3_task_vw_acq_mb4mesl56_run_1_echo_1_wf/unwarp_wf/_echoidx_2/merge/vol0000_unwarped_merged.nii.gz', '/work/fmriprep_wf/single_subject_s003_wf/func_preproc_ses_3_task_vw_acq_mb4mesl56_run_1_echo_1_wf/unwarp_wf/_echoidx_3/merge/vol0000_unwarped_merged.nii.gz']
INFO t2smap:t2smap_workflow:267 Using user-defined mask
INFO t2smap:t2smap_workflow:270 Computing adaptive T2* map
Traceback (most recent call last):
File "/opt/conda/bin/t2smap", line 8, in <module>
sys.exit(_main())
File "/opt/conda/lib/python3.8/site-packages/tedana/workflows/t2smap.py", line 343, in _main
t2smap_workflow(**kwargs)
File "/opt/conda/lib/python3.8/site-packages/tedana/workflows/t2smap.py", line 272, in t2smap_workflow
(t2s_limited, s0_limited, t2s_full, s0_full) = decay.fit_decay(
File "/opt/conda/lib/python3.8/site-packages/tedana/decay.py", line 372, in fit_decay
data_masked = data[mask, :, :]
numpy.core._exceptions.MemoryError: Unable to allocate 1.73 GiB for an array with shape (90714, 4, 640) and data type float64
Return code: 1
Hi @mateuszpawlik @tsalo
Did you ever get this working? I am having the same problem with t2smap, but within a subject (many runs per subject). Thanks!
We successfully executed fMRIPrep 21.0.2. Here's my command:
singularity run --cleanenv --bind /mnt/dbgroup-share/mpawlik/data-ccns/anclang:/data --bind /mnt/dbgroup-share/mpawlik/scratch:/work --bind /home/mpawlik/bids/freesurfer_license.txt:/opt/freesurfer/license.txt fmriprep_21.0.2.sif /data /data/derivatives/fmriprep participant --me-output-echos --nprocs 4 --resource-monitor --fs-no-reconall --notrack --skip-bids-validation --task-id vw -v -w /work
We haven't tried with newer versions.
So the problem here is that `T2SMap` is not annotated with its memory consumption: https://github.com/nipreps/fmriprep/blob/master/fmriprep/workflows/bold/t2s.py#L101 which means we assume it will use 100 MiB. We can see from @mateuszpawlik's node that memory usage scales roughly as voxels × echoes × time points × 8 bytes (float64).
@tsalo are there any extra coefficients or multiplicative factors that should be considered?
At the very least, taking the full image set size and throwing in a factor of two will be a better estimate than 100 MiB.
I ran a local test and the relevant code (the bit that seemed to be failing) was maxing out at ~2 GB on a 288 MB file with three echoes. The proposed change in #2898 came out to 1.73 GB, so it might need a bit more? The function also uses scipy.optimize.curve_fit, but I have no clue what the memory usage tends to be for that, sorry.
Okay, a factor of 2.5 would get pretty close to 2 GB.
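To make the arithmetic concrete, here is a rough estimator matching the failed allocation in @mateuszpawlik's log (voxels × echoes × time points × 8 bytes, times a safety factor). The helper name and the factor of 2.5 are illustrative sketches of the idea discussed above, not fMRIPrep's actual code:

```python
def estimate_t2smap_mem_gb(n_voxels, n_echoes, n_volumes, factor=2.5):
    """Rough memory estimate for the T2* fit.

    The data are loaded as float64, so the base cost is
    voxels * echoes * volumes * 8 bytes; a multiplicative safety
    factor covers intermediate copies made during the fit.
    (Illustrative helper, not fMRIPrep's actual annotation code.)
    """
    base_bytes = n_voxels * n_echoes * n_volumes * 8
    return factor * base_bytes / (1024 ** 3)

# The failed allocation above had shape (90714, 4, 640) in float64:
print(round(90714 * 4 * 640 * 8 / (1024 ** 3), 2))  # 1.73 (GiB), matching the log
print(round(estimate_t2smap_mem_gb(90714, 4, 640), 2))
```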