The output of fmriprep is incomplete

Open WangYunHong98 opened this issue 1 year ago • 7 comments

What happened?

I used fMRIPrep 24.0.0 on Slurm to process my MRI data, but the outputs were incomplete.

For example, the freesurfer directory was empty, and the outputs for the --output-spaces I specified were not produced in the func and anat directories.

I uploaded the log file; please see fmriprep-30459_6.json.

What command did you use?

#!/bin/bash 
#SBATCH --job-name=fmriprep
#SBATCH --nodes=1
#SBATCH --partition=CPU_DreamNeuro
#SBATCH --cpus-per-task=14
#SBATCH --mem-per-cpu=4G
#SBATCH --out=./slurm_log/fmriprep/%x-%A_%a.out 
#SBATCH --error=./slurm_log/fmriprep/%x-%A_%a.err
subject=$( sed -n -E "$((${SLURM_ARRAY_TASK_ID} + 1))s/sub-(\S*)\>.*/\1/gp" ${BIDS_DIR}/participants.tsv )
age=$( sed -n -E "$((${SLURM_ARRAY_TASK_ID} + 1))s/sub-(\S*)\s+(\S*)\s+.*/\2/gp" ${BIDS_DIR}/participants.tsv )

echo "Running: $subject"
echo "Subject age: $age"

if (( $(echo "$age >= 7.5 && $age <= 13.5" | bc -l) )); then
  template="MNIPediatricAsym:cohort-4:res-2"
elif (( $(echo "$age >= 13 && $age <= 18.5" | bc -l) )); then
  template="MNIPediatricAsym:cohort-6:res-2"
elif (( $(echo "$age >= 4.5 && $age <= 8.5" | bc -l) )); then
  template="MNIPediatricAsym:cohort-2:res-2"
else
  template=""
  note="Age is not in the range; you need to choose another template"
  echo "$note"
fi
echo "Template is $template"

# bind container with host
cmd="singularity run \
    --cleanenv \
    -B $PROJECT:/rootdir \
    -B $BIDS_DIR:/data:ro \
    -B $DERIVS_DIR:/output \
    -B $WORK_DIR:/work \
    -B ${SINGULARITYENV_FS_LICENSE}:/freesurfer_license/ \
    -B ${TEMPLATEFLOW_HOST_HOME}:${SINGULARITYENV_TEMPLATEFLOW_HOME} \
    ${FMRIPREP} /data /output \
    participant --participant_label $subject \
    -w /work \
    -vvv \
    --fs-license-file /freesurfer_license/license.txt \
    --fs-subjects-dir /output/freesurfer \
    --mem-mb 60000 \
    --n-cpus 14 \
    --write-graph \
    --output-spaces $template MNIPediatricAsym:cohort-1:res-2 MNI152NLin2009cAsym:res-2 anat fsnative func \
    --return-all-components \
    --notrack \
    "

echo "Running task: $SLURM_ARRAY_TASK_ID"
echo "Command: $cmd"
eval "$cmd"

What version of fMRIPrep are you running?

24.0.0

How are you running fMRIPrep?

Singularity

Is your data BIDS valid?

Yes

Are you reusing any previously computed results?

No

Please copy and paste any relevant log output.

Please see the log file above.

Additional information / screenshots

There are outputs in the func directory: d641d6eb279d04d4302f1a9b990a0fe

WangYunHong98 avatar Jul 24 '24 13:07 WangYunHong98

240724-20:22:13,885 nipype.workflow INFO:
	 [Node] Executing "fsdir_run_20240724_202139_7f3a5ef2_8f8b_4828_b5fa_f486b97008ad" <niworkflows.interfaces.bids.BIDSFreeSurferDir>
240724-20:22:13,939 nipype.workflow INFO:
	 [Node] Finished "fsdir_run_20240724_202139_7f3a5ef2_8f8b_4828_b5fa_f486b97008ad", elapsed time 0.046097s.
240724-20:22:13,939 nipype.workflow DEBUG:
	 Saving results file: '/work/fmriprep_24_0_wf/fsdir_run_20240724_202139_7f3a5ef2_8f8b_4828_b5fa_f486b97008ad/result_fsdir_run_20240724_202139_7f3a5ef2_8f8b_4828_b5fa_f486b97008ad.pklz'
240724-20:22:13,939 nipype.workflow WARNING:
	 Storing result file without outputs
240724-20:22:13,947 nipype.workflow WARNING:
	 [Node] Error on "fmriprep_24_0_wf.fsdir_run_20240724_202139_7f3a5ef2_8f8b_4828_b5fa_f486b97008ad" (/work/fmriprep_24_0_wf/fsdir_run_20240724_202139_7f3a5ef2_8f8b_4828_b5fa_f486b97008ad)
240724-20:22:13,953 nipype.workflow DEBUG:
	 Clearing 0 from queue
240724-20:22:13,953 nipype.utils DEBUG:
	 Loading pkl: /work/fmriprep_24_0_wf/fsdir_run_20240724_202139_7f3a5ef2_8f8b_4828_b5fa_f486b97008ad/result_fsdir_run_20240724_202139_7f3a5ef2_8f8b_4828_b5fa_f486b97008ad.pklz
240724-20:22:13,966 nipype.workflow ERROR:
	 Node fsdir_run_20240724_202139_7f3a5ef2_8f8b_4828_b5fa_f486b97008ad failed to run on host idw01.
240724-20:22:13,967 nipype.workflow ERROR:
	 Saving crash info to /output/logs/crash-20240724-202213-wyh-fsdir_run_20240724_202139_7f3a5ef2_8f8b_4828_b5fa_f486b97008ad-0750bb4e-85fc-437e-a7a8-60740dbafebb.txt
Traceback (most recent call last):
  File "/opt/conda/envs/fmriprep/lib/python3.11/site-packages/nipype/pipeline/plugins/multiproc.py", line 344, in _send_procs_to_workers
    self.procs[jobid].run(updatehash=updatehash)
  File "/opt/conda/envs/fmriprep/lib/python3.11/site-packages/nipype/pipeline/engine/nodes.py", line 527, in run
    result = self._run_interface(execute=True)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/fmriprep/lib/python3.11/site-packages/nipype/pipeline/engine/nodes.py", line 645, in _run_interface
    return self._run_command(execute)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/fmriprep/lib/python3.11/site-packages/nipype/pipeline/engine/nodes.py", line 771, in _run_command
    raise NodeExecutionError(msg)
nipype.pipeline.engine.nodes.NodeExecutionError: Exception raised while executing Node fsdir_run_20240724_202139_7f3a5ef2_8f8b_4828_b5fa_f486b97008ad.

Traceback:
	Traceback (most recent call last):
	  File "/opt/conda/envs/fmriprep/lib/python3.11/site-packages/nipype/interfaces/base/core.py", line 397, in run
	    runtime = self._run_interface(runtime)
	              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	  File "/opt/conda/envs/fmriprep/lib/python3.11/site-packages/niworkflows/interfaces/bids.py", line 960, in _run_interface
	    shutil.rmtree(dest)
	  File "/opt/conda/envs/fmriprep/lib/python3.11/shutil.py", line 752, in rmtree
	    _rmtree_safe_fd(fd, path, onerror)
	  File "/opt/conda/envs/fmriprep/lib/python3.11/shutil.py", line 683, in _rmtree_safe_fd
	    onerror(os.rmdir, fullname, sys.exc_info())
	  File "/opt/conda/envs/fmriprep/lib/python3.11/shutil.py", line 681, in _rmtree_safe_fd
	    os.rmdir(entry.name, dir_fd=topfd)
	OSError: [Errno 39] Directory not empty: 'label'


240724-20:22:13,981 nipype.workflow INFO:
	 [Job 0] Completed (fmriprep_24_0_wf.fsdir_run_20240724_202139_7f3a5ef2_8f8b_4828_b5fa_f486b97008ad).

There's either a permissions issue or a concurrency issue. Are you running multiple subjects simultaneously with the same FreeSurfer directory?

effigies avatar Jul 24 '24 17:07 effigies

Hi @effigies

I am probably running multiple subjects at once, because I submitted the Slurm job array with this command: sbatch --array=1-$(( $( wc -l ${PROJECT}/BIDS_heudiconv/participants.tsv | cut -f1 -d' ' ) - 1 ))%20 fmriprep.slurm

WangYunHong98 avatar Aug 02 '24 14:08 WangYunHong98

So what's happening is that you have an fsaverage directory from an older version of FreeSurfer. To keep FreeSurfer working correctly, we delete and replace with the bundled copy. If you have multiple copies of fMRIPrep doing this at the same time, you will get race conditions.

There's no real plausible way to protect against this in code. It's up to users not to call fMRIPrep in parallel in this situation.
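
One way to avoid calling fMRIPrep in parallel without changing the job script is to throttle the Slurm array so that only one task runs at a time. The sketch below is a minimal illustration, not a tested recipe for this cluster: `array_spec` is a hypothetical helper name, and it assumes that participants.tsv has one header row and ends with a newline, as the submission command in this thread already assumes. The `%N` suffix on `--array` is the standard Slurm limit on simultaneously running array tasks.

```shell
#!/bin/bash
# Hypothetical helper: build a Slurm --array spec covering every subject row
# in participants.tsv. With max_parallel=1 only one array task runs at a time,
# so no two fMRIPrep processes touch the shared fsaverage directory together.
array_spec() {
    local participants_tsv=$1 max_parallel=${2:-1}
    local n_lines
    n_lines=$(wc -l < "$participants_tsv")
    # Subtract the header row; tasks are numbered 1..N as in the original command.
    echo "1-$(( n_lines - 1 ))%${max_parallel}"
}

# Usage (not executed here):
#   sbatch --array="$(array_spec "${PROJECT}/BIDS_heudiconv/participants.tsv" 1)" fmriprep.slurm
```

Running fully serially trades throughput for safety; passing a larger second argument restores the original `%20` behaviour.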

effigies avatar Aug 02 '24 18:08 effigies

Hi @effigies

An error occurs later: ERROR: Label BA1_exvivo does not exist in SUBJECTS_DIR fsaverage! The fsaverage link probably points to an older freesurfer version

This error is intermittent: sometimes fMRIPrep runs without it, and sometimes it keeps coming up. Is there any way to fix this? For example, could I submit a single job first and, once it has completed, submit the parallel array job?
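
The "one job first, then the array" idea can be sketched with Slurm job dependencies. This is a minimal sketch under a loudly stated assumption: that the failure comes only from concurrent replacement of fsaverage, and that the first job leaves behind a copy that later jobs do not need to replace. Nothing in this thread confirms that. `submit_staged` and the `SBATCH` override variable are hypothetical names; `--parsable`, `--array`, and `--dependency=afterok:` are standard sbatch options.

```shell
#!/bin/bash
# SBATCH can be overridden (for example with a stub) for dry runs;
# it defaults to the real sbatch.
SBATCH=${SBATCH:-sbatch}

# Hypothetical helper: submit array task 1 alone, then submit the remaining
# tasks so that they start only after task 1 has finished successfully.
submit_staged() {
    local script=$1 n_subjects=$2 max_parallel=${3:-20}
    local first_id
    # --parsable makes sbatch print just the job ID (optionally "id;cluster").
    first_id=$("$SBATCH" --parsable --array=1 "$script") || return 1
    first_id=${first_id%%;*}
    if (( n_subjects > 1 )); then
        "$SBATCH" --parsable --dependency="afterok:${first_id}" \
            --array="2-${n_subjects}%${max_parallel}" "$script"
    fi
}

# Usage (not executed here): submit_staged fmriprep.slurm 40 20
```

If the first task fails, the dependent array never starts, which makes a broken fsaverage setup visible before many jobs are queued behind it.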

WangYunHong98 avatar Aug 03 '24 10:08 WangYunHong98

That sounds like a recurrence of the other issue you opened: #3258.

tsalo avatar Aug 05 '24 19:08 tsalo

@tsalo

Yes, it's an old problem. I tested it: if I submit a job array for one subject, fMRIPrep (version 24.0.1) runs successfully, including surface reconstruction. But if I submit a job array for multiple subjects, I get the error I mentioned above.

By the way, I added the --containall flag to the script.

So I plan to try preparing separate directories for each subject: --fs-subjects-dir /output/freesurfer/$subject -w /work/$subject

But I'm not sure it's the right thing to do. What do you think? @effigies
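
The per-subject layout proposed above can be sketched as follows. This is an illustration only: the thread does not confirm that it resolves the error. `prepare_subject_dirs` is a hypothetical helper, and it assumes the bind mounts from the original script (`$DERIVS_DIR` on `/output`, `$WORK_DIR` on `/work`). Creating the host directories in advance is a precaution, not a known fMRIPrep requirement.

```shell
#!/bin/bash
# Hypothetical helper: create per-subject FreeSurfer and work directories on
# the host, then print the matching fMRIPrep options as seen from inside the
# container (assumes -B $DERIVS_DIR:/output and -B $WORK_DIR:/work).
prepare_subject_dirs() {
    local derivs_dir=$1 work_dir=$2 subject=$3
    mkdir -p "${derivs_dir}/freesurfer/${subject}" "${work_dir}/${subject}" || return 1
    echo "--fs-subjects-dir /output/freesurfer/${subject} -w /work/${subject}"
}

# Usage (not executed here):
#   extra_opts=$(prepare_subject_dirs "$DERIVS_DIR" "$WORK_DIR" "$subject")
```

With this layout each job gets its own fsaverage copy, so no two jobs delete and replace the same directory; the cost is extra disk space per subject.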

WangYunHong98 avatar Aug 07 '24 12:08 WangYunHong98

> So what's happening is that you have an fsaverage directory from an older version of FreeSurfer. To keep FreeSurfer working correctly, we delete and replace with the bundled copy. If you have multiple copies of fMRIPrep doing this at the same time, you will get race conditions.

@effigies I'm dealing with this issue now and getting the BA1_exvivo error. I am using version 24.1.1 as an Apptainer container, dispatched as one instance per subject. The instances are dispatched in parallel with a small time offset. Even with the --containall Apptainer flag set, the error still occurs.

If I understand you correctly, the directory:

<BIDS_DIR>/derivatives/fmriprep/sourcedata/freesurfer/fsaverage

is remade every time and this operation is always susceptible to a race condition?

I get this error on a 48 core machine but not on my personal 12 core machine with the same setup.

If this is the case, would a solution be to create a temporary directory per-subject to accommodate the fsaverage folder that gets re-made each time? For example:

/tmp/<subject>/freesurfer

And have the --fs-subjects-dir point to it? I believe this is what @WangYunHong98 suggested as well in the comment above. Did it end up working for you?

MauricePasternak avatar Oct 26 '24 14:10 MauricePasternak

@effigies

Would it be feasible in future fmriprep versions to completely isolate the freesurfer processes, i.e. to avoid <BIDS_DIR>/derivatives/fmriprep/sourcedata/freesurfer/fsaverage as a shared directory for concurrently running fmriprep jobs and instead give each subject a completely independent freesurfer working directory?

I guess this would solve a lot of related issues?

JohannesWiesner avatar Aug 05 '25 14:08 JohannesWiesner