dot never (well, not in 6 days so far) finishes rendering the graph
What happened?
@michael-sun ran the fmriprep 21.0.1 singularity container (built from docker) on our local HPC and was surprised that the job had not finished after a week. We checked on the compute node and discovered that dot was still running!
f003z4j 153080 99.7 0.3 1481260 1296420 ? R Sep23 9881:59 dot -Tsvg -o/dartfs-hpc/scratch/f003z4j/fmriprep-work/work-SID001651/fmriprep_wf/graph.svg /dartfs-hpc/scratch/f003z4j/fmriprep-work/work-SID001651/fmriprep_wf/graph.dot
That graph.dot, renamed and compressed, is: graph-bigfancy.dot.gz
I started running it locally with dot from graphviz 2.42.2-7 -- minutes so far, with no completion. It might be an issue to file with graphviz, or something about the .dot file to fix.
Anyway -- I think it would be useful to establish some kind of upper-bound timeout for the invocation of dot. In that case it might be valuable to use a "could not render" placeholder graph.svg instead of the actual graph and to issue a warning, rather than halting the computation or erroring out completely. I could be proven wrong, though.
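To make the proposal concrete, here is a minimal sketch of a time-bounded dot call with a placeholder fallback. It is not how nipype or fMRIPrep invoke dot today; `render_graph`, `PLACEHOLDER_SVG`, the `dot_cmd` parameter, and the 600-second default are all hypothetical names and values chosen for illustration. It relies only on the standard library's `subprocess.run(..., timeout=...)`, which kills the child process when the limit is reached.

```python
import subprocess
import warnings
from pathlib import Path

# Hypothetical placeholder image; the wording and dimensions are illustrative only.
PLACEHOLDER_SVG = (
    '<svg xmlns="http://www.w3.org/2000/svg" width="480" height="60">'
    '<text x="10" y="35">Graph could not be rendered (dot did not finish).</text>'
    "</svg>\n"
)


def render_graph(dot_file, svg_file, timeout=600, dot_cmd=("dot",)):
    """Run dot with a time limit; return True if dot itself wrote the SVG.

    dot_cmd is an assumption added so the executable can be swapped out
    (for example in tests); timeout is in seconds.
    """
    cmd = [*dot_cmd, "-Tsvg", f"-o{svg_file}", str(dot_file)]
    try:
        # subprocess.run kills the child and raises TimeoutExpired at the limit.
        subprocess.run(cmd, check=True, timeout=timeout)
        return True
    except subprocess.TimeoutExpired:
        reason = f"timed out after {timeout} s"
    except (subprocess.CalledProcessError, FileNotFoundError) as exc:
        reason = f"failed ({exc})"

    # Warn instead of raising, so the rest of the pipeline keeps going.
    warnings.warn(
        f"dot {reason}; wrote a placeholder to {svg_file}. "
        f"You can render {dot_file} yourself if you need the graph."
    )
    Path(svg_file).write_text(PLACEHOLDER_SVG)
    return False
```

A call such as `render_graph("graph.dot", "graph.svg", timeout=600)` would then either produce the real graph or leave a small placeholder plus a warning, without blocking the job.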
What command did you use?
Most likely not needed, since it seems to hang locally as well, but here it is -- just remove the `\ `s ;)
singularity run \ --cleanenv \ -B ${MAINDIR}:${MAINDIR} \ -B ${BIDSDIR},${PREPROCDIR},${SCRATCHDIR} \ -B /optnfs/freesurfer:/optnfs/freesurfer ${IMAGE} \ ${BIDSDIR} ${OUTDIR} participant \ --participant_label ${SUBJ} \ --ignore slicetiming \ --resource-monitor \ --bold2t1w-dof 9 \ --dummy-scans 6 \ --write-graph \ --notrack \ --fs-no-reconall \ --nprocs 8 \ --omp-nthreads 5 \ --nthreads 5 \ --mem_mb 60000 \ --fs-license-file /optnfs/freesurfer/6.0.0/license.txt \ --skip_bids_validation \ --output-spaces T1w MNI152NLin2009cAsym \ -w ${WORKDIR} \ --use-aroma --aroma-melodic-dimensionality -200 --bids-filter-file ${FILTER_DIR}/${SUBJ}.json
What version of fMRIPrep are you running?
21.0.1
How are you running fMRIPrep?
Singularity
Is your data BIDS valid?
Yes
Are you reusing any previously computed results?
Work directory
Please copy and paste any relevant log output.
No response
Additional information / screenshots
No response
I would probably skip --write-graph on such large workflows. Probably the right thing to do is to set a timeout on graphviz and just print a warning that dot timed out and you can render the graph yourself if you want.
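As a rough illustration of skipping the render for large workflows, the sketch below counts edge operators in the .dot text and declines to render above a threshold. This is only a heuristic under stated assumptions: `count_edges`, `should_render`, and the `max_edges` default of 5000 are hypothetical, the count is a plain text match rather than a real DOT parse (so a `->` inside a label would be counted too), and nothing here has been measured against the graph from this issue.

```python
import re

# Directed-edge operator in DOT; matched textually, not parsed.
EDGE_RE = re.compile(r"->")


def count_edges(dot_text):
    """Return the number of '->' occurrences in the DOT source."""
    return len(EDGE_RE.findall(dot_text))


def should_render(dot_text, max_edges=5000):
    """Return True if the graph looks small enough to hand to dot.

    max_edges is an arbitrary, illustrative threshold.
    """
    return count_edges(dot_text) <= max_edges
```

For example, `should_render(open("graph.dot").read())` could gate the dot call, with a warning printed when it returns False.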
This may be the culprit - https://github.com/nipy/nipype/issues/3526
Nothing to do here until nipype allows us to set a timeout and catch a TimeoutError.