reproman icon indicating copy to clipboard operation
reproman copied to clipboard

Initial sketch for the mriqc/fmriprep singularity based workflow

Open yarikoptic opened this issue 5 years ago • 13 comments

An initial local attempt was slowed down by WTF of https://github.com/ReproNim/containers/issues/23

Preprint which inspired the fmriprep workflow here is http://dx.doi.org/10.1101/694364

Edit: an example of prototypical use of reproman for mriqc is now formalized in README: https://github.com/ReproNim/reproman#step-2-create-analysis-datalad-dataset-and-run-computation-on-aws-hpc2 . I will keep this PR open until we furnish a full workflow which would also include fmriprep (thus handling secret license for matlab) and then some linear modeling

yarikoptic avatar Jul 18 '19 18:07 yarikoptic

Codecov Report

Merging #438 into master will increase coverage by 0.00%. The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #438   +/-   ##
=======================================
  Coverage   89.66%   89.66%           
=======================================
  Files         148      148           
  Lines       12388    12389    +1     
=======================================
+ Hits        11108    11109    +1     
  Misses       1280     1280           
Impacted Files Coverage Δ
reproman/interface/run.py 100.00% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update d47e38b...52c19fc. Read the comment docs.

codecov[bot] avatar Jul 18 '19 18:07 codecov[bot]

@kyleam I have reran the script and got the same failure due to tar -- could you please confirm that you get the same?

yarikoptic avatar Aug 16 '19 15:08 yarikoptic

@kyleam I have reran the script and got the same failure due to tar -- could you please confirm that you get the same?

Sure, I'll give it a try this afternoon.

kyleam avatar Aug 16 '19 16:08 kyleam

could you please confirm that you get the same?

Sure, I'll give it a try this afternoon.

I've triggered it, though I don't yet have an explanation of what's going on.

kyleam avatar Aug 16 '19 19:08 kyleam

I've triggered it, though I don't yet have an explanation of what's going on.

Hmm, with several attempts, I was able to trigger the failure only once. Looking at the successful runs, the togethome file does not include the files that tar is complaining about in the failed runs. The only explanation I have for that is that these are temporary files that, based on the timing of things, might end getting removed between the find ... >togethome call and the tar call. I've submitted gh-451 as a workaround.

script

For completeness: In all the above tries, I was using this script, which is a stripped-down version of 5b95ded7e:docs/usecases/bids-fmriprep-workflow-NP.sh.

set -eu

cd $(mktemp -d --tmpdir=. ds-XXXX)
datalad create -c text2git .
datalad install -d . ///repronim/containers
datalad install -d . -s https://github.com/ReproNim/ds000003-demo data/bids

mkdir licenses/
echo freesurfer.txt > licenses/.gitignore
cat > licenses/README.md <<EOF

Freesurfer
----------

Place your FreeSurfer license into freesurfer.txt file in this directory.
Visit https://surfer.nmr.mgh.harvard.edu/registration.html to obtain one if
you don't have it yet - it is free.

EOF
datalad save -m "DOC: licenses/ directory stub" licenses/

datalad create -d . -c text2git data/mriqc

reproman run --resource sm --follow \
         --sub condor --orc datalad-pair-run \
         --jp container=containers/bids-mriqc --bp 'pl=02,13' \
         -i data/bids -o data/mriqc \
         '{inputs}' '{outputs}' participant --participant_label '{p[pl]}'

kyleam avatar Aug 16 '19 21:08 kyleam

The only explanation I have for that is that these are temporary files that, based on the timing of things, might end getting removed between the find ... >togethome call and the tar call.

indeed probably with NFS etc we could see even more of such usecases. But I am still wondering what is exactly happening here besides may be condor returning "complete" job status before it (and all kids) actually finished, and either we wouldn't miss some results if we rush into collecting/tarring them up. May be adding some fuser call to check if any process is still holding on that output path or alike. I will try to look into it when I get a moment.

yarikoptic avatar Aug 20 '19 20:08 yarikoptic

indeed probably with NFS etc we could see even more of such usecases. But I am still wondering what is exactly happening here

I am still wondering too :)

besides may be condor returning "complete" job status before it (and all kids) actually finished

This status isn't coming from condor. Its creation is chained after the run of the command:

https://github.com/ReproNim/reproman/blob/c247acf638178a32b3ccd723f936963e7e3558b2/reproman/support/jobs/job_templates/runscript/base.template.sh#L43-L46

kyleam avatar Aug 20 '19 21:08 kyleam

Ran current state (v0.2.1-43-ga81e457) on smaug as

FS_LICENSE=~/.freesurfer-license RUNNER=datalad ./bids-fmriprep-workflow-NP.sh bids-fmriprep-workflow-NP/out4

and it finished nicely (although there was a puke from fmriprep's atexit about "OSError: handle is closed" which I guess was just ignored), and everything looked kosher BUT "data/fmriprep//fmriprep/dataset_description.json" was added to git-annex not git! It is strange. The whole "data/fmriprep" was initiated with -c txt2git and it seemed to work -- .tsv etc were committed to git but all jsons to annex. It reminded me about https://git-annex.branchable.com/bugs/manages_to_incorrectly_add_to_annex_instead_of_git_based_on___34__mimetype__34___-_we_cannot_figure_it_out_why/ which Joey "done, the magic database changing behavior is not a bug in git-annex". I naively had wrong impression that this issue was actually fixed somehow properly. Well -- on libmagic side -k was indeed fixed around then, so it is possible to obtain multiple mimetypes, and the 2nd one for .json is text:

$> file --mime-type -Lk 1.json
1.json: application/json\012- text/plain

yarikoptic avatar Dec 20 '19 19:12 yarikoptic

oh hoh -- that is a good old https://github.com/datalad/datalad/issues/3361 which we didn't really address although git annex provided a way already :-/

yarikoptic avatar Dec 20 '19 23:12 yarikoptic

I changed the defaults for RM_RESOURCE and RM_SUB for those of us without smaug-condor (though RM_RESOURCE=local also requires setup -- perhaps something to autodetect or automate later).

Now I get to reproman run at bids-fmriprep-workflow-NP.sh:144 which fails with:

Local dataset /Users/ch/code/reproman/study is dirty. Save or discard uncommitted changes [orchestrators.py:init:504] (OrchestratorError)

/Users/ch/code/reproman/study is the argument to bids-fmriprep-workflow-NP.sh.

chaselgrove avatar Jan 03 '20 21:01 chaselgrove

sweet. So what is the output of git -C /Users/ch/code/reproman/study status? what was your RM_SUB?

yarikoptic avatar Jan 04 '20 00:01 yarikoptic

RM_SUB=local. git -C /Users/ch/code/reproman/study status gives:

On branch master Changes not staged for commit: (use "git add ..." to update what will be committed) (use "git checkout -- ..." to discard changes in working directory) (commit or discard the untracked or modified content in submodules)

modified: containers (untracked content)

no changes added to commit (use "git add" and/or "git commit -a")`

chaselgrove avatar Jan 07 '20 21:01 chaselgrove

So go to containers and run status there, it will reveal the mystery

yarikoptic avatar Jan 07 '20 23:01 yarikoptic