reproman
reproman copied to clipboard
Initial sketch for the mriqc/fmriprep singularity based workflow
An initial local attempt was slowed down by WTF of https://github.com/ReproNim/containers/issues/23
Preprint which inspired the fmriprep workflow here is http://dx.doi.org/10.1101/694364
Edit: an example of prototypical use of reproman for mriqc is now formalized in README: https://github.com/ReproNim/reproman#step-2-create-analysis-datalad-dataset-and-run-computation-on-aws-hpc2 . I will keep this PR open until we furnish a full workflow which would also include fmriprep (thus handling secret license for matlab) and then some linear modeling
Codecov Report
Merging #438 into master will increase coverage by
0.00%
. The diff coverage isn/a
.
@@ Coverage Diff @@
## master #438 +/- ##
=======================================
Coverage 89.66% 89.66%
=======================================
Files 148 148
Lines 12388 12389 +1
=======================================
+ Hits 11108 11109 +1
Misses 1280 1280
Impacted Files | Coverage Δ | |
---|---|---|
reproman/interface/run.py | 100.00% <0.00%> (ø) |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update d47e38b...52c19fc. Read the comment docs.
@kyleam I have reran the script and got the same failure due to tar -- could you please confirm that you get the same?
@kyleam I have reran the script and got the same failure due to tar -- could you please confirm that you get the same?
Sure, I'll give it a try this afternoon.
could you please confirm that you get the same?
Sure, I'll give it a try this afternoon.
I've triggered it, though I don't yet have an explanation of what's going on.
I've triggered it, though I don't yet have an explanation of what's going on.
Hmm, with several attempts, I was able to trigger the failure only once. Looking at the successful runs, the togethome file does not include the files that tar is complaining about in the failed runs. The only explanation I have for that is that these are temporary files that, based on the timing of things, might end getting removed between the find ... >togethome
call and the tar call. I've submitted gh-451 as a workaround.
script
For completeness: In all the above tries, I was using this script, which is a stripped-down version of 5b95ded7e:docs/usecases/bids-fmriprep-workflow-NP.sh.
set -eu
cd $(mktemp -d --tmpdir=. ds-XXXX)
datalad create -c text2git .
datalad install -d . ///repronim/containers
datalad install -d . -s https://github.com/ReproNim/ds000003-demo data/bids
mkdir licenses/
echo freesurfer.txt > licenses/.gitignore
cat > licenses/README.md <<EOF
Freesurfer
----------
Place your FreeSurfer license into freesurfer.txt file in this directory.
Visit https://surfer.nmr.mgh.harvard.edu/registration.html to obtain one if
you don't have it yet - it is free.
EOF
datalad save -m "DOC: licenses/ directory stub" licenses/
datalad create -d . -c text2git data/mriqc
reproman run --resource sm --follow \
--sub condor --orc datalad-pair-run \
--jp container=containers/bids-mriqc --bp 'pl=02,13' \
-i data/bids -o data/mriqc \
'{inputs}' '{outputs}' participant --participant_label '{p[pl]}'
The only explanation I have for that is that these are temporary files that, based on the timing of things, might end getting removed between the
find ... >togethome
call and the tar call.
indeed probably with NFS etc we could see even more of such usecases. But I am still wondering what is exactly happening here besides may be condor returning "complete" job status before it (and all kids) actually finished, and either we wouldn't miss some results if we rush into collecting/tarring them up. May be adding some fuser
call to check if any process is still holding on that output path or alike. I will try to look into it when I get a moment.
indeed probably with NFS etc we could see even more of such usecases. But I am still wondering what is exactly happening here
I am still wondering too :)
besides may be condor returning "complete" job status before it (and all kids) actually finished
This status isn't coming from condor. Its creation is chained after the run of the command:
https://github.com/ReproNim/reproman/blob/c247acf638178a32b3ccd723f936963e7e3558b2/reproman/support/jobs/job_templates/runscript/base.template.sh#L43-L46
Ran current state (v0.2.1-43-ga81e457) on smaug as
FS_LICENSE=~/.freesurfer-license RUNNER=datalad ./bids-fmriprep-workflow-NP.sh bids-fmriprep-workflow-NP/out4
and it finished nicely (although there was a puke from fmriprep's atexit about "OSError: handle is closed" which I guess was just ignored), and everything looked kosher BUT "data/fmriprep//fmriprep/dataset_description.json" was added to git-annex not git! It is strange. The whole "data/fmriprep" was initiated with -c txt2git
and it seemed to work -- .tsv etc were committed to git but all jsons to annex. It reminded me about https://git-annex.branchable.com/bugs/manages_to_incorrectly_add_to_annex_instead_of_git_based_on___34__mimetype__34___-_we_cannot_figure_it_out_why/ which Joey "done, the magic database changing behavior is not a bug in git-annex". I naively had wrong impression that this issue was actually fixed somehow properly.
Well -- on libmagic side -k
was indeed fixed around then, so it is possible to obtain multiple mimetypes, and the 2nd one for .json is text:
$> file --mime-type -Lk 1.json
1.json: application/json\012- text/plain
oh hoh -- that is a good old https://github.com/datalad/datalad/issues/3361 which we didn't really address although git annex provided a way already :-/
I changed the defaults for RM_RESOURCE and RM_SUB for those of us without smaug-condor (though RM_RESOURCE=local also requires setup -- perhaps something to autodetect or automate later).
Now I get to reproman run at bids-fmriprep-workflow-NP.sh:144 which fails with:
Local dataset /Users/ch/code/reproman/study is dirty. Save or discard uncommitted changes [orchestrators.py:init:504] (OrchestratorError)
/Users/ch/code/reproman/study is the argument to bids-fmriprep-workflow-NP.sh.
sweet. So what is the output of git -C /Users/ch/code/reproman/study status
? what was your RM_SUB?
RM_SUB=local. git -C /Users/ch/code/reproman/study status
gives:
On branch master Changes not staged for commit: (use "git add
..." to update what will be committed) (use "git checkout -- ..." to discard changes in working directory) (commit or discard the untracked or modified content in submodules) modified: containers (untracked content)
no changes added to commit (use "git add" and/or "git commit -a")`
So go to containers and run status there, it will reveal the mystery