
Prototypical workflow #1

Status: open. yarikoptic opened this issue 5 years ago • 3 comments

Originally "presented" in a training-materials issue: https://github.com/ReproNim/module-dataprocessing/issues/26#issuecomment-488298754

Here I would like to keep it as a checklist ("(r)" marks items that are waiting for the release(s))

  • [x] (r) datalad create -c text2git analysis-for-the-pi; cd analysis-for-the-pi (note: text2git has an outstanding issue https://github.com/datalad/datalad/issues/3361 which might redefine it, but otherwise this is doable)
  • [x] datalad create -d . data/dicoms && cp ALL_DICOMS data/dicoms/
  • [x] datalad install -d . https://github.com/ReproNim/containers/
  • [ ] work out a heuristic for heudiconv under code/heudiconv-heuristic.py
  • [x] (r) datalad create -d . -c bids data/bids (the bids procedure is coming with the 0.12 release of datalad and datalad-neuroimaging some time soon, so this is partially done)
  • [ ] datalad containers-run -n containers/heudiconv -f code/heudiconv-heuristic.py -o data/bids --files data/dicoms (TODO: container, see https://github.com/ReproNim/containers/issues/2)
  • [x] Deface! Apparently there is no "official" BIDS app for this yet, but a number of defacers are available, so TODO: streamline (BIDS app, container, etc.)
  • Carry out the analys(es). For each one, a subdataset currently has to be pre-created first. Some (e.g., fmriprep) might benefit from custom -c configs on what should go under git/annex
    • [x] datalad create -d . -c text2git data/mriqc
    • [x] (r) datalad containers-run --explicit -n containers/bids-mriqc -i data/bids -o data/mriqc '{inputs}' '{outputs}' ... (TODO: test! TODO: needs the 0.3.2 release of datalad-container so that '{inputs}' does not leak the container file in there)
    • [x] datalad create -d . -c text2git data/simple_workflow
    • [ ] datalad containers-run -n containers/simple_workflow -i data/bids -o data/simple_workflow ... '{inputs}' ... '{outputs}' (TODO: container, see https://github.com/ReproNim/containers/issues/2)
  • [ ] when all is good, look into uploading wherever (datalad create-sibling*, datalad publish) ;) TODO: full invocation example
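For the still-open heuristic item, here is a minimal sketch of what code/heudiconv-heuristic.py could look like. It assumes the heuristic-module convention used by the example heuristics shipped with heudiconv: an infotodict(seqinfo) function returning a dict that maps create_key(...) templates to lists of series IDs, where each seqinfo entry exposes fields such as protocol_name, series_id and dim4. The protocol-name substrings ("mprage", "t1w", "rest") and the BIDS templates are hypothetical and would need to be adapted to the actual scanner protocols.

```python
# Hypothetical sketch of code/heudiconv-heuristic.py; the protocol-name
# substrings and BIDS templates below are assumptions to adapt per study.


def create_key(template, outtype=("nii.gz",), annotation_classes=None):
    """Return the (template, outtype, annotation_classes) tuple used as a dict key."""
    if not template:
        raise ValueError("Template must be a valid format string")
    return template, outtype, annotation_classes


def infotodict(seqinfo):
    """Map each DICOM series in `seqinfo` to a BIDS output template."""
    t1w = create_key("sub-{subject}/anat/sub-{subject}_T1w")
    rest = create_key("sub-{subject}/func/sub-{subject}_task-rest_run-{item:02d}_bold")
    info = {t1w: [], rest: []}

    for s in seqinfo:
        protocol = s.protocol_name.lower()
        if "mprage" in protocol or "t1w" in protocol:
            # Anatomical T1-weighted series.
            info[t1w].append(s.series_id)
        elif "rest" in protocol and s.dim4 > 1:
            # Resting-state series; single-volume series (e.g., aborted runs) are skipped.
            info[rest].append(s.series_id)
    return info
```

The file would then be passed to heudiconv via its -f option, as in the heudiconv containers-run step of the checklist; series that match no rule are simply not converted.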

Notes:

  • This could be argued to step slightly away from the YODA principle that derived datasets contain all the information needed to reproduce themselves: there is only a single containers/ subdataset at the superdataset level, and the derived datasets do not contain it. For the purpose of this workflow, I consider the top-level superdataset to be the "reproducibility target": having access to it provides all the information needed to reproduce any particular subdataset.
  • In principle, this shortcoming could easily be resolved by installing the containers/ dataset into each result subdataset, but that would also require installing the original data "neighbor" dataset within each of them. That could be a reckless clone, or could benefit from copy-on-write (CoW) on a file system such as BTRFS. For the initial presentation/use case, though, I think the current layout is good enough.
  • From the example above, it seems very common to run a container that saves its output into a new subdataset (created first if it doesn't exist yet). I wonder whether datalad-container could somehow assist with that (TODO: file an issue).
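Until datalad-container offers something like that, a thin wrapper could cover the "create the output subdataset if missing, then containers-run into it" pattern. This is a minimal Python sketch, assuming it runs from the top of the superdataset and that a directory containing .datalad/ is an existing dataset (a simplifying check). The helper names is_dataset and run_into_subdataset are hypothetical, and the sketch only shells out to the same datalad create and datalad containers-run flags used in the checklist.

```python
# Hypothetical helper (not part of datalad-container): create the output
# subdataset if needed, then run the container into it.
import os
import subprocess


def is_dataset(path):
    """Simplifying assumption: a directory holding .datalad/ is an existing dataset."""
    return os.path.isdir(os.path.join(path, ".datalad"))


def run_into_subdataset(container, inputs, output, cmd, runner=subprocess.run):
    """Build the datalad calls for one analysis step and execute them via `runner`.

    Assumes the current working directory is the top of the superdataset.
    `runner` is injectable so the command lines can be checked without datalad.
    """
    calls = []
    if not is_dataset(output):
        # Same invocation as the manual "datalad create -d . -c text2git ..." steps.
        calls.append(["datalad", "create", "-d", ".", "-c", "text2git", output])
    run_cmd = ["datalad", "containers-run", "--explicit", "-n", container]
    for path in inputs:
        run_cmd += ["-i", path]   # declare each input separately
    run_cmd += ["-o", output]
    run_cmd += list(cmd)          # e.g. ["{inputs}", "{outputs}", ...]
    calls.append(run_cmd)
    for argv in calls:
        runner(argv, check=True)  # stop at the first failing step
    return calls
```

For example, run_into_subdataset("containers/bids-mriqc", ["data/bids"], "data/mriqc", ["{inputs}", "{outputs}"]) would cover the two mriqc steps, with any further mriqc arguments appended to the cmd list. Shelling out to the CLI rather than using the Python API keeps the recorded commands identical to the manual ones.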

yarikoptic, May 08 '19 19:05

FTR: That's pretty much what datalad-hirni is for, and our approach is similar (but YODA-compliant ;-)). I have a poster and a software demo at OHBM, so I'm working on proper documentation ATM. Give me a little more time, and then I can link to a reasonable description.

bpoldrack, May 10 '19 06:05

This would be highly appreciated! According to http://bids-apps.neuroimaging.io/apps/, PeerHerholz/BIDSonym is now an "official" BIDS app. Would you consider adding it to the mix?

alexenge, Dec 10 '21 14:12

Checked: since it is an official BIDS app, it was added to the mix. I guess now it's a matter of trying it out again (I remember filing a number of issues) and seeing if all is good.

yarikoptic, Dec 10 '21 14:12