datalad-neuroimaging

cfg_* procedure(s) for preferable .gitattributes for various known dataset types

Open yarikoptic opened this issue 5 years ago • 0 comments

ATM we have cfg_bids which

  • sets up .gitattributes to have some files directly in git
  • sets up metadata extraction configuration

But besides BIDS, I keep running into the need to establish .gitattributes for the following dataset types, for which I think the following, analogous to what cfg_bids does, should be done:

.feat and .gfeat FSL outputs

  • .gitattributes - maybe use cfg_text2git?

On a sample .gfeat directory of 9GB, with a regular cfg_text2git I ended up with 260KB of .git/objects. That allowed me to quickly install that dataset elsewhere and then run datalad get **/*.png

  • metadata
    • datalad: eventually might configure the extractor
    • git-annex: we might like to annotate files with annex metadata (e.g. file types), so that on shells without ** support people could quickly get all the supplementary data files needed to browse the results
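
To illustrate the git-annex metadata idea above, here is a minimal sketch that only builds command lines and does not run them. The field name "filetype", the suffix-to-type mapping, and the helper names are hypothetical and not part of any existing procedure; the only real interfaces assumed are `git annex metadata --set field=value` and the `--metadata field=value` matching option of `git annex get`.

```python
from pathlib import Path

# Hypothetical mapping from file suffix to a "filetype" metadata value.
FILETYPE_BY_SUFFIX = {".png": "image", ".html": "report", ".nii.gz": "volume"}


def filetype_for(path):
    """Return the hypothetical filetype tag for a path, or None if unknown."""
    name = Path(path).name
    for suffix, filetype in FILETYPE_BY_SUFFIX.items():
        if name.endswith(suffix):
            return filetype
    return None


def annotate_commands(paths, field="filetype"):
    """Build one `git annex metadata` argv list per file with a known type."""
    commands = []
    for path in paths:
        filetype = filetype_for(path)
        if filetype is not None:
            commands.append(
                ["git", "annex", "metadata", "--set", f"{field}={filetype}", str(path)]
            )
    return commands


def get_command(filetype, field="filetype"):
    """Build a `git annex get` argv list that selects files by metadata."""
    return ["git", "annex", "get", "--metadata", f"{field}={filetype}"]
```

With such annotations in place, something like get_command("image") could stand in for datalad get **/*.png on shells that do not expand **.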

fmriprep

  • .gitattributes

    I had

*.md annex.largefiles=nothing
*.html annex.largefiles=nothing
*.json annex.largefiles=nothing
CITATION.* annex.largefiles=(not(mimetype=text/*))

which resulted in 32MB of .git/objects for a ~500GB dataset (~250 subjects).

  • metadata
    • configure extractors (nifti1, bids, maybe more once FreeSurfer etc. are supported)
    • interesting use case, since the BIDS(-derivative) dataset is not at the top of this dataset, which has two directories -- fmriprep and freesurfer, so the bids extractor should be informed to look into fmriprep/
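
As a rough sketch of what the .gitattributes part of a procedure for fmriprep output could do, the following appends the four rules quoted above to a dataset's .gitattributes, skipping any that are already present. The function name and the idempotent-append behaviour are assumptions for illustration, not how cfg_bids is actually implemented; the result would still need to be saved (e.g. with datalad save) afterwards.

```python
from pathlib import Path

# The rules quoted in this issue for fmriprep output.
FMRIPREP_RULES = [
    "*.md annex.largefiles=nothing",
    "*.html annex.largefiles=nothing",
    "*.json annex.largefiles=nothing",
    "CITATION.* annex.largefiles=(not(mimetype=text/*))",
]


def add_gitattributes(dataset_path, rules):
    """Append rules missing from <dataset_path>/.gitattributes.

    Returns the list of rules that were actually added (hypothetical helper).
    """
    attrs = Path(dataset_path) / ".gitattributes"
    text = attrs.read_text() if attrs.exists() else ""
    existing = text.splitlines()
    missing = [rule for rule in rules if rule not in existing]
    if missing:
        # Start on a fresh line if the file lacks a trailing newline.
        prefix = "" if not text or text.endswith("\n") else "\n"
        attrs.write_text(text + prefix + "\n".join(missing) + "\n")
    return missing
```

Calling add_gitattributes(path, FMRIPREP_RULES) twice adds nothing the second time, so re-running the procedure would be harmless.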

HOWTO

Pretty much all of those scenarios are very similar and require only a slightly different specification. I see two implementation possibilities:

breed cfg_* scripts

  • extract common code from cfg_bids into some cfg_common.py helper
  • reuse from within individual cfg_bids, cfg_feat, cfg_fmriprep

create (optionally parametrized) cfg_neuroimaging_dataset

which would sense the type of the dataset (or have it "forced" via an explicit parameter such as "bids") and act accordingly: proceed if it can figure out the type, and crash if detection fails and no explicit parameter is specified
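
A minimal sketch of the sensing logic for this second option follows. The detection heuristics (a .feat/.gfeat directory name or a design.fsf file, an fmriprep/ subdirectory, a top-level dataset_description.json) are guesses made for illustration, and the function and parameter names are hypothetical.

```python
from pathlib import Path

KNOWN_TYPES = ("bids", "feat", "fmriprep")


def detect_dataset_type(dataset_path, forced=None):
    """Return the dataset type, honouring an explicit `forced` value.

    Raises ValueError for an unknown forced type and RuntimeError when
    the type cannot be sensed and nothing was forced.
    """
    if forced is not None:
        if forced not in KNOWN_TYPES:
            raise ValueError(f"unknown dataset type: {forced!r}")
        return forced

    path = Path(dataset_path)
    # Assumed heuristic: FSL output directories end in .feat/.gfeat
    # or contain a design.fsf file.
    if path.suffix in (".feat", ".gfeat") or (path / "design.fsf").exists():
        return "feat"
    # Assumed heuristic: fmriprep output has an fmriprep/ subdirectory.
    if (path / "fmriprep").is_dir():
        return "fmriprep"
    # Assumed heuristic: a BIDS dataset has dataset_description.json on top.
    if (path / "dataset_description.json").exists():
        return "bids"
    raise RuntimeError(
        f"could not determine the type of {path}; specify it explicitly"
    )
```

The returned type could then select the matching set of .gitattributes rules and extractor configuration, so the per-type differences stay in one table instead of in separate scripts.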

yarikoptic • Jul 17 '19 14:07