datalad-neuroimaging
cfg_* procedure(s) to set up preferable .gitattributes for various known dataset types
ATM we have cfg_bids, which
- sets up .gitattributes to have some files directly in git
- sets up metadata extraction configuration
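The effect of such .gitattributes rules can be checked per path. What follows is a minimal, simplified sketch of git's matching and is not datalad code. Rules are (pattern, annex.largefiles value) pairs in file order, and the last matching rule wins. A value of `nothing` means the file is committed directly to git. The helper name `effective_largefiles` is hypothetical, and patterns containing "/" are only approximated here.

```python
import fnmatch


def effective_largefiles(path, rules):
    """Return the annex.largefiles value that applies to `path`, or None.

    Hypothetical helper. Simplified model of .gitattributes matching:
    - rules are (pattern, value) pairs in the order they appear in the file,
      and a later matching line overrides an earlier one;
    - a pattern without "/" is matched against the basename at any depth;
    - a pattern with "/" is compared to the whole relative path via fnmatch,
      which only approximates git's own rules.
    """
    basename = path.rsplit("/", 1)[-1]
    value = None
    for pattern, largefiles in rules:
        target = path if "/" in pattern else basename
        if fnmatch.fnmatchcase(target, pattern):
            value = largefiles  # last match wins
    return value
```

With a catch-all rule followed by `*.json annex.largefiles=nothing`, a sidecar such as `sub-01/func/bold.json` resolves to `nothing` and lands in git, while a NIfTI file falls through to the catch-all. A result of None means no rule applies and git-annex's default handling is used.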
But besides BIDS, I keep running into the need to establish .gitattributes for the following types, where I think something analogous to the BIDS procedure should be done:
.feat and .gfeat FSL outputs
- .gitattributes: maybe just use cfg_text2git?
On a sample .gfeat directory of 9GB, a regular cfg_text2git left me with just 260KB of .git/objects, which allowed me to quickly install that dataset elsewhere and run datalad get **/*.png
- metadata:
  - datalad: we might eventually configure the extractor
  - git-annex: we might like to annotate files with their types via annex metadata, so that even on shells without ** (globstar) support people could quickly get all the supplementary data files needed to browse the results
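Here is a minimal sketch of the git-annex side. It assumes the file type is taken from the filename extension and stored in a hypothetical `filetype` metadata field. The code lists files without relying on shell `**` and builds one `git annex metadata --set` command per file. The helper names are hypothetical. Once such metadata is set, `git annex get --metadata filetype=png` should fetch all PNGs regardless of globstar support.

```python
import os
from pathlib import Path, PurePosixPath


def iter_dataset_files(root):
    """Yield POSIX-style relative paths of all files under root (hypothetical helper).

    .git and .datalad are pruned, so annex object files (whose keys may also
    end in .png) are not picked up. Broken symlinks, i.e. annexed files whose
    content is absent, are still listed because os.walk reports them as files.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        # prune in place so os.walk does not descend into git/datalad internals
        dirnames[:] = sorted(d for d in dirnames if d not in (".git", ".datalad"))
        for name in sorted(filenames):
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            yield Path(rel).as_posix()


def annex_metadata_commands(paths, field="filetype"):
    """Build `git annex metadata --set FIELD=EXT PATH` argument lists (hypothetical field)."""
    cmds = []
    for path in paths:
        # only the last suffix is used, so "x.nii.gz" is tagged "gz" (a simplification)
        ext = PurePosixPath(path).suffix.lstrip(".").lower()
        if ext:  # files without an extension (README, dotfiles) are skipped
            cmds.append(["git", "annex", "metadata", "--set", f"{field}={ext}", path])
    return cmds
```

Each command could be run inside the dataset with `subprocess.run(cmd, cwd=dataset_path, check=True)`. As far as I know, only annexed files can carry such metadata, so files kept directly in git would need to be filtered out first.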
fmriprep
- .gitattributes: I had
*.md annex.largefiles=nothing
*.html annex.largefiles=nothing
*.json annex.largefiles=nothing
CITATION.* annex.largefiles=(not(mimetype=text/*))
which resulted in 32MB of .git/objects for a ~500GB dataset (~250 subjects).
- metadata:
  - configure extractors (nifti1, bids, maybe more once FreeSurfer etc. is supported)
  - an interesting use case, since the BIDS(-derivative) dataset is not at the top of this dataset, which has two directories, fmriprep and freesurfer, so the bids extractor should be told to look into fmriprep/
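Finding the nested BIDS(-derivative) root could look like the following minimal sketch. It assumes a directory counts as a BIDS root when it directly contains dataset_description.json, which fMRIPrep writes into its output directory. The helper `find_bids_roots` is hypothetical. This is not the bids extractor's own logic, and how the found path is then passed to the extractor is left open.

```python
import os
from pathlib import Path


def find_bids_roots(dataset_path):
    """Return dataset-relative directories that look like BIDS(-derivative) roots.

    Hypothetical heuristic: only the top level and its immediate, non-hidden
    subdirectories are checked for a dataset_description.json.
    """
    root = Path(dataset_path)
    candidates = [root] + sorted(
        d for d in root.iterdir() if d.is_dir() and not d.name.startswith(".")
    )
    # lexists also accepts a broken symlink, i.e. an annexed file whose content is absent
    return [
        d.relative_to(root).as_posix()
        for d in candidates
        if os.path.lexists(d / "dataset_description.json")
    ]
```

For the layout described above (fmriprep/ plus freesurfer/), this would return `["fmriprep"]`. For a plain BIDS dataset it would return `["."]`.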
HOWTO
Pretty much all those scenarios are very similar and require only slightly different specifications. I see two implementation possibilities:
breed cfg_* scripts
- extract common code from cfg_bids into some cfg_common.py helper
- reuse it from within the individual cfg_bids, cfg_feat, cfg_fmriprep
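The first option could take roughly the following shape. This is a minimal sketch of a hypothetical cfg_common.py. It assumes, as with datalad's own cfg_* procedures, that the dataset path arrives as the first command-line argument. The per-type specs reuse the fmriprep rules and extractors quoted above. The feat rule is an assumed text2git-like catch-all, not copied from cfg_text2git. Applying extractors and saving the change are left out.

```python
import sys
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class DatasetSpec:
    """Hypothetical per-type settings shared via cfg_common.py."""
    # (pattern, annex.largefiles expression) pairs, written in this order
    largefiles_rules: list = field(default_factory=list)
    # metadata extractor names; applying them is not part of this sketch
    extractors: list = field(default_factory=list)


SPECS = {
    # assumption: a text2git-like catch-all; check cfg_text2git for its exact expression
    "feat": DatasetSpec(largefiles_rules=[("*", "(not(mimetype=text/*))")]),
    # the rules and extractors listed above for fmriprep
    "fmriprep": DatasetSpec(
        largefiles_rules=[
            ("*.md", "nothing"),
            ("*.html", "nothing"),
            ("*.json", "nothing"),
            ("CITATION.*", "(not(mimetype=text/*))"),
        ],
        extractors=["nifti1", "bids"],
    ),
}


def render_gitattributes(spec):
    """Render the spec's rules as .gitattributes lines."""
    return "".join(f"{pat} annex.largefiles={expr}\n" for pat, expr in spec.largefiles_rules)


def main(dataset_type, argv=None):
    """Append the type's rules to .gitattributes, skipping lines already present.

    Returns the lines that were added, so running it twice is a no-op.
    """
    argv = sys.argv if argv is None else argv
    attrs = Path(argv[1]) / ".gitattributes"
    text = attrs.read_text() if attrs.exists() else ""
    existing = set(text.splitlines())
    missing = [line for line in render_gitattributes(SPECS[dataset_type]).splitlines()
               if line not in existing]
    if missing:
        if text and not text.endswith("\n"):
            text += "\n"  # keep the appended rules on their own lines
        attrs.write_text(text + "\n".join(missing) + "\n")
    return missing
```

A cfg_fmriprep.py would then reduce to `from cfg_common import main` followed by `main("fmriprep")`. Appending rather than rewriting keeps any existing catch-all rule first, so the more specific lines override it.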
create an (optionally parametrized) cfg_neuroimaging_dataset
which would sense the type of the dataset (or have it "forced" via an explicit parameter) and act accordingly, crashing if it cannot figure out the type and no explicit parameter (such as "bids") is specified
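For the second option, the sensing step could look like the following minimal sketch. The heuristics are hypothetical and only illustrate the idea:
- fmriprep/ and freesurfer/ side by side indicate an fmriprep dataset;
- a .feat/.gfeat name or a design.fsf indicates FSL FEAT output;
- a top-level dataset_description.json indicates BIDS.

An explicit `forced` type always wins. The function raises if the type is unknown or cannot be determined.

```python
from pathlib import Path

KNOWN_TYPES = ("bids", "feat", "fmriprep")


def detect_dataset_type(dataset_path, forced=None):
    """Return the dataset type, honouring an explicit `forced` value (hypothetical helper)."""
    if forced is not None:
        if forced not in KNOWN_TYPES:
            raise ValueError(f"unknown dataset type {forced!r}; expected one of {KNOWN_TYPES}")
        return forced
    root = Path(dataset_path).resolve()
    names = {p.name for p in root.iterdir()}
    feat_suffixes = (".feat", ".gfeat")
    if {"fmriprep", "freesurfer"} <= names:
        return "fmriprep"
    # the dataset is a FEAT output itself, contains FEAT outputs, or has a design.fsf
    if (root.name.endswith(feat_suffixes)
            or "design.fsf" in names
            or any(n.endswith(feat_suffixes) for n in names)):
        return "feat"
    if "dataset_description.json" in names:
        return "bids"
    raise RuntimeError(
        f"cannot determine the type of {root}; pass it explicitly, e.g. forced='bids'"
    )
```

The checks are ordered from most to least specific, so a dataset matching several heuristics gets the most specific type. How the explicit type would be passed to the procedure (e.g. as an extra procedure argument) is likewise an assumption.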