datalad
"extractors" for input/output arguments for run
This came up for the HCP .spec files use case: inputs/outputs might already be specified in the .spec file, which is then given to wb_command for the actual transformation. The question is how we could provide adapters/extractors to inform datalad run/rerun about inputs/outputs.
edit: sample .spec files are in the HCP dataset https://github.com/datalad-datasets/human-connectome-project-openaccess, e.g.
(git)smaug:/mnt/datasets/datalad/crawl/hcp-openaccess[master]
$> find HCP1200/ -iname *spec
HCP1200/100206/MNINonLinear/100206.164k_fs_LR.wb.spec
HCP1200/100206/MNINonLinear/100206.MSMAll.164k_fs_LR.wb.spec
HCP1200/100206/MNINonLinear/Native/100206.native.wb.spec
HCP1200/100206/MNINonLinear/fsaverage_LR32k/100206.32k_fs_LR.wb.spec
HCP1200/100206/MNINonLinear/fsaverage_LR32k/100206.MSMAll.32k_fs_LR.wb.spec
@mih suggested on the call to just provide some kind of "outside" runner which would comprehend the .spec file and craft a datalad run invocation providing all extracted inputs/outputs. Note: current master (aimed for 0.16.x) supports placeholders in input/output specs, which might help.
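A minimal sketch of such an "outside" runner, to make the suggestion concrete. It assumes that a .spec file is XML in which each data file is listed as the text of a `DataFile` element; that layout, the sample content, and the helper names `spec_data_files` and `craft_run_argv` are assumptions for illustration, not an existing datalad or Workbench API. The sketch only builds the `datalad run` argument list (using the `-i`/`-o` options and the `--` separator before the command) and does not execute anything; paths are taken verbatim, so resolving them relative to the .spec file location is left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample content, assuming the DataFile-element layout described above.
SAMPLE_SPEC = """<CaretSpecFile Version="1.0">
  <DataFile DataFileType="SURFACE">sub.L.midthickness.surf.gii</DataFile>
  <DataFile DataFileType="SURFACE">sub.R.midthickness.surf.gii</DataFile>
</CaretSpecFile>"""


def spec_data_files(spec_xml):
    """Return the non-empty text of every DataFile element in the spec XML."""
    root = ET.fromstring(spec_xml)
    return [
        el.text.strip()
        for el in root.iter("DataFile")
        if el.text and el.text.strip()
    ]


def craft_run_argv(inputs, outputs, command):
    """Build (but do not execute) a `datalad run` argument list."""
    argv = ["datalad", "run"]
    for path in inputs:
        argv += ["-i", path]
    for path in outputs:
        argv += ["-o", path]
    # A leading "--" separates the datalad options from the wrapped command.
    return argv + ["--", *command]


if __name__ == "__main__":
    inputs = spec_data_files(SAMPLE_SPEC)
    # The wrapped command here is a placeholder, not a real wb_command call.
    print(craft_run_argv(inputs, ["out.txt"], ["echo", "placeholder"]))
```

Which of the listed files count as inputs and which as outputs would still have to be decided by the extractor for a given wb_command operation; the sketch treats all of them as inputs.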
Another use case came up in the context of https://github.com/con/opfvta-replication-2023 where we would like to get a specific subset of files. Currently it is a manual "datalad get" call with a long list of files hard-coded in the Makefile.
To make it datalad run compatible, we would need either an excruciating list of -i options, or a way to point to a file listing the filenames, or a script which would print them. The latter is exactly the target of this wishlist item and would cover that use case.
Also think about filenames with newlines -- we might want to make it a 0-delimited (NUL-delimited) list by default.
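As a rough sketch of the file-list/script idea with NUL-delimited output: the helpers below (`split_paths`, `paths_from_script`, `input_options` are hypothetical names, not part of datalad) read a 0-delimited list of paths, either from bytes already in hand or from the stdout of a script, and expand it into repeated `-i` options. NUL is used as the separator because it is the one byte that cannot occur inside a filename, so names containing newlines survive intact.

```python
import os
import subprocess
import sys


def split_paths(data, sep=b"\0"):
    """Split a bytes blob into paths on `sep`, dropping empty chunks."""
    return [os.fsdecode(chunk) for chunk in data.split(sep) if chunk]


def paths_from_script(argv):
    """Run a script and parse its stdout as a NUL-delimited path list."""
    result = subprocess.run(argv, check=True, stdout=subprocess.PIPE)
    return split_paths(result.stdout)


def input_options(paths):
    """Expand paths into repeated -i options for a `datalad run` call."""
    options = []
    for path in paths:
        options += ["-i", path]
    return options


if __name__ == "__main__":
    # Stand-in for a user-provided script that prints NUL-delimited paths.
    lister = [sys.executable, "-c",
              "import sys; sys.stdout.write('a.nii.gz\\0b c.nii.gz\\0')"]
    print(input_options(paths_from_script(lister)))
```

The same `split_paths` helper would work on the contents of a list file read in binary mode, and passing `sep=b"\n"` would cover the newline-delimited variant if that were offered as an option.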