datalad-neuroimaging
support DICOM tarballs
It is quite common to have DICOMs (e.g. for a single sequence) packed into a .tar or .zip archive. I wondered whether we could/should make it possible to extract/aggregate metadata from those. I could see it done
- within dicom extractor
- in a dedicated dicom-tarballs extractor
- a generic "extractor helper" (e.g. called "balls") which could then be used to prepare (extract) data for other extractors to munch on
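To make the third option a bit more concrete, here is a minimal sketch of what such a helper could look like. The name iter_archive_members is made up for illustration and is not an existing DataLad function. It uses only the Python standard library and reads each member into memory instead of unpacking to disk.

```python
import tarfile
import zipfile


def iter_archive_members(archive_path):
    """Yield (member_name, content_bytes) for every regular file in a
    .tar (optionally compressed) or .zip archive.

    Hypothetical helper, not part of DataLad: a downstream extractor
    could consume these pairs without knowing about the archive format.
    """
    if tarfile.is_tarfile(archive_path):
        # Mode "r" (the default) detects compression transparently.
        with tarfile.open(archive_path) as tf:
            for member in tf:
                if member.isfile():
                    yield member.name, tf.extractfile(member).read()
    elif zipfile.is_zipfile(archive_path):
        with zipfile.ZipFile(archive_path) as zf:
            for info in zf.infolist():
                if not info.is_dir():
                    yield info.filename, zf.read(info)
    else:
        raise ValueError(f"not a tar or zip archive: {archive_path}")
```

A DICOM extractor could then loop over the yielded pairs and parse each payload with whatever DICOM reader it already uses; that part is deliberately left out here.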
The question is how to "represent" that metadata
- tarball could be considered as a "subdataset" of a kind, and thus we could extract/keep it similarly to how we deal with subdatasets
- files within tarball could be considered "continuation" of a path for the file, e.g. for a file bu.dcm within a/b/bu.tar it could be path a/b/bu.tar/bu.dcm, or some more explicitly defined boundary a/b/bu.tar//bu.dcm, or a/b/bu.tar#bu.dcm, or even a/b/bu.tar#path=bu.dcm to be in line with how we deal with referencing files in tarballs within our special remote
- we could extract/contain only fields common and identical to all files in the tarball, and thus associate with the tarball itself
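The last option (keeping only fields that are common and identical across all files in the tarball) could be sketched roughly as follows. This assumes per-file metadata is already available as plain dicts; the function name common_fields and the example field values are hypothetical and only serve as illustration.

```python
def common_fields(per_file_metadata):
    """Return the key/value pairs that occur with an identical value in
    every per-file metadata record.

    Hypothetical helper: `per_file_metadata` is any iterable of dicts,
    one per file in the archive.
    """
    records = list(per_file_metadata)
    if not records:
        return {}
    first, rest = records[0], records[1:]
    return {
        key: value
        for key, value in first.items()
        if all(key in rec and rec[key] == value for rec in rest)
    }


# Made-up example records for two files of one series.
records = [
    {"PatientID": "sub-01", "SeriesNumber": 3, "InstanceNumber": 1},
    {"PatientID": "sub-01", "SeriesNumber": 3, "InstanceNumber": 2},
]
print(common_fields(records))
# {'PatientID': 'sub-01', 'SeriesNumber': 3}
```

The result would be attached to the tarball itself, while per-file fields such as InstanceNumber are dropped in this variant.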
Hm. In https://github.com/psychoinformatics-de/datalad-hirni (which, when ironed out, should yield a generalized form to be part of datalad-neuroimaging) we simply make a subdataset from the tarball, which in turn is added via add-archive-content. So, you can throw away the DICOMs (and/or the tarball) after metadata extraction, but have metadata on the actual DICOMs.
ATM I don't see why it would be useful to invent an additional way to reference an archive's content other than what add-archive-content does.
Do you have a use case that somehow benefits from not annexing the extracted files?