datalad-neuroimaging

Support DICOM tarballs

Open yarikoptic opened this issue 6 years ago • 1 comments

It is quite common to have DICOMs (e.g. for a single sequence) packed into a .tar or .zip archive. I wonder whether we could/should make it possible to extract/aggregate metadata from those. I could see it done

  • within dicom extractor
  • in a dedicated dicom-tarballs extractor
  • a generic "extractor helper" (e.g. called "balls") which could then be used to prepare (extract) data for other extractors to munch on
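To make the third option concrete, here is a minimal sketch of what such a generic helper could look like. It is not existing datalad code: the name `extracted_members` is hypothetical, and it only assumes that the downstream extractor wants ordinary files on disk. It unpacks a .tar or .zip archive into a temporary directory and hands back the member names together with the extracted paths.

```python
import contextlib
import tarfile
import tempfile
import zipfile
from pathlib import Path


def _safe_target(root, name):
    """Return root/name, refusing member names that would escape root."""
    root = root.resolve()
    target = (root / name).resolve()
    if root not in target.parents:
        raise ValueError(f"unsafe member path: {name}")
    return target


@contextlib.contextmanager
def extracted_members(archive_path):
    """Hypothetical helper: unpack an archive for other extractors to read.

    Yields a sorted list of (member_name, extracted_path) pairs for the
    regular files in the archive. The temporary directory holding the
    extracted files is removed when the context exits.
    """
    archive_path = Path(archive_path)
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        contents = {}
        if tarfile.is_tarfile(archive_path):
            with tarfile.open(archive_path) as tar:
                for member in tar.getmembers():
                    if member.isfile():
                        contents[member.name] = tar.extractfile(member).read()
        elif zipfile.is_zipfile(archive_path):
            with zipfile.ZipFile(archive_path) as zf:
                for name in zf.namelist():
                    if not name.endswith("/"):  # skip directory entries
                        contents[name] = zf.read(name)
        else:
            raise ValueError(f"not a tar or zip archive: {archive_path}")

        members = []
        for name in sorted(contents):
            target = _safe_target(root, name)
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(contents[name])
            members.append((name, target))
        yield members
```

An extractor would then run inside `with extracted_members("a/b/bu.tar") as members:` and treat each extracted path like any other file. Reading every member into memory keeps the sketch short; a real helper would probably stream large files instead.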

The question is how to "represent" that metadata:

  • the tarball could be considered a "subdataset" of a kind, so we could extract/keep its metadata the same way we do for subdatasets
  • files within the tarball could be considered a "continuation" of the tarball's path. E.g. for a file bu.dcm within a/b/bu.tar the path could be a/b/bu.tar/bu.dcm, or use a more explicitly defined boundary such as a/b/bu.tar//bu.dcm or a/b/bu.tar#bu.dcm, or even a/b/bu.tar#path=bu.dcm to be in line with how we reference files in tarballs within our special remote
  • we could extract/keep only the fields that are present in, and identical across, all files in the tarball, and thus associate them with the tarball itself
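The second and third options can be sketched in a few lines. This is only an illustration under assumptions: the function names are hypothetical, `#path=` is just one of the boundary markers floated above, and per-file metadata is assumed to arrive as plain dicts.

```python
ARCHIVE_SEP = "#path="  # one of the candidate boundary markers; an assumption


def member_path(archive, member, sep=ARCHIVE_SEP):
    """Compose a path that continues the archive path into the archive."""
    return f"{archive}{sep}{member}"


def split_member_path(path, sep=ARCHIVE_SEP):
    """Split a composed path into (archive, member).

    Returns (path, None) if the path does not point into an archive.
    """
    archive, found, member = path.partition(sep)
    return (archive, member) if found else (path, None)


def common_fields(records):
    """Keep only the fields present in, and identical across, all records."""
    records = list(records)
    if not records:
        return {}
    first, rest = records[0], records[1:]
    return {
        key: value
        for key, value in first.items()
        if all(key in rec and rec[key] == value for rec in rest)
    }


# Example with two made-up per-file metadata records from one tarball
per_file = [
    {"PatientID": "p1", "SeriesNumber": 3, "InstanceNumber": 1},
    {"PatientID": "p1", "SeriesNumber": 3, "InstanceNumber": 2},
]
tarball_metadata = common_fields(per_file)
```

Here `tarball_metadata` holds `PatientID` and `SeriesNumber` but not `InstanceNumber`, since the latter differs between the two files. An explicit marker such as `#path=` has the advantage that the composed path can be split back unambiguously, which a plain `/` continuation cannot guarantee.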

yarikoptic, May 31 '18 15:05

Hm. In https://github.com/psychoinformatics-de/datalad-hirni (which, once ironed out, should yield a generalized form to become part of datalad-neuroimaging) we simply make a subdataset from the tarball, which in turn is added via add-archive-content. So you can throw away the DICOMs (and/or the tarball) after metadata extraction, but still have metadata on the actual DICOMs. ATM I don't see why it would be useful to invent an additional way to reference an archive's content beyond what add-archive-content does. Do you have a use case that somehow benefits from not annexing the extracted files?

bpoldrack, May 31 '18 17:05