heudiconv
A way to add additional DICOM fields, which might have been filtered out by dcmstack
ATM we use dcmstack to extract additional metadata from DICOMs (when --min-meta is not given). One of the pains is that it takes a long time, probably partially because, if I read https://github.com/nipy/heudiconv/blob/master/heudiconv/dicoms.py#L405 correctly, we actually pretty much rebuild the entire NIfTI data along the way, even though we care only about the metadata.
But dcmstack also has its own idea of which DICOM fields to include/exclude (see https://github.com/moloney/dcmstack/blob/f6631278b3558677a9cc94b0b0ba362decb6efb6/src/dcmstack/dcmstack.py#L63), so, e.g., I have no access to the PatientWeight field even if I decide that I need it.
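To make the filtering problem concrete, here is a minimal, stdlib-only sketch of a regex-based exclusion filter with a force-include list that overrides it. A force-include list is one way a "dcmstack-unfiltered"-style mode or a heuristic could get PatientWeight back. The names make_exclusion_filter and filter_meta, and the r"Patient" pattern, are hypothetical and illustrative; they are not dcmstack's code or its default exclusion list (that list is at the dcmstack.py link above). dcmstack appears to ship a comparable helper, make_key_regex_filter, but its exact signature should be checked before relying on it.

```python
import re
from typing import Callable, Iterable, Mapping, Optional


def make_exclusion_filter(
    exclude_res: Iterable[str],
    force_include_res: Optional[Iterable[str]] = None,
) -> Callable[[str], bool]:
    """Return a predicate that is True when a metadata key should be dropped.

    Hypothetical helper: keys matching any force-include pattern are always
    kept, even if they also match an exclusion pattern.
    """
    excl = [re.compile(p) for p in exclude_res]
    incl = [re.compile(p) for p in (force_include_res or ())]

    def should_exclude(key: str) -> bool:
        if any(p.search(key) for p in incl):
            return False  # explicitly requested, never filtered out
        return any(p.search(key) for p in excl)

    return should_exclude


def filter_meta(meta: Mapping[str, object],
                should_exclude: Callable[[str], bool]) -> dict:
    """Keep only the metadata entries the predicate does not exclude."""
    return {k: v for k, v in meta.items() if not should_exclude(k)}


# Illustrative patterns only: drop anything patient-related...
drop_patient = make_exclusion_filter([r"Patient"])
# ...but let a heuristic force PatientWeight back in.
keep_weight = make_exclusion_filter([r"Patient"],
                                    force_include_res=[r"^PatientWeight$"])

header = {"PatientName": "Doe^John", "PatientWeight": 78.0, "EchoTime": 2.98}
print(filter_meta(header, drop_patient))  # {'EchoTime': 2.98}
print(filter_meta(header, keep_weight))   # {'PatientWeight': 78.0, 'EchoTime': 2.98}
```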
Ideally, I think, heuristics should be able to instruct the metadata extraction on which fields to extract (or exclude) in addition to the ones dcmstack extracts, and we should not do it in yet another loop over the DICOMs. ATM we already spend time first extracting seqinfo records and then extracting the metadata with dcmstack.
I am not yet sure how we should do it, but maybe we could/should add a "custom_meta" field to SeqInfo which would contain a FrozenDict of such key/values. Pros:
- no additional runtime penalty (everything is already there)
- if we allow heuristics to specify custom fields, they could extract whatever custom values they want from the DICOMs (not limited to only the fields we provide).
Cons:
- not sure whether it would negatively affect the .auto/.edit files we create for manual renaming
- we would need one more configuration setting (or heuristic config) to say whether those fields should be stored in the sidecar .json files. So maybe it should instead be done at the point of metadata extraction, by somehow piggybacking on dcmstack to keep it from filtering out the desired fields. That would keep the change limited to metadata extraction, without providing additional info to heuristics.
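The custom_meta idea can be pictured with the minimal sketch below, assuming the extra fields are read from the header that is already parsed while building the seqinfo record, so no extra loop over the files is needed. SeqInfo here is a hypothetical three-field stand-in, not heudiconv's real record, and FrozenDict is a tiny local implementation (hashable, so the record stays usable as a dict key), not an import from any library; build_seqinfo is likewise a hypothetical name.

```python
from collections.abc import Mapping
from typing import Iterable, Iterator, NamedTuple


class FrozenDict(Mapping):
    """Minimal immutable, hashable mapping (values must be hashable)."""

    def __init__(self, *args, **kwargs):
        self._d = dict(*args, **kwargs)

    def __getitem__(self, key):
        return self._d[key]

    def __iter__(self) -> Iterator:
        return iter(self._d)

    def __len__(self) -> int:
        return len(self._d)

    def __hash__(self) -> int:
        return hash(frozenset(self._d.items()))

    def __repr__(self) -> str:
        return f"FrozenDict({self._d!r})"


class SeqInfo(NamedTuple):
    """Hypothetical, heavily trimmed stand-in for heudiconv's SeqInfo."""
    series_id: str
    protocol_name: str
    custom_meta: FrozenDict  # proposed field: extra header values


def build_seqinfo(series_id: str, header: Mapping,
                  extra_fields: Iterable[str]) -> SeqInfo:
    # The requested fields come from the same header used to build the
    # record; fields missing from the header are silently skipped.
    custom = {name: header[name] for name in extra_fields if name in header}
    return SeqInfo(series_id, str(header.get("ProtocolName", "")),
                   FrozenDict(custom))


header = {"ProtocolName": "anat-T1w", "PatientWeight": 78.0}
info = build_seqinfo("1-anat-T1w", header, ["PatientWeight", "PatientSize"])
print(info.custom_meta)  # FrozenDict({'PatientWeight': 78.0})
```

Because custom_meta becomes part of the record itself, anything that serializes SeqInfo would see it too, which is where the .auto/.edit concern listed under Cons comes from.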
What do you think?
@yarikoptic - I think we should do one pass through all the DICOMs: build an index, perform grouping, and discard fields as needed. The intent was to move these components from dcmstack to nibabel; it's the group and stack functionality in dcmstack. We should check whether pydicom has introduced any grouping option. This would also allow us to better support GE and Philips scanners.
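The single pass described above (index, group, discard) can be sketched as follows. In practice each header could come from pydicom.dcmread(path, stop_before_pixels=True), which skips the pixel data; here plain dicts stand in so the sketch runs without pydicom. The function name index_and_group is hypothetical, and grouping by SeriesInstanceUID alone is a simplifying assumption: real grouping typically needs more keys, which the group_by parameter allows.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Tuple

Header = Mapping[str, object]


def index_and_group(
    headers: Iterable[Tuple[str, Header]],
    group_by: Tuple[str, ...] = ("SeriesInstanceUID",),
    discard: frozenset = frozenset(),
) -> Dict[tuple, List[Tuple[str, dict]]]:
    """One pass over (path, header) pairs: group files and drop unwanted fields."""
    groups: Dict[tuple, List[Tuple[str, dict]]] = defaultdict(list)
    for path, header in headers:
        # Grouping key built from the requested fields (None if absent).
        key = tuple(header.get(name) for name in group_by)
        # Discard fields in the same pass, so no second loop is needed.
        kept = {k: v for k, v in header.items() if k not in discard}
        groups[key].append((path, kept))
    return dict(groups)


files = [
    ("a.dcm", {"SeriesInstanceUID": "1.2.3", "InstanceNumber": 1, "PatientName": "X"}),
    ("b.dcm", {"SeriesInstanceUID": "1.2.3", "InstanceNumber": 2, "PatientName": "X"}),
    ("c.dcm", {"SeriesInstanceUID": "1.2.4", "InstanceNumber": 1, "PatientName": "X"}),
]
groups = index_and_group(files, discard=frozenset({"PatientName"}))
print(sorted(groups))  # [('1.2.3',), ('1.2.4',)]
```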
A long time ago, we started this project: https://github.com/ssikka/DICOM-CTP-Anonymizer - I think we should reactivate that piece as well. In general, I agree we need to give the user useful options for including/excluding/manipulating the DICOM metadata.
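The include/exclude/manipulate options could be expressed as a small per-field policy. The sketch below assumes a hypothetical apply_metadata_policy function with an optional include list, an exclude list, and per-field transforms (for example rounding or blanking a value, as an anonymizer might); it does not reflect DICOM-CTP-Anonymizer's actual rule format.

```python
from typing import Callable, Dict, Iterable, Mapping, Optional

Transform = Callable[[object], object]


def apply_metadata_policy(
    meta: Mapping[str, object],
    include: Optional[Iterable[str]] = None,
    exclude: Iterable[str] = (),
    transforms: Optional[Mapping[str, Transform]] = None,
) -> Dict[str, object]:
    """Include, then exclude, then manipulate metadata fields.

    include=None means "start from every field"; otherwise only the listed
    fields that are present are kept. Transforms run last, on survivors only.
    """
    keys = list(meta) if include is None else [k for k in include if k in meta]
    excluded = set(exclude)
    transforms = transforms or {}
    out: Dict[str, object] = {}
    for key in keys:
        if key in excluded:
            continue
        fn = transforms.get(key)
        out[key] = fn(meta[key]) if fn is not None else meta[key]
    return out


meta = {"PatientName": "Doe^John", "PatientWeight": 78.4,
        "PatientAge": "034Y", "EchoTime": 2.98}
policy = apply_metadata_policy(
    meta,
    exclude=["PatientName"],
    transforms={"PatientWeight": round, "PatientAge": lambda v: ""},
)
print(policy)  # {'PatientWeight': 78, 'PatientAge': '', 'EchoTime': 2.98}
```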
Regarding auto/edit, we should perhaps rethink those as well. I'm not sure anybody besides me knows what to do with them :(
@yarikoptic - also see this issue #312
FWIW, I am interested in having PatientWeight extracted, and wondered whether I should just enable it by default. I looked around OpenNeuro datasets to see how many do extract it from DICOMs -- a good number:
$> GIT_TERMINAL_PROMPT=0 datalad foreach-dataset --o-s relpath -J10 'git grep -i PatientWeight | head -n 1'
.datalad/metadata/objects/ec/ds-0b4464cc5597f40226e08205843de5:"PatientWeight":
ds001751/sub-01/anat/sub-01_T1w.json: "PatientWeight": 78,
ds001839/sub-5879027/anat/sub-5879027_T1w.json: "PatientWeight": 74,
ds002011/sub-01/func/sub-01_task-Overlap_run-1_bold.json: "PatientWeight": 90.7185,
ds002155/sub-05/anat/sub-05_T1w.json: "CsaSeries.UsedPatientWeight": 70,
ds002647/sub-101/anat/sub-101_T1w.json: "PatientWeight": 56.245461,
ds002674/sub-01/ses-43/anat/sub-01_ses-43_T2star.json: "PatientWeight": 68.0389,
ds003089/sub-01/anat/sub-01_T1w.json:"PatientWeight": 70,
ds003151/sub-170/ses-hormoneabsent/func/sub-170_ses-hormoneabsent_task-nback_run-1_sbref.json: "PatientWeight": 45.3592,
ds003354/derivatives/mriqc/out/sub-01/func/sub-01_task-empatom_bold.json: "PatientWeight": 58,
ds003404/sub-C05/anat/sub-C05_rec-1_T1w.json: "PatientWeight": 70.306826,
ds003507/sub-03/func/sub-03_task-affect_run-1_bold.json: "PatientWeight": 58.967016,
ds003721/sub-03/anat/sub-03_acq-highres_T1w.json: "PatientWeight": 80,
ds003791/sub-01/anat/sub-01_T1w.json: "PatientWeight": 59,
ds003989/sub-01/ses-01/anat/sub-01_ses-01_run-1001_T1w.json: "PatientWeight": 5,
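The listing above can be tallied per dataset with a small parser. This sketch assumes each line has the git grep shape path:"Key": value, counts only top-level dsNNNNNN directories, skips derivatives and the .datalad metadata entry, and matches the key exactly, so CsaSeries.UsedPatientWeight is not counted as PatientWeight; weights_by_dataset is a hypothetical name.

```python
import re
from typing import Dict, Iterable

# path, then ':' and optional spaces, then "Key": value (up to a comma)
LINE_RE = re.compile(r'^(?P<path>[^:]+):\s*"(?P<key>[^"]+)":\s*(?P<value>[^,]*)')


def weights_by_dataset(lines: Iterable[str], key: str = "PatientWeight") -> Dict[str, str]:
    """Map dataset id -> first raw value of `key` found in its sidecar JSONs."""
    found: Dict[str, str] = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m or m.group("key") != key:
            continue
        parts = m.group("path").split("/")
        # Skip non-dataset paths (e.g. .datalad/...) and derivative outputs.
        if not parts[0].startswith("ds") or "derivatives" in parts:
            continue
        found.setdefault(parts[0], m.group("value").strip())
    return found


sample = [
    '.datalad/metadata/objects/ec/ds-0b4464cc5597f40226e08205843de5:"PatientWeight":',
    'ds001751/sub-01/anat/sub-01_T1w.json: "PatientWeight": 78,',
    'ds002155/sub-05/anat/sub-05_T1w.json: "CsaSeries.UsedPatientWeight": 70,',
    'ds003089/sub-01/anat/sub-01_T1w.json:"PatientWeight": 70,',
    'ds003354/derivatives/mriqc/out/sub-01/func/sub-01_task-empatom_bold.json: "PatientWeight": 58,',
]
print(weights_by_dataset(sample))  # {'ds001751': '78', 'ds003089': '70'}
```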
What I am thinking now is to refactor --min-meta into an --extra-meta option taking one of dcmstack, none, or dcmstack-unfiltered, where dcmstack would be the current default, none would be the equivalent of --min-meta (which we would deprecate), and dcmstack-unfiltered (or similar) would be the mode in which we disable dcmstack's aggressive filtering (although I have yet to check whether that is feasible without too much ad-hoc code). For allowing custom fields to be extracted per heuristic, see https://github.com/nipy/heudiconv/pull/581 .
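The proposed option could look roughly like this with argparse. The flag names and values come from the proposal above, while build_parser, resolve_extra_meta, the default handling, and the conflict check are assumptions for illustration, not heudiconv's CLI code.

```python
import argparse
import warnings

EXTRA_META_CHOICES = ("dcmstack", "none", "dcmstack-unfiltered")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="heudiconv-sketch")
    parser.add_argument(
        "--extra-meta",
        choices=EXTRA_META_CHOICES,
        default=None,  # None means "not given", so conflicts can be detected
        help="Which extra metadata to extract (default: dcmstack).",
    )
    parser.add_argument(
        "--min-meta",
        action="store_true",
        help="Deprecated: same as --extra-meta=none.",
    )
    return parser


def resolve_extra_meta(args: argparse.Namespace) -> str:
    """Fold the deprecated --min-meta flag into the new --extra-meta option."""
    if args.min_meta:
        warnings.warn("--min-meta is deprecated; use --extra-meta=none",
                      DeprecationWarning, stacklevel=2)
        if args.extra_meta not in (None, "none"):
            raise ValueError("--min-meta conflicts with --extra-meta=" + args.extra_meta)
        return "none"
    return args.extra_meta or "dcmstack"


parser = build_parser()
print(resolve_extra_meta(parser.parse_args([])))  # dcmstack
print(resolve_extra_meta(parser.parse_args(["--extra-meta", "dcmstack-unfiltered"])))  # dcmstack-unfiltered
```

Leaving the default as None rather than "dcmstack" is what lets the sketch tell an explicit --extra-meta apart from the default and reject contradictory combinations.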