heudiconv

A way to add additional DICOM fields, which might have been filtered out by dcmstack

Open yarikoptic opened this issue 6 years ago • 3 comments

ATM we use dcmstack to extract additional metadata from DICOMs (when not running with --min-meta). One pain point is that this takes a long time, probably partly because, if I read https://github.com/nipy/heudiconv/blob/master/heudiconv/dicoms.py#L405 correctly, we pretty much rebuild the entire NIfTI data along the way, even though we care only about the metadata.

But dcmstack also has its own idea of which DICOM fields to include/exclude (see https://github.com/moloney/dcmstack/blob/f6631278b3558677a9cc94b0b0ba362decb6efb6/src/dcmstack/dcmstack.py#L63), so e.g. I have no way to access the PatientWeight field even if I decide that I need it.
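To make the filtering problem concrete, here is a minimal stand-alone sketch of regex-based key exclusion with a force-include override. It does not call dcmstack: the pattern list and the names `EXCLUDE_PATTERNS`, `make_key_filter`, and `force_include` are illustrative assumptions, not dcmstack's actual defaults or API.

```python
import re

# Hypothetical exclusion patterns, only loosely modelled on the idea of
# dropping potentially identifying fields; NOT dcmstack's real list.
EXCLUDE_PATTERNS = ["Patient", "Physician", "Date"]


def make_key_filter(exclude_patterns, force_include=()):
    """Return a predicate; True means the key should be dropped."""
    exclude_re = re.compile("|".join(f"(?:{p})" for p in exclude_patterns))
    keep = frozenset(force_include)

    def should_drop(key):
        # Explicitly requested keys always survive the filter.
        if key in keep:
            return False
        return exclude_re.search(key) is not None

    return should_drop


# A plain dict stands in for the metadata read from one DICOM file.
header = {"PatientWeight": 78, "PatientName": "anon", "EchoTime": 0.03}

default_filter = make_key_filter(EXCLUDE_PATTERNS)
custom_filter = make_key_filter(EXCLUDE_PATTERNS, force_include=["PatientWeight"])

default_meta = {k: v for k, v in header.items() if not default_filter(k)}
custom_meta = {k: v for k, v in header.items() if not custom_filter(k)}
```

With the default filter PatientWeight is dropped together with every other key matching "Patient"; the force-include list is one possible way to let a heuristic ask for it back.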

Ideally, I think, heuristics should be able to instruct metadata extraction on which fields to extract (or exclude) in addition to the ones dcmstack extracts, and we should not do it in yet another loop over dicoms. ATM we already first spend time extracting seqinfo records, and then do the metadata extraction using dcmstack.

I am not yet sure how we should do it, BUT maybe we could/should add a "custom_meta" field to SeqInfo which would contain a FrozenDict of such key/values.

Pros:

  • no additional runtime penalty (everything is there)
  • then if we allow heuristics to specify custom fields, they could accomplish whatever customization they want based on the dicoms (not limited to only the fields we provide)

Cons:

  • not sure whether it would negatively affect those .auto/.edit files we create for manual renaming
  • we would need one more configuration setting/heuristic config to say whether those fields should be stored in the sidecar .json files

So maybe it should instead be done at the point of metadata extraction, by somehow piggybacking on dcmstack to keep it from filtering out the desired fields. That would keep the change limited to metadata extraction, without providing additional info to heuristics.
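A rough sketch of the "custom_meta" idea, assuming the headers have already been read during seqinfo extraction (plain dicts stand in for them here). `SeqInfoSketch` and `build_seqinfo` are hypothetical names, and the record is heavily reduced compared to heudiconv's real SeqInfo. A tuple of sorted pairs is used in place of a FrozenDict so that the example needs only the standard library.

```python
from typing import Any, NamedTuple, Tuple


class SeqInfoSketch(NamedTuple):
    """Hypothetical, heavily reduced stand-in for heudiconv's SeqInfo."""

    series_id: str
    protocol_name: str
    # Extra fields requested by the heuristic, stored as sorted (key, value)
    # pairs so that the record stays immutable and hashable.
    custom_meta: Tuple[Tuple[str, Any], ...] = ()


def build_seqinfo(header, custom_fields=()):
    """Build one record from an already-read header (a plain dict here)."""
    # Fields that are requested but absent from the header are skipped.
    extras = tuple(sorted(
        (name, header[name]) for name in custom_fields if name in header
    ))
    return SeqInfoSketch(
        series_id=header["SeriesNumber"],
        protocol_name=header["ProtocolName"],
        custom_meta=extras,
    )


header = {"SeriesNumber": "5", "ProtocolName": "T1w", "PatientWeight": 78}
info = build_seqinfo(header, custom_fields=["PatientWeight", "PatientSize"])
weight = dict(info.custom_meta).get("PatientWeight")
```

Keeping the extra values hashable matters if the records are used as dictionary keys; whether and how such a field would show up in the .auto/.edit files is left open here, as in the discussion above.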

What do you think?

yarikoptic, Feb 13 '19 21:02

@yarikoptic - i think we should do one pass through all the dicoms, build an index, perform grouping, discard fields as needed. the intent was to move these components from dcmstack to nibabel. it's the group and stack function in dcmstack. we should check if pydicom has introduced any grouping option. this would also allow us to support ge and philips scanners better.
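The single-pass idea could look roughly like the sketch below. It is an illustration only: plain dicts stand in for DICOM headers, `index_and_group` is a hypothetical helper, and grouping by `SeriesInstanceUID` is an assumed key, not the grouping logic dcmstack or heudiconv actually use.

```python
from collections import defaultdict


def index_and_group(headers, group_key="SeriesInstanceUID", drop=()):
    """Group headers by one key in a single pass, discarding unwanted fields."""
    groups = defaultdict(list)
    for header in headers:  # the only loop over the headers
        kept = {k: v for k, v in header.items() if k not in drop}
        groups[header[group_key]].append(kept)
    return dict(groups)


headers = [
    {"SeriesInstanceUID": "1.2.3", "InstanceNumber": 1, "PatientName": "anon"},
    {"SeriesInstanceUID": "1.2.3", "InstanceNumber": 2, "PatientName": "anon"},
    {"SeriesInstanceUID": "1.2.4", "InstanceNumber": 1, "PatientName": "anon"},
]
groups = index_and_group(headers, drop={"PatientName"})
```

Both the seqinfo records and the sidecar metadata could then be derived from the same index, instead of looping over the files twice.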

a long time ago, we started this project: https://github.com/ssikka/DICOM-CTP-Anonymizer - i think we should reactivate that piece as well. in general i agree we need to give the user useful options of including/excluding/manipulating the dicom metadata.

regarding auto/edit, we should perhaps rethink those as well. i'm not sure anybody besides me knows what to do with them :(

satra, Feb 13 '19 22:02

@yarikoptic - also see this issue #312

satra, Apr 17 '19 14:04

FWIW, I am interested in having PatientWeight extracted, and wondered if I should just enable it by default. I looked around OpenNeuro datasets to see how many extract it from DICOMs, and a good number do:

$> GIT_TERMINAL_PROMPT=0 datalad foreach-dataset --o-s relpath -J10 'git grep -i PatientWeight | head -n 1'
.datalad/metadata/objects/ec/ds-0b4464cc5597f40226e08205843de5:"PatientWeight":                                                                                                                                      
ds001751/sub-01/anat/sub-01_T1w.json:      "PatientWeight": 78,                                                                                                                                                      
ds001839/sub-5879027/anat/sub-5879027_T1w.json: "PatientWeight": 74,                                                                                                                                                 
ds002011/sub-01/func/sub-01_task-Overlap_run-1_bold.json:       "PatientWeight": 90.7185,                                                                                                                            
ds002155/sub-05/anat/sub-05_T1w.json:      "CsaSeries.UsedPatientWeight": 70,                                                                                                                                        
ds002647/sub-101/anat/sub-101_T1w.json: "PatientWeight": 56.245461,                                                                                                                                                  
ds002674/sub-01/ses-43/anat/sub-01_ses-43_T2star.json:  "PatientWeight": 68.0389,                                                                                                                                    
ds003089/sub-01/anat/sub-01_T1w.json:"PatientWeight": 70,                                                                                                                                                            
ds003151/sub-170/ses-hormoneabsent/func/sub-170_ses-hormoneabsent_task-nback_run-1_sbref.json:    "PatientWeight": 45.3592,                                                                                          
ds003354/derivatives/mriqc/out/sub-01/func/sub-01_task-empatom_bold.json:    "PatientWeight": 58,                                                                                                                    
ds003404/sub-C05/anat/sub-C05_rec-1_T1w.json:   "PatientWeight": 70.306826,                                                                                                                                          
ds003507/sub-03/func/sub-03_task-affect_run-1_bold.json:        "PatientWeight": 58.967016,                                                                                                                          
ds003721/sub-03/anat/sub-03_acq-highres_T1w.json:       "PatientWeight": 80,                                                                                                                                         
ds003791/sub-01/anat/sub-01_T1w.json:   "PatientWeight": 59,                                                                                                                                                         
ds003989/sub-01/ses-01/anat/sub-01_ses-01_run-1001_T1w.json:    "PatientWeight": 5, 

What I am thinking now is to refactor (RF) --min-meta into --extra-meta=dcmstack,none,dcmstack-unfiltered, where dcmstack would be the current default, none would be equivalent to --min-meta (which we would deprecate), and dcmstack-unfiltered (or similar) would disable that aggressive filtering in dcmstack (although I have yet to check whether that is feasible without too much ad-hoc code). For allowing custom fields to be extracted per heuristic, see https://github.com/nipy/heudiconv/pull/581 .
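A minimal argparse sketch of how such an option could be wired up, with --min-meta kept as a deprecated alias. This is not heudiconv's actual CLI code: `parse_meta_args` and the program name are made up for the illustration, and letting --min-meta override --extra-meta is just one possible choice.

```python
import argparse
import warnings

EXTRA_META_CHOICES = ("dcmstack", "none", "dcmstack-unfiltered")


def parse_meta_args(argv):
    """Parse only the metadata-related options (hypothetical helper)."""
    parser = argparse.ArgumentParser(prog="heudiconv-sketch")
    parser.add_argument(
        "--extra-meta",
        choices=EXTRA_META_CHOICES,
        default="dcmstack",
        help="how to extract additional metadata from DICOMs",
    )
    parser.add_argument(
        "--min-meta",
        action="store_true",
        help="deprecated, same as --extra-meta=none",
    )
    args = parser.parse_args(argv)
    if args.min_meta:
        # In this sketch the deprecated flag wins over --extra-meta.
        warnings.warn(
            "--min-meta is deprecated; use --extra-meta=none",
            DeprecationWarning,
        )
        args.extra_meta = "none"
    return args


default_mode = parse_meta_args([]).extra_meta
unfiltered_mode = parse_meta_args(["--extra-meta=dcmstack-unfiltered"]).extra_meta
```

Passing the old flag, as in `parse_meta_args(["--min-meta"])`, emits a DeprecationWarning and maps to the "none" mode.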

yarikoptic, Dec 07 '22 13:12