heudiconv icon indicating copy to clipboard operation
heudiconv copied to clipboard

How to deal with a case that the order of AcquisitionTime is not consistent to SeriesNumber

Open yushiangsu opened this issue 4 years ago • 4 comments

Summary

I have a case that the scanner console froze during the scanning. The console had to reboot and resume scanning protocol. And subject came out and back to scanner. We scanned fieldmap and T2 for new registration and continued functional runs. As consequence, two different images sets with different StudyInstanceUID, but have the same protocol name and SeriesNumber.

The sequence and dicom folder are list below by AcquisitionTime. (The DICOM info here, "StudyID", "StudyInstanceUID", SeriesNumber", "AcquisitionTime" were extracted by OsiriX)

└── dicom
    ├── #assign a StudyInstanceUID: ...0054; StudyID:1
    ├── T1          #SeriesNumber: 3; T1w
    ├── FMAP        #SeriesNumber: 4; magnitude
    ├── FMAP_P      #SeriesNumber: 5; phasediff
    ├── T2          #SeriesNumber: 6; T2w
    ├── F1          #SeriesNumber: 7; Task BOLD run 1
    ├── F2          #SeriesNumber: 9; Task BOLD run 2
    ├── #console reboot 
    ├── #assign a new StudyInstanceUID: ...0001; StudyID:3
    ├── FMAP_F3     #SeriesNumber: 5; magnitude
    ├── FMAP_P_F3   #SeriesNumber: 6; phasediff
    ├── T2_F3       #SeriesNumber: 7; T2w
    ├── F3          #SeriesNumber: 8; Task BOLD run 3
    ├── F4          #SeriesNumber: 10; Task BOLD run 4
    ├── F5          #SeriesNumber: 12; Task BOLD run 5
    ├── F6          #SeriesNumber: 14; Task BOLD run 6
    ├── REST        #SeriesNumber: 16
    ├── T2_CR       #SeriesNumber: 18
    └── DTI         #SeriesNumber: 19

At my first attempt, the heudiconv generate error “Conflicting study identifiers”. The conflicting study id is due to console rebooting. I put a grouping (-g all) option to bypass this error.

However, there are two problems:

  1. The heudiconv misinterpreted two different image sets as an identical image set, even these images are stored in different folders. The output dicomin.tsv showed that the heudiconv recognize the dicom files inside ./FMAP_F3 (2 magnitude images, 76 dicom files) and ./FMAP_P (1 phasdiff image, 38 dicom files) as identical image sets. And filegroup.json clearly showed these two folders were grouping together. I wonder if there is any solution for grouping that could correctly identify these two image set as separate images.
series_id dcm_dir_name series_files
3-MPRAGE_Sag_1mm_iso_g2 T1 192
4-gre_field_mapping FMAP 76
5-gre_field_mapping FMAP_F3 114
6-gre_field_mapping FMAP_P_F3 38
6-t2_tse_tra_256_4mm T2 38
7-ep2d_moco_4mm_240_1-3 F1 240
7-t2_tse_tra_256_4mm T2_F3 38
8-ep2d_moco_4mm_240_1-3 F3 240
9-ep2d_moco_4mm_240_1-3 F2 240
10-ep2d_moco_4mm_240_1-3 F4 240
12-ep2d_moco_4mm_240_1-3 F5 240
14-ep2d_moco_4mm_240_1-3 F6 240
16-ep2d_moco_4mm_240_REST_openeye REST 240
18-t2_tse_tra_256_4mm_CR REST T2_CR 38
19-ep2d_diff_mddw_30_b1000_2.5mm_p2 DTI 62
  1. The console froze before third functional imaging scanning. The first and second run have SeriesNumber “7” and “9”, but the third run have SeriesNumber “8”. If the heuristic file only match for protocol name, i.e. if (‘ep2d_moco_4mm_’ in s.protocol_name), the heudiconv erroneously rename the third run as second run. These can be found in output *.json file as below. Actually, I can include folder name in the heuristic file to prevent this problem. I wonder if there is an option to prioritize AcquisitionTime, rather than SeriesNumber.

*_run-1_bold.json generated by heudiconv

{ ...
  "global": {
    "const": {
      ...
      "SeriesNumber": 7,
      "StudyID": "1",
  }

*_run-2_bold.json generated by heudiconv

{ ...
  "global": {
    "const": {
      ...
      "SeriesNumber": 8,
      "StudyID": "3",
  }

*_run-3_bold.json generated by heudiconv

{ ...
  "global": {
    "const": {
      ...
      "SeriesNumber": 9,
      "StudyID": "1",
  }

*_run-4_bold.json generated by heudiconv

{ ...
  "global": {
    "const": {
      ...
      "SeriesNumber": 10,
      "StudyID": "3",
  }

*_run-5_bold.json generated by heudiconv

{ ...
  "global": {
    "const": {
      ...
      "SeriesNumber": 12,
      "StudyID": "3",
  }

*_run-6_bold.json generated by heudiconv

{ ...
  "global": {
    "const": {
      ...
      "SeriesNumber": 14,
      "StudyID": "3",
  }

Thank you for awesome heudiconv!

Platform details:

Choose one:

  • [ ] Local environment
  • [x] Container: docker nipy/heudiconv:latest
  • Heudiconv version:

heudiconv version 0.9.0

yushiangsu avatar Jan 12 '21 10:01 yushiangsu

If I got it right upon a brief look, behaviors you are describing are pretty much up to heuristic's logic. Which heuristic are you using? e.g. in reproin heuristic, in case of detecting duplicate runs (the same protocol name) we assume that the first one was canceled/rerun and add __dup suffix to signal that (for inspection and possible removal)

yarikoptic avatar Jan 15 '21 13:01 yarikoptic

Thanks for your concerns! I followed the tutorial from Stanford Center for Reproducible Neuroscience. At beginning, I ran heudiconv with the flag -f convertall and -c none -b -g all to create dicominfo.tsv. This dicominfo.tsv table showed in my first comment. I would say the heuristic "convertall" failed to recognize two image sets, which have different StudyID but have the same SeriesNumber. Then, I revised the customized heuristic.py and ran heudiconv with -f heuristic.py and -c dcm2niix -b -g all. The heuristic.py did include s.dcm_dir_name but still cannot recognize two image sets from different folders. As the phasediff image is not recognized, it did not output the phasediff image. And the customized heuristic.py probably take SeriesNumber to sort the order for converting, regardless of different StudyID. As consequence, the run-3 ("SeriesNumber": 8, "StudyID": "3") was erroneously named as run-2, and run-2 ("SeriesNumber": 9, "StudyID": "1") was named as run-3. This is output sub_*_scans.tsv:

filename acq_time operator randstr
anat/sub-s526_T1w.nii.gz 2020-11-19T15:31:35.492500 n/a b647c4ac
fmap/sub-s526_magnitude1.nii.gz 2020-11-19T15:38:47.517500 n/a 81186085
fmap/sub-s526_magnitude2.nii.gz 2020-11-19T15:38:47.517500 n/a 81186085
anat/sub-s526_acq-4task_T2w.nii.gz 2020-11-19T15:40:12.545000 n/a 463ce887
func/sub-s526_task-dbb_run-1_bold.nii.gz 2020-11-19T15:51:35.720000 n/a 0e5129d8
func/sub-s526_task-dbb_run-3_bold.nii.gz 2020-11-19T16:00:42.190000 n/a 0e889ecf
func/sub-s526_task-dbb_run-2_bold.nii.gz 2020-11-19T16:53:49.012500 n/a a8cfec10
func/sub-s526_task-dbb_run-4_bold.nii.gz 2020-11-19T17:02:19.985000 n/a 04b75f28
func/sub-s526_task-dbb_run-5_bold.nii.gz 2020-11-19T17:11:12.040000 n/a 248634c4
func/sub-s526_task-dbb_run-6_bold.nii.gz 2020-11-19T17:19:50.150000 n/a 6a992f9d
func/sub-s526_task-rest_bold.nii.gz 2020-11-19T17:30:30.200000 n/a 64eebeaf
anat/sub-s526_acq-4rest_T2w.nii.gz 2020-11-19T17:38:42.455000 n/a 785f9422
dwi/sub-s526_dwi.nii.gz 2020-11-19T17:42:26.377500 n/a a1c1823b

Are these possible to solved by dedicated heuristic file? Let me know the key element in reproin, I would check and try on my case. I appreciate your effort on this. Thanks!


The completed code in my customized heuristic.py:

import os
def create_key(template, outtype=('nii.gz',), annotation_classes=None):
    if template is None or not template:
        raise ValueError('Template must be a valid format string')
    return template, outtype, annotation_classes
def infotodict(seqinfo):
    t1w = create_key('sub-{subject}/anat/sub-{subject}_T1w')
    t2w4task = create_key('sub-{subject}/anat/sub-{subject}_acq-4task_T2w')
    t2w4rest = create_key('sub-{subject}/anat/sub-{subject}_acq-4rest_T2w')
    taskepi = create_key('sub-{subject}/func/sub-{subject}_task-dbb_run-{item:01d}_bold')
    restepi = create_key('sub-{subject}/func/sub-{subject}_task-rest_bold')
    fmap_m = create_key('sub-{subject}/fmap/sub-{subject}_magnitude')
    fmap_p = create_key('sub-{subject}/fmap/sub-{subject}_phasediff')
    dwi = create_key('sub-{subject}/dwi/sub-{subject}_dwi')

    info = {
                t1w: [],
                t2w4task: [],
                t2w4rest: [],
                taskepi: [],
                restepi: [],
                fmap_m:[],
                fmap_p:[],
                dwi:[]
            }
    last_run = len(seqinfo)

    for s in seqinfo:
        if ('MPRAGE' in s.protocol_name) and (s.dim3 == 192) and ('T1' == s.dcm_dir_name):
            info[t1w].append(s.series_id)
        if ('gre_field_mapping' in s.protocol_name) and (s.dim3 == 76) and ('FMAP' == s.dcm_dir_name):
            info[fmap_m].append(s.series_id)
        if ('gre_field_mapping' in s.protocol_name) and (s.dim3 == 38) and ('FMAP_P' == s.dcm_dir_name):
            info[fmap_p].append(s.series_id)
        if ('t2_tse_tra' in s.protocol_name) and (s.dim3 == 38) and ('T2' == s.dcm_dir_name):
            info[t2w4task].append(s.series_id)
        if ('t2_tse_tra' in s.protocol_name) and (s.dim3 == 38) and ('T2_CR' == s.dcm_dir_name):
            info[t2w4rest].append(s.series_id)
        if ('ep2d_moco_4mm_' in s.protocol_name) and (s.dim1 == 64) and (s.dim2 == 64) and ('F' in s.dcm_dir_name):
            info[taskepi].append(s.series_id)
        if ('ep2d_moco_4mm_' in s.protocol_name) and ('REST' in s.protocol_name) and (s.dim4 == 240):
            info[restepi].append(s.series_id)
        if ('ep2d_diff_mddw' in s.protocol_name) and (s.dim1 == 78) and (s.dim2 == 78):
            info[dwi].append(s.series_id)
    return info

yushiangsu avatar Jan 21 '21 12:01 yushiangsu

Hi @yushiangsu,

Regarding the order of the runs, you can check what we do in our custom heuristic file to sort by acquisition date and time, precisely for situations like this (now that AcquisitionTime is stored in the seqinfo #487).

Then, you can add a call to sort_seqinfo_by_series_date_time before looping through seqinfo, like here:

    last_run = len(seqinfo)

    # because series are not always sorted by acquisition date/time, sort them:
    seqinfo = sort_seqinfo_by_series_date_time(seqinfo)

    for s in seqinfo:
[...]

@yarikoptic , If you think doing this sorting would be useful in general, I could put it somewhere in the main code (not in custom heuristics) so that the seqinfo comes sorted by date/time by default.

pvelasco avatar Jan 21 '21 15:01 pvelasco

Sounds sensible but first needs a check on either we already do not do some sorting which might "contradict" sorting by AcquisitionTime (SeriesNumber?)... also acquisition time sorting alone would not be sufficient in light of #487 discussion, so should be combined with SeriesNumber

yarikoptic avatar Jan 21 '21 16:01 yarikoptic