dicom-numpy

ENH proposals from contrib-pydicom: read a folder, use proper dtype for rescaled images

Open jond01 opened this issue 3 years ago • 1 comments

Hi, I recently reviewed the contrib-pydicom code. The script input-output/pydicom_series.py there has (more or less) the same purpose as dicom-numpy. I noticed two features there that we may also want to include here:

  1. Read a folder: add an API that reads a folder and extracts the image array and affine. The example in the docs does something similar, but it receives a list of files:

    import pydicom
    import dicom_numpy
    
    def extract_voxel_data(list_of_dicom_files):
        datasets = [pydicom.dcmread(f) for f in list_of_dicom_files]
        try:
            voxel_ndarray, ijk_to_xyz = dicom_numpy.combine_slices(datasets)
        except dicom_numpy.DicomImportException as e:
            # invalid DICOM data
            raise
        return voxel_ndarray
    

    We could go one level up and accept only the path of the folder containing these files. The files in the folder could be filtered down to DICOM files with pydicom's built-in is_dicom function, then split into distinct series according to their SeriesInstanceUID. combine_slices would be called once per series, and the returned data would be a list of (voxels, affine) pairs: [(voxels0, affine0), (voxels1, affine1), ...].

  2. Tighten the dtype of rescaled images: currently, dicom-numpy uses np.float32 whenever there is a RescaleSlope or RescaleIntercept: https://github.com/innolitics/dicom-numpy/blob/204e95594a527bbab1444f9248432ffa01af024c/dicom_numpy/combine_slices.py#L108 However, this is often unnecessary. For example, if the rescale slope is 1.0 and the rescale intercept is -1024.0, the image can still have an integer dtype. contrib-pydicom seems to have a clever way to determine the proper dtype.
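
To make proposal (1) concrete, here is a rough sketch of what a folder-level helper might look like. The names `group_by_series` and `combine_folder` are hypothetical and not part of dicom-numpy; the sketch also assumes every file sits directly in the folder (no recursion) and lets a `DicomImportException` from any series propagate to the caller.

```python
from collections import defaultdict
from pathlib import Path


def group_by_series(datasets):
    """Group datasets by SeriesInstanceUID, keeping first-seen order."""
    series = defaultdict(list)
    for ds in datasets:
        series[ds.SeriesInstanceUID].append(ds)
    return dict(series)


def combine_folder(folder):
    """Hypothetical helper: return [(voxels, affine), ...], one per series."""
    # Imported here so that group_by_series stays usable on its own.
    import dicom_numpy
    import pydicom
    from pydicom.misc import is_dicom

    # Keep only regular files that look like DICOM files.
    paths = sorted(
        p for p in Path(folder).iterdir() if p.is_file() and is_dicom(p)
    )
    datasets = [pydicom.dcmread(p) for p in paths]

    # combine_slices returns a (voxel_ndarray, ijk_to_xyz) tuple per series.
    return [
        dicom_numpy.combine_slices(series_datasets)
        for series_datasets in group_by_series(datasets).values()
    ]
```

Returning a dict keyed by SeriesInstanceUID instead of a plain list might be friendlier to callers who need to know which series is which; that is a design choice left open here.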

jond01 • May 05 '21 14:05

@jond01 thank you for the thoughtful suggestions. I agree that (1) would be a useful function to add.

I also agree that we don't want to force users to convert to np.float32 if they don't want to. At the same time, I think it would be nice for callers to be able to assume that the output of combine_slices has a consistent dtype, regardless of the input. Thus, I don't think the default behavior should vary between dtypes dynamically. I'm sure there is a way to accommodate both requirements though.
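
One possible way to reconcile the two requirements, purely as a sketch: keep np.float32 as the fixed default and make the tighter dtype opt-in. The function name `rescaled_dtype` and the `tighten` flag below are hypothetical; this is neither how dicom-numpy currently behaves nor how contrib-pydicom picks its dtype.

```python
import numpy as np


def rescaled_dtype(stored_dtype, slope, intercept, tighten=False):
    """Pick an output dtype for `stored * slope + intercept` (hypothetical).

    With tighten=False the result is always np.float32, so the default
    output dtype does not depend on the input.
    """
    stored_dtype = np.dtype(stored_dtype)
    integral = float(slope).is_integer() and float(intercept).is_integer()
    if not tighten or not integral or stored_dtype.kind not in "iu":
        return np.dtype(np.float32)

    # Range of possible rescaled values, computed with Python ints so the
    # arithmetic itself cannot overflow.
    info = np.iinfo(stored_dtype)
    slope, intercept = int(slope), int(intercept)
    bounds = (
        int(info.min) * slope + intercept,
        int(info.max) * slope + intercept,
    )
    low, high = min(bounds), max(bounds)

    # Return the first (smallest) integer dtype that covers the whole range.
    candidates = (
        np.uint8, np.int8, np.uint16, np.int16, np.uint32, np.int32, np.int64,
    )
    for candidate in candidates:
        limits = np.iinfo(candidate)
        if limits.min <= low and high <= limits.max:
            return np.dtype(candidate)

    # Nothing fits: fall back to the default float dtype.
    return np.dtype(np.float32)
```

With the example from the issue (slope 1.0, intercept -1024.0) and uint16 stored pixels, the opt-in path would select a signed integer dtype wide enough for the shifted range, while callers who leave `tighten` unset would keep getting np.float32.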

johndgiese • May 07 '21 17:05