Consider adding dicom package tests that would utilize selected samples from IDC

Open fedorov opened this issue 1 year ago • 3 comments

All IDC data is available in public buckets, accessible via the S3 API or HTTPS without authentication, so it might be good to add regression tests that use hand-picked DICOM samples to stress specific aspects of the functionality.

Specific examples that we have already run into, each with the SeriesInstanceUID of a corresponding sample from the current IDC v18 data release:

  • large DICOM SEG for TotalSegmentator: 1.2.276.0.7230010.3.1.3.313263360.35955.1706319184.882151
  • CT with SpacingBetweenSlices = -4: 1.3.6.1.4.1.32722.99.99.239963936032720978832553442140518002510
  • NM with SpacingBetweenSlices = -2: 1.3.6.1.4.1.14519.5.2.1.7009.2403.484725606860278331095617627781

Given any of the UIDs above, the corresponding file(s) can be retrieved in just two steps:

  1. $ pip install --upgrade idc-index
  2. $ idc download <SeriesInstanceUID>
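
For a test harness, the same retrieval could be scripted. This is only a minimal sketch: it assumes the `idc` command installed by idc-index is on the PATH and that `idc download` writes into the current working directory (not verified here, which is why the sketch changes directory instead of passing any flag). The labels in `SAMPLES` and both helper names are hypothetical.

```python
import subprocess
from pathlib import Path

# SeriesInstanceUIDs listed in this issue (IDC v18). The dictionary keys are
# hypothetical labels chosen for illustration only.
SAMPLES = {
    "seg_totalsegmentator": "1.2.276.0.7230010.3.1.3.313263360.35955.1706319184.882151",
    "ct_negative_spacing": "1.3.6.1.4.1.32722.99.99.239963936032720978832553442140518002510",
    "nm_negative_spacing": "1.3.6.1.4.1.14519.5.2.1.7009.2403.484725606860278331095617627781",
}


def build_download_command(series_uid: str) -> list[str]:
    """Return the argv equivalent of `idc download <SeriesInstanceUID>`."""
    return ["idc", "download", series_uid]


def download_sample(label: str, dest_root: Path) -> Path:
    """Fetch one sample into dest_root/label by running the CLI from there."""
    dest = dest_root / label
    dest.mkdir(parents=True, exist_ok=True)
    # Assumption: the CLI downloads into the working directory, so we set
    # cwd rather than rely on a download-directory option.
    subprocess.run(build_download_command(SAMPLES[label]), cwd=dest, check=True)
    return dest
```

Calling `download_sample("ct_negative_spacing", Path("idc-test-data"))` would then place that series under `idc-test-data/ct_negative_spacing`, provided the assumptions above hold.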

Other dimensions we may want to test include various transfer syntaxes, diffusion images from different manufacturers, series with missing slices, series with inconsistent PixelSpacing or ImageOrientationPatient, gantry tilt, presentation states, and samples containing attributes that are invalid per the standard but may be encountered "in the wild". I think we should be able to find samples for many of the situations that need to be regression-tested.
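
To make one of these cases concrete, a check for series with missing slices could compare the gaps between sorted slice positions. The following is a minimal sketch in plain Python, not taken from any existing test. It assumes the positions have already been obtained by projecting each slice's ImagePositionPatient onto the slice normal; the function name and the tolerance are hypothetical.

```python
from statistics import median


def find_missing_slice_gaps(positions, rel_tol=0.01):
    """Return (index, gap) pairs for gaps that exceed the median gap.

    `positions` are slice locations along the slice normal, in any order.
    They are sorted first, so the sign of the original spacing (e.g. a
    negative SpacingBetweenSlices) does not affect the result.
    """
    ordered = sorted(positions)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    if not gaps:
        return []  # zero or one slice: nothing to compare
    nominal = median(gaps)
    # Flag every gap that is larger than the nominal spacing by more than
    # the relative tolerance.
    return [(i, gap) for i, gap in enumerate(gaps) if gap > nominal * (1 + rel_tol)]
```

For example, positions `[0, 4, 8, 16, 20]` yield `[(2, 8)]`: the gap after the third slice is twice the nominal spacing, which suggests one slice is missing there.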

I have not done this myself, but it looks like CMake supports such external data sources: https://cmake.org/cmake/help/book/mastering-cmake/chapter/Testing%20With%20CMake%20and%20CTest.html#managing-test-data.

I am happy to help with selection of the relevant samples for the tasks we agree should be tested and answer any questions related to IDC.

I think something like the above has been a dream of @pieper for many years now. I believe we finally can make it come true!

fedorov avatar Aug 23 '24 19:08 fedorov

This is where tests are right now and it seems they are propagated from dcmqi: https://github.com/InsightSoftwareConsortium/ITK-Wasm/blob/main/packages/dicom/dcmtk/CMakeLists.txt#L88

fedorov avatar Aug 23 '24 19:08 fedorov

@fedorov

The tests in the CMake file are mostly run as a sanity check on native binaries. The more comprehensive TypeScript and Python tests are here:
https://github.com/InsightSoftwareConsortium/ITK-Wasm/tree/main/packages/dicom/typescript/test
https://github.com/InsightSoftwareConsortium/ITK-Wasm/tree/main/packages/dicom/python/itkwasm-dicom-wasi/tests

Also, GDCM is available through both image-io and the dicom subpackage for reading image series. I don't believe DCMTK is currently used for reading the imaging modalities of DICOM series (@thewtex correct me if I'm wrong).

jadh4v avatar Sep 16 '24 18:09 jadh4v

Yes, I understand. The idea is to augment the existing dcmqi tests (which are basically small toy examples) with tests on real data from IDC, and also to add tests of the image-io package using data from IDC. There is no need to add DCMTK to image-io for this purpose; the goal is just to improve testing of the existing GDCM-based functionality.

fedorov avatar Sep 16 '24 18:09 fedorov