monai-deploy-app-sdk
monai-deploy-app-sdk copied to clipboard
DICOMSeriesSelectorOperator Enhancements
Addressing some points raised in Issue #204.
Enhanced Matching for Numerical DICOM Tags
In addition to the previously supported exact matching, the following is now supported for numerical (i.e. integer and float type DICOM VRs, including IS and DS VRs) tags:
- Range matching: when a list with two numerical values is provided, these values will serve as lower and upper bounds; match will be successful if DICOM tag value is between (inclusive) of the bounds. This was mainly tested for the SliceThickness tag.
- RegEx matching: when a RegEx is provided, numerical DICOM tag value will be converted to a string; full match used provide more specificity (Note: RegEx matching was added as more of a proof of concept; this may not be utilized much in practice, but added it in this PR just to document. If this is not desired, I would be fine with removing).
I have also updated the example sample selection rules to include a 3rd selection ("CT Series 3"); this new selection includes an example of union matching for set type (functionality already present, but an example was not previously provided), as well as an example of the newly-added numerical range matching:
"name": "CT Series 3",
"conditions": {
"StudyDescription": "(.*?)",
"Modality": "(?i)CT",
"ImageType": ["PRIMARY", "ORIGINAL", "AXIAL"],
"SliceThickness": [3, 5]
}
Sort by SOP Instance Count
In reviewing ~30 CT A/P scans from our hospital, filtering by StudyDescription, Modality, ImageType, and SliceThickness narrows down to a single, desired series in about 85% of cases (CT, Axial, 3-5 mm ST). For the remaining 15%, it is often the case that, of the series that pass the initial filtering, the series that is desired for inference is the one with the most number of DICOM images present (i.e. highest number of SOP instances). This is often due to the filtered series with more DICOM images having a higher likelihood of having the complete A/P anatomy present, as well as the presence of short series (e.g. Smart Prep Series) which technically pass the filters but are not desired for inference.
With this observation, I have implemented a new default parameter, sort_by_sop_instance_count, which, when True, and when multiple series pass the selection criteria (all_matched=True), will sort the selected series in descending SOP instance count, i.e. will list the filtered DICOM series with the most DICOM images first. This will give a higher probability that, despite downstream operators (the MONAIBundleInferenceOperator, I believe) being limited to only performing inference on the first selected series even if multiple series are selected, inference is performed on the desired series. sort_by_sop_instance_count is implemented in a very similar way to all_matched - as a default parameter with a default value of False.
@vikashg @MMelQin - I appreciate your initial review of this PR, Elan provided me with your feedback earlier this week.
Regarding your comment about sorting and using ImagePositionPatient instead of SOP instances, I apologize as perhaps it's not clear from my explanation what I am trying to accomplish with this functionality. I am not trying to sort the slices within the selected series - I am trying to sort the series themselves.
For example, say all_matched=True, and we have 3 series that pass the selection criteria. Say these series have 29, 8, and 58 DICOM images each, and are matched in this order sequentially. Unsorted, the StudySelectedSeries list outputted would have an order of [29, 8, 58]; however, when sorted, the list would have an order of [58, 29, 8], i.e. placing the DICOM series with the highest number of DICOM images in the 0th position. Because of how downstream operators are currently configured, ultimately only a single DICOM SEG will be written for the "first" (0th) selected series in the list, even if multiple series are selected. Please refer to my original PR comment for why selecting the series with the higher number of DICOM images (if multiple series meet the criteria) could be desirable.
My code is using the get_sop_instances() method to grab the number of DICOM images present in the series, not to sort the DICOM series. I hope this helps to clarify, please let me know if you have any additional questions. I realize this feature may not be desired, or that it may be advisable to change the name of this parameter / add additional documentation to make this functionality clear.
In my latest commit, I also made a change to the STL conversion operator to address a failed pytype check:
Checking styles using pytype...
homebluna301monai-deploy-app-sdkmonaideployutilsimportutil.py20 DeprecationWarning pkg_resources is deprecated as an API. See httpssetuptools.pypa.ioenlatestpkg_resources.html
import pkg_resources
2024-09-06 163710 $ python3 -m pytype --version
2024.04.11
2024-09-06 163711 $ python3 -m pytype -j 16 --python-version=3.10
Computing dependencies
Analyzing 54 sources with 1 local dependencies
ninja Entering directory `.pytype'
[56] check monai.deploy.operators.stl_conversion_operator
FAILED homebluna301monai-deploy-app-sdk.pytypepyimonaideployoperatorsstl_conversion_operator.pyi
usrbinpython3 -m pytype.main --disable pyi-error,container-type-mismatch,attribute-error --imports_info homebluna301monai-deploy-app-sdk.pytypeimportsmonai.deploy.operators.stl_conversion_operator.imports --module-name monai.deploy.operators.stl_conversion_operator --platform linux -V 3.10 -o homebluna301monai-deploy-app-sdk.pytypepyimonaideployoperatorsstl_conversion_operator.pyi --analyze-annotated --nofail --quick homebluna301monai-deploy-app-sdkmonaideployoperatorsstl_conversion_operator.py
File homebluna301monai-deploy-app-sdkmonaideployoperatorsstl_conversion_operator.py, line 247, in convert Name 'temp_folder' is not defined [name-error]
For more details, see httpsgoogle.github.iopytypeerrors.html#name-error
ninja build stopped cannot make progress due to previous errors.
Leaving directory '.pytype'
pytype check failed!
I moved the declaration of temp_folder to outside of the try block to resolve this.
Thanks @bluna301. I am looking at the code. Give me one more day and I will be able to comment on your PR.
Thanks @bluna301. I am looking at the code. Give me one more day and I will be able to comment on your PR.
Thanks @vikashg !
Hey @bluna301, I understand the logic now and it makes sense too. Sometimes, I have also faced this issue and generally the series with highest number of dicoms is the right series. I went through the code, and I will suggest that you should throw a warning or a log that you are choosing the series with highest number of dicom images as multiple series are matching the selection criteria. This will be helpful for the user to know what is going on. Other than that I think it looks good. I have not run your code though. @MMelQin thoughts ?
@vikashg Thank you for your review! I agree with your suggestion, it would be good to make it more explicit to the user what is going on during when multiple series are matched. I added an info log for when sorting occurs, and also added some additional detail in the documentation of this operator to make it explicit that enabling this feature will "return" (i.e. sort and place first in the returned List) the series with the highest # of DICOM images.
I have tested the new series selection functionality locally, as well as the minor change to the STL operator, with no issues. Please let me know if you or @MMelQin would like to test out the functionality before merging the PR.
Quality Gate passed
Issues
2 New issues
0 Accepted issues
Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code
@bluna301 Thanks for the PR. Just acknowledging it will review this week.
Thanks @vikashg. Nothing major in this commit, just adding some logging to make it explicit which series has been selected.