presidio
Add ability to auto-detect which DICOM files to process
Is your feature request related to a problem? Please describe.
When working with DICOM image datasets, the number of images that have text PHI burnt into them is usually fairly small. It would be great to be able to feed a whole directory into the DicomImageRedactorEngine and have only the files with text PHI processed automatically.
Describe the solution you'd like
It would be ideal to have an additional argument in DicomImageRedactorEngine.redact_from_directory that lets the user choose between processing every DICOM image and processing only those automatically identified as having text PHI in the image.
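To make the request concrete, here is a minimal sketch of the selection step, written as a standalone helper rather than as a change to the engine. The names `select_dicom_files`, `only_with_text`, and `has_burnt_in_text` are hypothetical and are not part of presidio; the detector is passed in as a callable because how text PHI would be detected is exactly the open question.

```python
from pathlib import Path
from typing import Callable, List


def select_dicom_files(
    input_dir: str,
    only_with_text: bool,
    has_burnt_in_text: Callable[[Path], bool],
) -> List[Path]:
    """Return the .dcm files under input_dir that should be redacted.

    only_with_text and has_burnt_in_text are hypothetical names used for
    illustration; they are not existing presidio parameters.
    """
    # Collect every DICOM file in the directory tree (extension match only).
    candidates = sorted(
        p
        for p in Path(input_dir).rglob("*")
        if p.is_file() and p.suffix.lower() == ".dcm"
    )
    if not only_with_text:
        # Process everything, which is the behaviour the issue describes today.
        return candidates
    # Keep only the files the caller-supplied detector flags.
    return [p for p in candidates if has_burnt_in_text(p)]
```

The resulting list could then be handed to the redactor one file at a time, so the directory-level call would not need to change for users who leave the flag off.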
Describe alternatives you've considered
An alternative approach is to allow users to pass a CSV file or table into DicomImageRedactorEngine.redact_from_directory that specifies which files to process.
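A rough sketch of the CSV alternative, assuming a manifest with one row per file and a `file_path` column holding paths relative to the input directory. Both the column name and the helper `files_from_manifest` are assumptions made for illustration, not an existing presidio interface.

```python
import csv
from pathlib import Path
from typing import List


def files_from_manifest(
    manifest_csv: str, input_dir: str, column: str = "file_path"
) -> List[Path]:
    """Read the files to process from a CSV manifest (hypothetical layout)."""
    root = Path(input_dir)
    selected = []
    with open(manifest_csv, newline="") as handle:
        for row in csv.DictReader(handle):
            path = root / row[column].strip()
            # Rows that point at files missing from input_dir are skipped.
            if path.is_file():
                selected.append(path)
    return selected
```

Skipping missing files silently is a design choice of this sketch; a real implementation might prefer to warn or raise instead.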
This may not be required. We can just process everything and make sure the OCR threshold is set so that the number of false positives stays low. We should then return a list of the files that had boxes drawn on them.
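The "process everything, then report" idea could look roughly like the sketch below. `redact_and_report` and `redact_one` are hypothetical names: `redact_one` stands in for whatever per-file redaction call is used and is assumed to return the bounding boxes it drew, with an empty sequence meaning nothing was redacted.

```python
from pathlib import Path
from typing import Callable, Iterable, List, Sequence


def redact_and_report(
    files: Iterable[Path],
    redact_one: Callable[[Path], Sequence[dict]],
) -> List[Path]:
    """Run redact_one on every file and return the files where boxes were drawn."""
    redacted = []
    for path in files:
        # redact_one is a stand-in for the real per-file redaction call.
        boxes = redact_one(path)
        if boxes:
            redacted.append(path)
    return redacted
```

With this shape, the quality of the returned list depends entirely on the OCR threshold mentioned above, since any false positive produces a box and therefore an entry in the report.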