presidio
Inpaint instead of using solid color for bounding boxes
Is your feature request related to a problem? Please describe. Currently, every redaction box is drawn as a solid color. The color is set to either contrast with or blend in with the most prominent pixel value (i.e., color) in the image.
When text sits over a fairly empty background, the "background" solid fill blends in naturally and does not drastically alter the pixel value distribution of the image. However, if the text sits over an area of interest, replacing all the pixels there with a single value does change the pixel value distribution of the image and may not be ideal for downstream image processing / ML.
Describe the solution you'd like Instead of replacing all pixels within a bounding box with a solid color, fill the box with an approximation of how the region would look without text.
Describe alternatives you've considered n/a
Additional context This may be a little trickier than just replacing pixels in an image because we will need to do this at the DICOM pixel array level, where values do not always follow the value range of a standard image. If we use an approach that requires pixel values from a standard image representation (e.g., png), then we may need a helper function to translate those pixel values back to the expected scale in the DICOM pixel array.
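A minimal sketch of what such a helper could look like, assuming a 2D grayscale array such as the one pydicom exposes as `pixel_array` and a plain linear min/max rescale to 8-bit. The function names `to_uint8`, `from_uint8`, and `merge_filled_region` are hypothetical and not part of Presidio. Because the 8-bit round trip is lossy, the sketch only writes back the pixels inside the redaction mask, so every other pixel keeps its exact original value.

```python
import numpy as np


def to_uint8(pixel_array):
    """Linearly rescale an array to 0..255 and return the bounds used."""
    arr = pixel_array.astype(np.float64)
    lo, hi = float(arr.min()), float(arr.max())
    span = hi - lo if hi > lo else 1.0  # avoid dividing by zero on flat images
    img8 = np.round((arr - lo) / span * 255.0).astype(np.uint8)
    return img8, lo, hi


def from_uint8(img8, lo, hi, dtype):
    """Map 8-bit values back to the original value range and dtype."""
    span = hi - lo if hi > lo else 1.0
    restored = img8.astype(np.float64) / 255.0 * span + lo
    if np.issubdtype(dtype, np.integer):
        info = np.iinfo(dtype)
        restored = np.clip(np.round(restored), info.min, info.max)
    return restored.astype(dtype)


def merge_filled_region(pixel_array, filled8, mask, lo, hi):
    """Copy only the masked pixels back into the original-scale array."""
    out = pixel_array.copy()
    out[mask] = from_uint8(filled8, lo, hi, pixel_array.dtype)[mask]
    return out
```

Here `filled8` stands for the 8-bit image after some inpainting step has run on it, and `mask` is a boolean array that is `True` inside the redaction boxes. Anything beyond a linear rescale (for example, applying rescale slope/intercept or windowing) is out of scope for this sketch.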
Potential solutions to look into:
- Inpainting for medical images: https://deepai.org/publication/shape-aware-masking-for-inpainting-in-medical-imaging
- OpenCV inpaint: https://datascience.stackexchange.com/questions/51375/how-to-replace-nan-values-for-image-data
- OpenCV inpainting doc: https://docs.opencv.org/4.x/df/d3d/tutorial_py_inpainting.html
- scikit-image: https://scikit-image.org/docs/stable/auto_examples/filters/plot_inpaint.html
- Python library: https://github.com/aGIToz/PyInpaint
- Python library: https://github.com/spaceml-org/Missing-Pixel-Filler
- Simple interpolation: https://stackoverflow.com/questions/37662180/interpolate-missing-values-2d-python
There is also the possibility of training our own custom model (e.g., by creating a dataset of DICOM images with and without text layered on).
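To make the idea concrete, here is a rough sketch in the spirit of the "simple interpolation" option. It does not use any of the libraries listed above. Instead it uses a plain NumPy diffusion-style fill: masked pixels are repeatedly replaced by the average of their four neighbours until the surrounding values have spread inward. It works directly on the array's native dtype, so no translation to an 8-bit image is needed. The names `boxes_to_mask` and `diffusion_fill`, the `(left, top, width, height)` box layout, and the iteration count are assumptions for illustration, and the sketch only handles 2D grayscale arrays.

```python
import numpy as np


def boxes_to_mask(shape, boxes):
    """Build a boolean mask from (left, top, width, height) boxes (assumed layout)."""
    mask = np.zeros(shape, dtype=bool)
    for left, top, width, height in boxes:
        mask[top:top + height, left:left + width] = True
    return mask


def diffusion_fill(pixel_array, mask, iterations=500):
    """Fill masked pixels by repeatedly averaging their four neighbours."""
    if not mask.any() or mask.all():
        return pixel_array.copy()  # nothing to fill, or nothing to fill from
    work = pixel_array.astype(np.float64)
    work[mask] = work[~mask].mean()  # neutral starting guess
    for _ in range(iterations):
        padded = np.pad(work, 1, mode="edge")
        neighbour_mean = (padded[:-2, 1:-1] + padded[2:, 1:-1]
                          + padded[1:-1, :-2] + padded[1:-1, 2:]) / 4.0
        work[mask] = neighbour_mean[mask]  # known pixels are never modified
    out = pixel_array.copy()
    filled = work[mask]
    if np.issubdtype(pixel_array.dtype, np.integer):
        filled = np.round(filled)
    out[mask] = filled.astype(pixel_array.dtype)
    return out
```

A fill like this reproduces smooth gradients well but cannot recreate texture or edges inside a box, which is where the dedicated inpainting methods or a trained model would be expected to do better.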