presidio
DICOM verify engine: remove duplicates by score, all PHIs are PERSONs
A few bugs in the DICOM verify engine are now causing the test_dicom_image_pii_verify_engine_integration.py
tests to fail:
- When we remove duplicates, we take the first element regardless of its score (code pointer). After fixing it to take the higher score, it picked a PERSON entity with value '16' and score 1.0 over a real PERSON entity from spaCy with score 0.85.
- How was '16' identified as PERSON? This is another bug: we treat the DICOM metadata as PHI and add each element to a deny list with PERSON as the entity.
But why is it failing now? Probably the latest version of spaCy started finding more PERSON entities, which are sometimes overridden and sometimes not when duplicates are removed.
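To make the interaction between the two bugs concrete, here is a minimal standalone sketch. It is not the engine's actual code: the `Detection` class, the `key` field (standing in for whatever the engine uses to decide that two detections are duplicates), the `source` field, and the sample name are all hypothetical. It only illustrates why switching from "keep first" to "keep highest score" lets a deny-list match with score 1.0 win over a spaCy match with score 0.85.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """Hypothetical stand-in for a single detected entity."""
    entity_type: str
    text: str
    score: float
    key: str      # assumption: whatever marks two detections as duplicates
    source: str   # assumption: which recognizer produced the detection


def dedupe_keep_first(detections):
    """Current behavior as described: the first duplicate wins, score is ignored."""
    kept = {}
    for det in detections:
        kept.setdefault(det.key, det)
    return list(kept.values())


def dedupe_keep_highest(detections):
    """Score-aware variant: the highest score wins, ties keep the earlier one."""
    kept = {}
    for det in detections:
        current = kept.get(det.key)
        if current is None or det.score > current.score:
            kept[det.key] = det
    return list(kept.values())


# Hypothetical sample: a real name found by spaCy and a metadata value
# ('16') that was added to the deny list and tagged as PERSON.
detections = [
    Detection("PERSON", "Jane Doe", 0.85, key="dup-1", source="spacy"),
    Detection("PERSON", "16", 1.0, key="dup-1", source="deny_list"),
]

first = dedupe_keep_first(detections)
highest = dedupe_keep_highest(detections)
print(first[0].text, first[0].score)
print(highest[0].text, highest[0].score)
```

With "keep first", the outcome depends on input order, which would match the "sometimes overridden and sometimes not" behavior. With "keep highest", the deny-list entry always wins, so the score fix alone is not enough while every metadata element is still tagged as PERSON with score 1.0.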
@omri374 @niwilso
Tests were skipped in https://github.com/microsoft/presidio/pull/1032
@SharonHart can this be closed or not yet?
We are still tagging DICOM metadata as PERSON.