
DICOM verify engine: remove duplicates by score, all PHIs are PERSONs

Open SharonHart opened this issue 2 years ago • 2 comments

A few bugs in the DICOM verify engine are now causing the tests in test_dicom_image_pii_verify_engine_integration.py to fail:

  1. When we remove duplicates, we take the first element regardless of its score (code pointer). After fixing it to take the higher score, it picked a PERSON entity with value '16' and score 1.0 over a real PERSON entity from spaCy with score 0.85.
  2. How was '16' identified as a PERSON? This is another bug: we treat the DICOM metadata as PHI and add each element to a deny list with PERSON as the entity.
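To illustrate how the two bugs interact, here is a minimal sketch, not the actual Presidio code. The `Detection` class, the `location` grouping key, both function names, and the sample values are hypothetical; the assumption is that two detections count as duplicates when they share the same location.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """Hypothetical stand-in for one detected entity."""
    entity_type: str
    text: str
    location: tuple  # assumed duplicate key, e.g. a bounding box
    score: float


def dedupe_keep_first(detections):
    """Bug 1: keep whichever duplicate appears first, ignoring the score."""
    seen = {}
    for d in detections:
        seen.setdefault(d.location, d)
    return list(seen.values())


def dedupe_keep_highest(detections):
    """Fix for bug 1: keep the duplicate with the highest score."""
    best = {}
    for d in detections:
        current = best.get(d.location)
        if current is None or d.score > current.score:
            best[d.location] = d
    return list(best.values())


# Illustrative data only: an NER hit and a deny-list hit on the same location.
detections = [
    Detection("PERSON", "John Doe", (10, 20, 100, 30), 0.85),  # NER result
    Detection("PERSON", "16", (10, 20, 100, 30), 1.0),  # metadata deny-list match
]

first = dedupe_keep_first(detections)
highest = dedupe_keep_highest(detections)
```

With this data, `dedupe_keep_first` happens to keep the NER result only because of ordering, while `dedupe_keep_highest` keeps '16', since a deny-list match scored 1.0 outranks the 0.85 NER result. Fixing the score comparison alone therefore exposes the second bug rather than resolving the test failures.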

But why is it failing now? Probably the latest version of spaCy started finding more PERSON entities, which are sometimes overridden and sometimes not when removing duplicates.

@omri374 @niwilso

Tests were skipped in https://github.com/microsoft/presidio/pull/1032

SharonHart, Feb 21 '23 20:02

@SharonHart can this be closed or not yet?

omri374, Mar 08 '23 07:03

> @SharonHart can this be closed or not yet?

We are still tagging DICOM metadata as PERSON.

SharonHart, Mar 08 '23 07:03