Detecting :: as IPv6 Address

Open troy256 opened this issue 1 year ago • 5 comments

Describe the bug: We are indirectly using this library as part of PII detection for text coming from a GenAI-based coding assistant. However, it flags every instance of "::" as PII, because IPv6 addresses can contain it. This token is used regularly in Perl, as well as in C++ and PHP. For example:

use strict;
use warnings;
use LWP::UserAgent;  # <-- detected as PII

Expected behavior: Ideally, the IPv6 detection would be smart enough to tell programming-language syntax apart from an actual IPv6 address.

Additional context: Very similar to Issue #907.

troy256 • Jun 04 '24 12:06

:: is a valid IPv6 address. The solution might be to split the regex into two and lower the score for the bare :: case. Anyone up for fixing it?
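
A rough, standard-library-only sketch of that idea (this is not Presidio's recognizer code; the function name, the candidate regex, and both score values are hypothetical and chosen only for illustration): validate each candidate with Python's `ipaddress` module, then give the bare :: a much lower score than any other valid address.

```python
import ipaddress
import re

# Hypothetical scores for illustration; not Presidio's actual values.
FULL_SCORE = 0.6
BARE_DOUBLE_COLON_SCORE = 0.1

# Candidate tokens: runs of hex digits and colons containing at least one colon.
# IPv4-mapped forms such as ::ffff:1.2.3.4 are deliberately not handled here.
CANDIDATE = re.compile(r"[0-9A-Fa-f]*:[0-9A-Fa-f:]*")


def score_ipv6_candidates(text):
    """Return (token, start_offset, score) for each valid IPv6 token in text."""
    results = []
    for match in CANDIDATE.finditer(text):
        token = match.group()
        try:
            ipaddress.IPv6Address(token)  # raises ValueError if not valid IPv6
        except ValueError:
            continue
        # "::" is valid IPv6 but also common in Perl, C++ and PHP code,
        # so it gets a low score instead of being dropped entirely.
        score = BARE_DOUBLE_COLON_SCORE if token == "::" else FULL_SCORE
        results.append((token, match.start(), score))
    return results
```

With a score threshold between the two values, `use LWP::UserAgent;` would no longer be reported, while an address such as `2001:db8::1` still would.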

SharonHart • Jun 20 '24 07:06

Even though :: is a valid IPv6 address, it's not personally identifiable and is effectively anonymous. So maybe skip over it?

troy256 • Jun 20 '24 12:06

A simple workaround would be to add :: as an allow_list term.

omri374 • Jun 20 '24 12:06

Can that be done with configuration or is that a code change?

troy256 • Jun 26 '24 20:06

Configuration: https://microsoft.github.io/presidio/tutorial/13_allow_list/
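
As the linked tutorial describes, the allow list is passed to the analyzer call through the `allow_list` argument of `AnalyzerEngine.analyze`. A minimal sketch, assuming the flagged span is exactly :: (the wrapper name `analyze_ignoring_double_colon` is hypothetical, not part of Presidio):

```python
def analyze_ignoring_double_colon(analyzer, text, language="en"):
    """Run a Presidio analyzer while ignoring results whose text is "::".

    `analyzer` is expected to be a presidio_analyzer.AnalyzerEngine instance.
    """
    # allow_list holds strings that should not be reported as PII.
    return analyzer.analyze(text=text, language=language, allow_list=["::"])


# Usage (requires presidio-analyzer and a spaCy model to be installed):
#   from presidio_analyzer import AnalyzerEngine
#   results = analyze_ignoring_double_colon(AnalyzerEngine(), "use LWP::UserAgent;")
```

Note that this suppresses every :: result, including the rare case where the text really does mean the unspecified IPv6 address.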

omri374 • Jun 27 '24 03:06