piiregex
Search for PII in Python
PiiRegex 
This wouldn't have been possible without CommonRegex. Thanks!
Attempts to find PII in text using regular expressions, either by searching for a specific PII type or by searching through every available pattern.
Pull requests welcome!
Install via pip.
pip install piiregex
Tests are available through pytest.
pip install -r dev_requirements.txt
pytest -vv
Usage
>>> from piiregex import PiiRegex
>>> parsed_text = PiiRegex("""John, please get that article on www.linkedin.com to me by 5:00PM
on Jan 9th 2012. 4:00 would be ideal, actually. If you have any
questions, You can reach me at (519)-236-2723x341 or get in touch with
my associate at [email protected]""")
>>> parsed_text.times
['5:00PM', '4:00']
>>> parsed_text.dates
['Jan 9th 2012']
>>> parsed_text.phones
['(519)-236-2723']
>>> parsed_text.phones_with_exts
['(519)-236-2723x341']
>>> parsed_text.emails
['[email protected]']
Alternatively, you can generate a single PiiRegex instance and use it to parse multiple segments of text.
>>> parser = PiiRegex()
>>> parser.times("When are you free? Do you want to meet up for coffee at 4:00?")
['4:00']
Finally, all regular expressions used are publicly exposed.
>>> from piiregex import email
>>> import re
>>> text = "...get in touch with my associate at [email protected]"
>>> re.sub(email, "[email protected]", text)
'...get in touch with my associate at [email protected]'
>>> from piiregex import time
>>> for m in time.finditer("Does 6:00 or 7:00 work better?"):
...     print(m.start(), m.group())
5 6:00
13 7:00
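Because the patterns are ordinary compiled regexes, they can also drive a small redaction helper. The sketch below is only an illustration: to stay runnable with the standard library alone it uses simplified stand-in patterns (`SIMPLE_EMAIL`, `SIMPLE_TIME`) and a hypothetical `redact` function, none of which are part of piiregex. In real use you would pass the patterns imported from piiregex instead.

```python
import re

# Simplified stand-in patterns (assumptions, NOT the ones shipped with piiregex).
SIMPLE_EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SIMPLE_TIME = re.compile(r"\b\d{1,2}:\d{2}(?:\s?(?:AM|PM))?", re.IGNORECASE)


def redact(text, patterns):
    """Replace every match of each (pattern, placeholder) pair in text."""
    for pattern, placeholder in patterns:
        text = pattern.sub(placeholder, text)
    return text


cleaned = redact(
    "Mail jane.doe@example.com before 5:00PM.",
    [(SIMPLE_EMAIL, "<EMAIL>"), (SIMPLE_TIME, "<TIME>")],
)
print(cleaned)  # Mail <EMAIL> before <TIME>.
```

Applying the patterns one after another keeps the helper simple, at the cost of scanning the text once per pattern.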
Most importantly (for our use case), any_match checks the text against every available regex and returns True if any of them matches.
>>> from piiregex import PiiRegex
>>> parsed_text = PiiRegex("07123 123123") # should match a UK phone number.
>>> parsed_text.any_match()
True
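To make the idea concrete, here is a rough sketch of an any_match-style check built on the standard library. It is not the library's implementation: the `PATTERNS` table, its three simplified regexes, and the helpers `any_match_sketch` and `matching_types` are all hypothetical and exist only for this example.

```python
import re

# Illustrative patterns only (assumptions); the real piiregex patterns differ.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "uk_mobile": re.compile(r"\b07\d{3}\s?\d{6}\b"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def any_match_sketch(text):
    """Return True as soon as any pattern finds a match in text."""
    return any(p.search(text) for p in PATTERNS.values())


def matching_types(text):
    """Return the sorted names of every pattern that matches text."""
    return sorted(name for name, p in PATTERNS.items() if p.search(text))


print(any_match_sketch("07123 123123"))            # True
print(matching_types("07123 123123"))              # ['uk_mobile']
print(any_match_sketch("nothing sensitive here"))  # False
```

A boolean check like this is handy as a gate (for example, refusing to log a string that contains any PII), while a helper such as `matching_types` is more useful when you need to report what was found.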
Please note that this module is currently specific to English-language US and UK formats. Given the European scope of GDPR, support for other locales is being expanded. PRs are welcome.
Supported Methods/Attributes
obj.dates, obj.dates()
obj.times, obj.times()
obj.phones, obj.phones()
obj.phones_with_exts, obj.phones_with_exts()
obj.emails, obj.emails()
obj.ips, obj.ips()
obj.ipv6s, obj.ipv6s()
obj.credit_cards, obj.credit_cards()
obj.btc_addresses, obj.btc_addresses()
obj.street_addresses, obj.street_addresses()
obj.postcodes, obj.postcodes()
obj.ukphones, obj.ukphones()
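Each name above appears in both attribute and method form: the attribute holds results when the object was created with text, and the method takes text when it was created without any. One way such a dual interface could be built is sketched below. This is an assumption for illustration and may not reflect how piiregex does it internally; `DualParser` and `_PATTERNS` are hypothetical names, and the two regexes are simplified.

```python
import re

# Illustrative patterns; names and regexes here are assumptions.
_PATTERNS = {
    "times": re.compile(r"\b\d{1,2}:\d{2}(?:\s?(?:AM|PM))?", re.IGNORECASE),
    "ips": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


class DualParser:
    """Hypothetical parser: results as attributes if text is given, else callables."""

    def __init__(self, text=None):
        for name, pattern in _PATTERNS.items():
            if text is None:
                # No text yet: expose the pattern's findall so parser.times("...") works.
                setattr(self, name, pattern.findall)
            else:
                # Text supplied: precompute results so parsed.times is a plain list.
                setattr(self, name, pattern.findall(text))


parsed = DualParser("Ping 10.0.0.1 at 4:00")
print(parsed.times, parsed.ips)  # ['4:00'] ['10.0.0.1']

parser = DualParser()
print(parser.times("Does 6:00 or 7:00 work better?"))  # ['6:00', '7:00']
```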