anonymizer
Library for identification, anonymization and de-anonymization of PII data
As part of this story, we will build an anonymizer action that drops PII elements from column values.
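
A minimal sketch of what such a drop action could look like, assuming PII is located with regular expressions. The names `drop_pii` and `anonymize_column` and the email pattern are hypothetical illustrations, not the library's actual API.

```python
import re

# Hypothetical, simplified email pattern used only for illustration.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def drop_pii(value, patterns=(EMAIL_RE,)):
    """Remove every substring of `value` matched by any of the patterns."""
    for pattern in patterns:
        value = pattern.sub("", value)
    return value


def anonymize_column(values, patterns=(EMAIL_RE,)):
    """Apply the drop action to each value of one column."""
    return [drop_pii(v, patterns) for v in values]
```

For example, `anonymize_column(["contact: a@b.com", "none"])` leaves the non-PII text in place and removes only the matched address.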
Add a section to the schema that records whether a given column should be considered for PII detection.
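
One possible shape for that schema section, sketched as a Python dictionary. The key names (`columns`, `name`, `pii_check`) and the helper `columns_to_scan` are assumptions made for illustration; the project's real schema format is not shown here.

```python
# Hypothetical schema: each column carries a flag saying whether
# it should be passed to PII detection.
SCHEMA = {
    "columns": [
        {"name": "customer_id", "pii_check": False},
        {"name": "email", "pii_check": True},
        {"name": "notes", "pii_check": True},
    ]
}


def columns_to_scan(schema):
    """Return the names of the columns flagged for PII detection."""
    return [
        col["name"]
        for col in schema["columns"]
        if col.get("pii_check", False)  # columns without the flag are skipped
    ]
```

Defaulting a missing flag to `False` is a design choice in this sketch; defaulting to `True` would be the more conservative option if unflagged columns should still be scanned.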
Build a pool for identifying personal contact information
- [x] Phone number
- [x] Email
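
As a rough illustration of the two contact-information checks, the sketch below uses simplified patterns. Both patterns are assumptions: the phone pattern only covers 8-digit Singapore-style numbers with an optional `+65` prefix, and the email pattern is far looser than a full address grammar. The function names are hypothetical.

```python
import re

# Assumed, simplified patterns; the library's real matchers may differ.
PHONE_RE = re.compile(r"(?:\+65[\s-]?)?[689]\d{3}[\s-]?\d{4}")
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def is_phone(cell):
    """True if the whole cell looks like a phone number."""
    return PHONE_RE.fullmatch(cell.strip()) is not None


def is_email(cell):
    """True if the whole cell looks like an email address."""
    return EMAIL_RE.fullmatch(cell.strip()) is not None
```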
Capability to identify the following PII data related to Singaporeans
- [x] NRIC / FIN
- [ ] Work Permit Id
- [ ] Passport
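
A format-only sketch for the NRIC / FIN case, assuming the common layout of a prefix letter (`S`, `T`, `F` or `G`), seven digits, and a trailing letter. It does not validate the trailing check letter, so it can over-match, and `find_nric_fin` is a hypothetical name.

```python
import re

# Assumed layout: prefix letter, seven digits, one trailing letter.
# The check letter is NOT verified in this sketch.
NRIC_FIN_RE = re.compile(r"\b[STFG]\d{7}[A-Z]\b")


def find_nric_fin(text):
    """Return all NRIC/FIN-shaped tokens found in `text`."""
    return NRIC_FIN_RE.findall(text.upper())
```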
Parse a delimited file and output it as columns / a pandas DataFrame.
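
A minimal parsing sketch using only the standard library, which returns a dictionary of column name to list of cell values. The function name `parse_delimited` is hypothetical, and the first row is assumed to be a header.

```python
import csv
import io


def parse_delimited(text, delimiter=","):
    """Parse delimited text (first row = header) into {column: [values]}."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    columns = {name: [] for name in (reader.fieldnames or [])}
    for row in reader:
        for name in columns:
            columns[name].append(row[name])
    return columns
```

If pandas is available, the resulting dictionary can be passed to `pandas.DataFrame(...)`, or the file can be read directly with `pandas.read_csv(path, sep=delimiter)`.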
Design a contract for supporting different inputs and formats.
Decide on a viable output format for the next step (e.g. a pandas DataFrame).
Take the parser output and split it, if required, in an efficient way.
Run it against all the available regex matchers.
Record whether each cell is PII or not.
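
The matching step could be sketched as below, assuming the parser output is a dictionary of column name to list of string cells. The `MATCHERS` registry, its patterns, and the `scan` function are hypothetical; splitting large inputs into chunks is left out.

```python
import re

# Hypothetical matcher registry; patterns are simplified assumptions.
MATCHERS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "nric_fin": re.compile(r"\b[STFG]\d{7}[A-Z]\b"),
}


def scan(columns):
    """Run every matcher on every cell and record (column, row, matcher) hits."""
    findings = []
    for column, values in columns.items():
        for row_index, cell in enumerate(values):
            for name, pattern in MATCHERS.items():
                if pattern.search(cell):
                    findings.append((column, row_index, name))
    return findings
```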
Given the findings from the regex matchers, generate a user-friendly report that displays:
1. Columns with PII data
2. Low level granularity - showing which cells...
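
A sketch of a report structure covering the column-level and cell-level views, assuming findings arrive as `(column, row, matcher)` tuples. Both that input shape and the `build_report` name are assumptions for illustration.

```python
from collections import defaultdict


def build_report(findings):
    """Group (column, row, matcher) findings into a two-level report."""
    cells = defaultdict(list)
    for column, row, matcher in findings:
        cells[column].append({"row": row, "matcher": matcher})
    return {
        "pii_columns": sorted(cells),  # column-level summary
        "cells": dict(cells),          # per-cell detail
    }
```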
Compare support for:
1. Basic PII like name, email, phone number, NRIC, etc.
2. Custom PII identification
3. Localization (esp. Asia) - support for PDPA policies
4. Free-text PII...