
Shallow scan should recognize phone, credit card, person and location from column names

Open vrajat opened this issue 6 years ago • 1 comment

It is not surprising that deep and shallow scans show different results. Shallow scan only looks at column names. Deep scan looks at a sample of the data. I've even noticed that two different runs of deep scan show different results because the sampled rows are different. This is the challenge with not scanning all of the data. It's a trade-off between performance/cost and accuracy. There is no right answer.
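To illustrate why sampled scans can disagree from run to run, here is a minimal sketch. It is not PIICatcher's code: the `sample_has_pii` helper, the email regex, and the synthetic column are all assumptions made up for this example. It samples a mostly empty column with different seeds and counts how often the sample happens to include a PII-looking value.

```python
import random
import re

# Deliberately simple email pattern, for illustration only.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def sample_has_pii(values, sample_size, seed=None):
    """Return True if any value in a random sample looks like an email."""
    rng = random.Random(seed)
    sample = rng.sample(values, min(sample_size, len(values)))
    return any(EMAIL_RE.fullmatch(v) for v in sample)


# Hypothetical column: 995 placeholder values and 5 email addresses.
column = ["n/a"] * 995 + [f"user{i}@example.com" for i in range(5)]

# Each seed stands in for a separate deep-scan run with a 10-row sample.
hits = sum(sample_has_pii(column, 10, seed=s) for s in range(100))
print(f"{hits} of 100 sampled scans flagged the column")

# Scanning every row always finds the emails, at a higher cost.
print("full scan flagged the column:", sample_has_pii(column, len(column)))
```

With a sparse column like this one, most small samples miss the PII rows entirely, which is the performance/accuracy trade-off described above.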

With respect to the output in particular, my observations are:

  1. Shallow scan should recognize phone, credit card, person and location from column names
  2. Deep scan did not recognize PII in a few columns. I need to look at the data to figure out whether that's a bug or the columns did not have any relevant data.
  3. Deep scan should also scan column names for candidates
  4. Along with an array, PIICatcher should add confidence numbers.
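Items 1 and 4 could be sketched roughly as follows. This is only an illustration, not PIICatcher's actual API: the `COLUMN_NAME_PATTERNS` table, the `shallow_scan` function, the type labels, and the fixed confidence value are all hypothetical placeholders.

```python
import re

# Hypothetical column-name patterns; not PIICatcher's actual rules.
COLUMN_NAME_PATTERNS = {
    "PHONE": re.compile(r"phone|mobile", re.IGNORECASE),
    "CREDIT_CARD": re.compile(r"credit.?card|card.?(no|num)", re.IGNORECASE),
    "PERSON": re.compile(r"(first|last|full|sur).?name|^name$", re.IGNORECASE),
    "LOCATION": re.compile(r"address|city|state|country|zip|postal", re.IGNORECASE),
}

# Placeholder score: a name match alone is weaker evidence than matching data.
NAME_MATCH_CONFIDENCE = 0.5


def shallow_scan(column_name):
    """Return (pii_type, confidence) pairs suggested by the column name alone."""
    return sorted(
        (pii_type, NAME_MATCH_CONFIDENCE)
        for pii_type, pattern in COLUMN_NAME_PATTERNS.items()
        if pattern.search(column_name)
    )


for name in ["phone_number", "credit_card_no", "first_name", "city", "order_id"]:
    print(name, shallow_scan(name))
```

A deep scan could call the same function to get name-based candidates (item 3) and then raise or lower the confidence depending on what the sampled data shows.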

Originally posted by @vrajat in https://github.com/tokern/piicatcher/issues/67#issuecomment-586078802

vrajat · Feb 14 '20 03:02

Add birthdate.

vrajat · Jul 20 '20 10:07