
Common: support "pii" annotations in schemas for PII Enrichment

Status: Open · chuwy opened this issue 5 years ago · 3 comments

PII = Personally Identifiable Information

The basic idea:

  • Any JSON Schema (unstructured event or context) can be annotated with "pii": true on a per-property basis
  • If this PII Scrubber is turned on, we encrypt every PII property in any JSON using AES, so each value becomes a unique but non-PII token that is stable across events, e.g. "Fred Blundun" always -> "1de6e53cb23"
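As an illustration of the idea above, here is a minimal sketch of a scrubber that walks a schema, finds properties annotated "pii": true (including inside nested objects), and replaces the matching values in an instance with a stable token. It is not Snowplow's implementation. In place of the AES encryption the issue proposes, it uses a keyed hash (HMAC-SHA256 from the Python standard library), which gives the same "same input, same token" behaviour but, unlike encryption, cannot be reversed even with the key. The names `SECRET_KEY`, `pseudonymize` and `scrub`, and the 11-character token length, are hypothetical choices for this example.

```python
import hashlib
import hmac
from typing import Any

# Hypothetical secret; a real enrichment would load this from its configuration.
SECRET_KEY = b"replace-with-a-real-secret"


def pseudonymize(value: Any, key: bytes = SECRET_KEY) -> str:
    """Map a value to a stable, non-reversible token (truncated HMAC-SHA256 hex)."""
    digest = hmac.new(key, str(value).encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:11]


def scrub(instance: Any, schema: dict[str, Any], key: bytes = SECRET_KEY) -> Any:
    """Return a copy of `instance` with every property marked "pii": true replaced.

    Non-object values are returned unchanged; nested objects are scrubbed
    recursively using the matching sub-schema.
    """
    if not isinstance(instance, dict):
        return instance
    properties = schema.get("properties", {})
    result = {}
    for name, value in instance.items():
        prop_schema = properties.get(name, {})
        if prop_schema.get("pii") is True and value is not None:
            result[name] = pseudonymize(value, key)
        else:
            # Recurse: scrub() returns non-dict values untouched.
            result[name] = scrub(value, prop_schema, key)
    return result


# Usage with a hypothetical annotated schema and event.
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "pii": True},
        "plan": {"type": "string"},
        "address": {
            "type": "object",
            "properties": {"street": {"type": "string", "pii": True}},
        },
    },
}
event = {"name": "Fred Blundun", "plan": "premium", "address": {"street": "1 Main St"}}
scrubbed = scrub(event, schema)
print(scrubbed)
```

Because the token depends only on the value and the key, analysts can still count or join on "name" without seeing the original; rotating the key changes every token.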

This would be of potential interest to users in healthcare or finance, where letting analysts drill down to individual users could be a privacy concern.

/cc @yalisassoon @fblundun

chuwy, Jun 19 '20 17:06

Migrated from https://github.com/snowplow/snowplow/issues/860 (comments are auto-generated)

chuwy, Jun 19 '20 17:06

I am still a big fan. The only issue I see is that some of the standard schemas in Iglu Central would then need to be adapted on a per-installation basis and self-hosted by the operator.

But having it as an optional addition for our own schemas would be helpful.

julianbei, Jul 29 '20 19:07

Hi, @julianbei and @chuwy. I happen to also be a Snowplow user (🙌), but I came across this thread from a Google search for a separate project, basically "how to properly flag PII in JSON Schema?". I haven't found any existing guidance, but wanted to ping here in case any progress has been made or other prior art is known.

I do like "pii": true, but another approach I'm considering (again, for a different project) would be something like "classifications": ["pii"], which could in theory also accommodate other proprietary or industry-wide labels as classifiers.
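To show that the two annotation styles can coexist, here is a minimal sketch of a helper that treats a property as PII if it carries either "pii": true or "pii" inside a "classifications" list, and lists the dotted paths of all such properties in a schema. The function names `is_pii` and `pii_paths`, and the extra labels "contact", "geo" and "pci" in the example schema, are hypothetical; neither convention is a standard JSON Schema keyword, so validators would simply ignore them.

```python
from typing import Any


def is_pii(prop_schema: dict[str, Any]) -> bool:
    """True if a property schema is flagged as PII under either proposed convention."""
    if prop_schema.get("pii") is True:
        return True
    classifications = prop_schema.get("classifications", [])
    return isinstance(classifications, list) and "pii" in classifications


def pii_paths(schema: dict[str, Any], prefix: str = "") -> list[str]:
    """Return dotted paths of all PII-flagged properties, recursing into nested objects."""
    paths = []
    for name, prop in schema.get("properties", {}).items():
        path = f"{prefix}{name}"
        if is_pii(prop):
            paths.append(path)
        paths.extend(pii_paths(prop, prefix=path + "."))
    return paths


# Usage with a hypothetical schema mixing both styles.
schema = {
    "type": "object",
    "properties": {
        "email": {"type": "string", "classifications": ["pii", "contact"]},
        "name": {"type": "string", "pii": True},
        "country": {"type": "string", "classifications": ["geo"]},
        "billing": {
            "type": "object",
            "properties": {
                "card_last4": {"type": "string", "classifications": ["pii", "pci"]},
            },
        },
    },
}
print(pii_paths(schema))
```

The list form costs little extra to check and lets one property carry several labels at once, which is the flexibility the comment above is after.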

aaronsteers, Oct 21 '22 16:10