Improved PII Scrubbing (for Attributes/span.data) for fields like LLM requests/responses
Currently, PII scrubbing is configured per field with one of three modes: `true`, `maybe`, and `false`. A `false` field is never scrubbed, a `maybe` field is scrubbed only if the user has specifically opted in for that field, and a `true` field is scrubbed whenever PII scrubbing is enabled.
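As a rough illustration of the current model, the three modes can be sketched in Rust as follows. The names `PiiMode` and `is_scrubbed`, and the reduction of "opted in" to a single boolean, are assumptions made for this sketch and are not taken from our code.

```rust
/// Hypothetical model of the three per-field PII modes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PiiMode {
    /// Never scrubbed.
    False,
    /// Scrubbed only if the user explicitly opted in for this field.
    Maybe,
    /// Scrubbed whenever PII scrubbing is enabled.
    True,
}

/// Decides whether a field is scrubbed at all.
/// `scrubbing_enabled`: PII scrubbing is turned on.
/// `field_opted_in`: the user has explicitly targeted this field (assumed flag).
fn is_scrubbed(mode: PiiMode, scrubbing_enabled: bool, field_opted_in: bool) -> bool {
    match mode {
        PiiMode::False => false,
        PiiMode::Maybe => field_opted_in,
        PiiMode::True => scrubbing_enabled,
    }
}
```

The result is a single yes/no per field: once a field is scrubbed, every rule runs on it, which is why fields that need their content end up as `maybe`.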
When PII scrubbing is enabled, it runs all of our rules, and some of these rules are very destructive: they replace the entire contents of a value. For example, if we detect the string "password" in a text, we remove the entire text.
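The difference between a destructive rule and one that only replaces matched values can be sketched as two functions. The `[Filtered]` placeholder and both matching heuristics (a case-insensitive "password" check, and a crude '@' test standing in for an email pattern) are hypothetical simplifications, not our actual rules.

```rust
/// Placeholder written in place of scrubbed content (assumed for this sketch).
const FILTERED: &str = "[Filtered]";

/// Destructive rule: if "password" appears anywhere (case-insensitively),
/// the entire value is replaced.
fn password_rule(value: &str) -> Option<String> {
    if value.to_lowercase().contains("password") {
        Some(FILTERED.to_string())
    } else {
        None
    }
}

/// Value-replacing rule: only space-separated tokens containing '@'
/// are replaced; the rest of the value is kept as-is.
fn email_rule(value: &str) -> Option<String> {
    let mut matched = false;
    let scrubbed: Vec<&str> = value
        .split(' ')
        .map(|token| {
            if token.contains('@') {
                matched = true;
                FILTERED
            } else {
                token
            }
        })
        .collect();
    if matched {
        Some(scrubbed.join(" "))
    } else {
        None
    }
}
```

On a prompt like `Reset the password for a@b.com`, the first rule leaves only `[Filtered]`, while the second keeps `Reset the password for [Filtered]`.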
Some of our products, such as AI monitoring, depend on the content of certain fields to be useful at all. So far, the only option has been to mark these fields as `maybe`, so that they are not scrubbed by default. This is explicitly documented in our product documentation, but it is still a potential foot-gun and a bad user experience.
To improve this, we want to be able to specify that only certain groups of rules apply to certain fields. For example, on LLM requests/responses we only want to apply rules that replace the matched values, not rules that replace the entire content.
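One way to express rule groups per field is to tag each rule with a group and pass in the set of groups allowed for the field being scrubbed. `RuleGroup`, `Rule`, `scrub_field` and the two example rules are hypothetical; the actual grouping of rules is what this issue is meant to work out.

```rust
/// Hypothetical rule groups, split by how destructive a rule is.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RuleGroup {
    /// Rules that replace the entire value when they match.
    ReplaceEntireValue,
    /// Rules that replace only the matched part of the value.
    ReplaceMatches,
}

/// A scrubbing rule tagged with its group; `apply` returns `None` on no match.
struct Rule {
    group: RuleGroup,
    apply: fn(&str) -> Option<String>,
}

/// Example destructive rule: wipes the value if it mentions "password".
fn wipe_on_password(value: &str) -> Option<String> {
    value.contains("password").then(|| "[Filtered]".to_string())
}

/// Example value-replacing rule: masks each ASCII digit with '*'.
fn mask_digits(value: &str) -> Option<String> {
    if value.chars().any(|c| c.is_ascii_digit()) {
        Some(value.chars().map(|c| if c.is_ascii_digit() { '*' } else { c }).collect())
    } else {
        None
    }
}

/// Runs, in order, only the rules whose group is allowed for this field.
fn scrub_field(value: &str, rules: &[Rule], allowed: &[RuleGroup]) -> String {
    let mut current = value.to_string();
    for rule in rules.iter().filter(|r| allowed.contains(&r.group)) {
        if let Some(scrubbed) = (rule.apply)(&current) {
            current = scrubbed;
        }
    }
    current
}
```

With both groups allowed (today's behavior), `my password is 1234` becomes `[Filtered]`; with only `ReplaceMatches` allowed, as proposed for LLM requests/responses, it becomes `my password is ****`.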
- Investigate the implementation needed to select or ignore rules on a field-by-field basis during PII scrubbing.
- Ideally, for spans/logs, source these rules from (or make them configurable via) sentry-conventions.
- Gauge how much effort it takes to implement this for attribute-based schemas (Span V2, Logs) and for span data. Depending on the effort, we may only implement it for attribute-based types.
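For attribute-based types, the attribute key is a natural place to hang the rule selection. The sketch below resolves the allowed rule groups per attribute key and falls back to all groups for keys without an entry, so unlisted attributes keep the current behavior. The type names and the `gen_ai.request.messages` key are illustrative only; in practice the entries might be generated from sentry-conventions rather than hard-coded.

```rust
use std::collections::HashMap;

/// Hypothetical rule groups (same idea as a per-field group selection).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RuleGroup {
    ReplaceEntireValue,
    ReplaceMatches,
}

/// Used for attributes without an explicit entry: run every group, as today.
const ALL_GROUPS: &[RuleGroup] = &[RuleGroup::ReplaceEntireValue, RuleGroup::ReplaceMatches];

/// Per-attribute rule selection, keyed by attribute name (hypothetical type).
struct AttributeScrubConfig {
    overrides: HashMap<String, Vec<RuleGroup>>,
}

impl AttributeScrubConfig {
    fn new() -> Self {
        Self { overrides: HashMap::new() }
    }

    /// Restricts the given attribute key to the listed rule groups.
    fn with_override(mut self, key: &str, groups: &[RuleGroup]) -> Self {
        self.overrides.insert(key.to_string(), groups.to_vec());
        self
    }

    /// Returns the rule groups allowed for an attribute key.
    fn allowed_groups(&self, key: &str) -> &[RuleGroup] {
        self.overrides
            .get(key)
            .map(Vec::as_slice)
            .unwrap_or(ALL_GROUPS)
    }
}
```

Keeping the fallback at "all groups" means a missing or outdated convention entry errs on the side of scrubbing more, not less.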