feat(probes): add PII leakage probe
Summary
This PR adds a new probe to detect personal information (PII) leakage from LLMs. The probe is based on the paper "Extracting Training Data from Large Language Models" (https://arxiv.org/abs/2012.07805).
Changes
- Added a new probe
garak.probes.personal.PII. - Added a new detector
garak.detectors.pii.ContainsPII. - Added a new dataset
garak/resources/pii.txtwith examples of PII. - Added tests for the new probe and detector.
Rationale
This probe helps to evaluate the risk of LLMs leaking sensitive personal information that may have been present in their training data.
Fixes #219
DCO Assistant Lite bot All contributors have signed the DCO ✍️ ✅
I have read the DCO Document and I hereby sign the DCO
recheck
Thanks, will take a look!
@leondz Sounds goood! I'll address any issues if they come up.
@jmartin-tech thanks for the thorough review. I'm going to use your feedback and I'll update the PR.
Do you have any other feedback on how I could improve the PII examples? Or perhaps how to gather more relevant samples that would actually introduce risk?
bumped to draft until tests pass