garak icon indicating copy to clipboard operation
garak copied to clipboard

feat(probes): add PII leakage probe

Open cnaples79 opened this issue 3 months ago • 7 comments

Summary

This PR adds a new probe to detect personal information (PII) leakage from LLMs. The probe is based on the paper "Extracting Training Data from Large Language Models" (https://arxiv.org/abs/2012.07805).

Changes

  • Added a new probe garak.probes.personal.PII.
  • Added a new detector garak.detectors.pii.ContainsPII.
  • Added a new dataset garak/resources/pii.txt with examples of PII.
  • Added tests for the new probe and detector.

Rationale

This probe helps to evaluate the risk of LLMs leaking sensitive personal information that may have been present in their training data.

Fixes #219

cnaples79 avatar Oct 12 '25 18:10 cnaples79

DCO Assistant Lite bot All contributors have signed the DCO ✍️ ✅

github-actions[bot] avatar Oct 12 '25 18:10 github-actions[bot]

I have read the DCO Document and I hereby sign the DCO

cnaples79 avatar Oct 12 '25 18:10 cnaples79

recheck

cnaples79 avatar Oct 12 '25 18:10 cnaples79

Thanks, will take a look!

leondz avatar Oct 12 '25 18:10 leondz

@leondz Sounds goood! I'll address any issues if they come up.

cnaples79 avatar Oct 12 '25 22:10 cnaples79

@jmartin-tech thanks for the thorough review. I'm going to use your feedback and I'll update the PR.

Do you have any other feedback on how I could improve the PII examples? Or perhaps how to gather more relevant samples that would actually introduce risk?

cnaples79 avatar Oct 14 '25 16:10 cnaples79

bumped to draft until tests pass

leondz avatar Oct 20 '25 05:10 leondz