garak

script: qualitative review output

Open · leondz opened this issue 8 months ago · 6 comments

Standalone script that takes a report file as a CLI parameter, performs a standard analysis of failing probe/detector scores (taking tier 1 and tier 2 policies into account), and dumps a sample of failing and passing inputs and outputs.
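A minimal sketch of the analysis step, assuming a garak-style JSONL report in which `eval` entries carry `probe`, `detector`, `passed` and `total`, and `attempt` entries carry `probe_classname`, `prompt`, `outputs` and `detector_results`. These field names, the helper names, and the convention that a detector score at or above 0.5 marks a failing output are all assumptions to check against the real report format; tier handling is left out.

```python
import json

# ASSUMED schema, to be checked against the real report format:
#   eval:    {"entry_type": "eval", "probe": ..., "detector": ..., "passed": int, "total": int}
#   attempt: {"entry_type": "attempt", "probe_classname": ..., "prompt": ...,
#             "outputs": [...], "detector_results": {detector: [score, ...]}}
# This sketch also assumes probe and detector names match across both entry types.

def load_entries(lines):
    """Parse JSONL lines into dicts, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]

def pass_rates(entries):
    """Map (probe, detector) -> pass rate, using eval entries with a non-zero total."""
    rates = {}
    for e in entries:
        if e.get("entry_type") == "eval" and e.get("total"):
            rates[(e["probe"], e["detector"])] = e["passed"] / e["total"]
    return rates

def failing(rates, threshold):
    """Pairs whose pass rate is below `threshold`, lowest pass rate first."""
    return sorted((k for k, r in rates.items() if r < threshold), key=rates.get)

def sample_attempts(entries, probe, detector, fail_cutoff=0.5, n=3):
    """Collect up to n failing and n passing (prompt, output, score) samples.

    ASSUMPTION: a detector score >= fail_cutoff marks the output as a failure.
    """
    fails, passes = [], []
    for e in entries:
        if e.get("entry_type") != "attempt" or e.get("probe_classname") != probe:
            continue
        scores = e.get("detector_results", {}).get(detector, [])
        for output, score in zip(e.get("outputs", []), scores):
            (fails if score >= fail_cutoff else passes).append((e.get("prompt"), output, score))
    return fails[:n], passes[:n]
```

Calling `failing(pass_rates(entries), threshold)` lists the weakest probe/detector pairs first; `sample_attempts` can then pull a few failing and passing examples for each pair.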

Verification

  • [ ] python -m garak.analyze.qual_review garak.xxx.report.jsonl > xxx.qualitative.tsv
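One way the CLI side could look, sketched with the standard library only: `argparse` takes the report path and `csv.writer` emits tab-separated rows to stdout, so shell redirection produces the `.tsv` file. The column set, the function names and the eval-entry fields (`entry_type`, `probe`, `detector`, `passed`, `total`) are assumptions, not the script's actual output format.

```python
import argparse
import csv
import json
import sys

def eval_rows(lines):
    """Yield (probe, detector, passrate) for each eval entry in a JSONL report.

    ASSUMPTION: eval entries carry "entry_type", "probe", "detector", "passed", "total".
    """
    for line in lines:
        if not line.strip():
            continue
        entry = json.loads(line)
        if entry.get("entry_type") == "eval" and entry.get("total"):
            yield entry["probe"], entry["detector"], entry["passed"] / entry["total"]

def write_tsv(rows, out):
    """Write a header plus one tab-separated row per probe/detector pair."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["probe", "detector", "passrate"])
    for probe, detector, rate in rows:
        writer.writerow([probe, detector, f"{rate:.4f}"])

def main(argv=None):
    """Parse the report path from argv and print the TSV summary to stdout."""
    parser = argparse.ArgumentParser(description="Qualitative review of a report (sketch)")
    parser.add_argument("report_path", help="path to a .report.jsonl file")
    args = parser.parse_args(argv)
    with open(args.report_path, encoding="utf-8") as f:
        write_tsv(eval_rows(f), sys.stdout)
```

To run it as a module, the file would end with the usual `if __name__ == "__main__": main()` guard, which makes the redirect in the verification command work.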

leondz, Apr 01 '25 10:04

A general thought here: would there be value in adding the tier as metadata to the probes, with a default of u in the base class? This could then be used as a filter value from the plugin cache.
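A minimal sketch of that idea, with every name hypothetical rather than garak's actual API: a `tier` class attribute on a base `Probe` defaults to `U` (taken here to be what the "u" default means), subclasses override it, and a plain dict stands in for the plugin cache so tiers can be filtered without importing each probe.

```python
from enum import IntEnum

class Tier(IntEnum):
    """Hypothetical tier labels; U stands in for the proposed "u" default."""
    TIER_1 = 1
    TIER_2 = 2
    U = 9  # unassigned; a high value sorts it after the real tiers

class Probe:
    """Hypothetical base class: every probe carries a tier, defaulting to U."""
    tier = Tier.U

class ProbeA(Probe):
    tier = Tier.TIER_1

class ProbeB(Probe):
    tier = Tier.TIER_2

class ProbeC(Probe):
    pass  # no tier declared, so it inherits Tier.U from the base class

# Stand-in for the plugin cache: plugin name -> cached metadata.
plugin_cache = {
    f"probes.{cls.__name__}": {"tier": int(cls.tier)}
    for cls in (ProbeA, ProbeB, ProbeC)
}

def probes_in_tiers(cache, tiers):
    """Return sorted plugin names whose cached tier is one of `tiers`."""
    wanted = {int(t) for t in tiers}
    return sorted(name for name, meta in cache.items() if meta["tier"] in wanted)
```

For example, `probes_in_tiers(plugin_cache, [Tier.TIER_1, Tier.TIER_2])` selects only the probes relevant to the tier 1 and tier 2 policies this script takes into account.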

jmartin-tech, Apr 02 '25 14:04

Yeah, this update has made my to-do list a few times but never been actioned. I think this script is currently the only thing that would consume it, but as reporting improves and tooling around it becomes more diverse, this change will have more impact.

leondz, Apr 02 '25 19:04

Noted a condition where not all probe results are reported.

leondz, Apr 04 '25 15:04

Is the absolute threshold at Tier 1 too easy? I feel all three of these are actually not great and are worth documenting:

ansiescape.AnsiRaw_ansiescape.Raw within bounds (passrate: 0.6552 z: None)

dan.DanInTheWild_mitigation.MitigationBypass within bounds (passrate: 0.5060 z: 0.9993991519699175)

latentinjection.LatentInjectionReport_base.TriggerListDetector within bounds (passrate: 0.5550 z: 0.3263044689694181)
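To make the question concrete, here is a hypothetical two-part check: a result is within bounds only if its pass rate clears an absolute minimum and, when a z-score is available, the z-score clears a relative minimum. The threshold values used below (an absolute minimum of 0.5 or 0.7, and a z minimum of -1.0) are illustrative assumptions, not garak's actual Tier 1 settings; only the pass rates and z-scores come from the results above.

```python
from typing import Optional

def within_bounds(passrate: float, z: Optional[float],
                  abs_min: float, z_min: float) -> bool:
    """Hypothetical check: pass only if both the absolute and relative tests pass.

    A z of None (no calibration data) skips the relative test.
    """
    if passrate < abs_min:
        return False
    if z is not None and z < z_min:
        return False
    return True

# Figures copied from the three results listed above.
results = {
    "ansiescape.AnsiRaw_ansiescape.Raw": (0.6552, None),
    "dan.DanInTheWild_mitigation.MitigationBypass": (0.5060, 0.9993991519699175),
    "latentinjection.LatentInjectionReport_base.TriggerListDetector": (0.5550, 0.3263044689694181),
}

def flagged(results, abs_min, z_min=-1.0):
    """Names of results that fall outside bounds for the given thresholds."""
    return [name for name, (rate, z) in results.items()
            if not within_bounds(rate, z, abs_min, z_min)]
```

With `abs_min=0.5` all three results are within bounds, while `abs_min=0.7` flags all three; that gap is what the question about Tier 1 is probing.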

Tagging @erickgalinkin for extra input.

leondz, Apr 09 '25 11:04

Do not merge until the tier implementation is either settled or put on hold.

leondz, Apr 22 '25 05:04

NB: #1152 should currently land /before/ this, so that tier inheritance works correctly in latentinjection; this has an important effect on qual_review behaviour.
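Why the order of changes can matter here, shown as a hypothetical sketch: if a tier is declared once on a shared class and inherited by probe subclasses, Python's method resolution order (MRO) decides which declaration wins. The class names, the mixin layout and the tier values are illustrative assumptions, not the actual latentinjection code or the content of #1152.

```python
TIER_1 = 1
TIER_U = 9  # hypothetical stand-in for the "unassigned" default

class Probe:
    tier = TIER_U  # base-class default every probe falls back to

class InjectionFamilyMixin:
    tier = TIER_1  # family-wide tier declared once, on a shared class

class ReportProbe(InjectionFamilyMixin, Probe):
    pass  # MRO checks the mixin before Probe, so this inherits TIER_1

class ReportProbeMisordered(Probe, InjectionFamilyMixin):
    pass  # MRO checks Probe first, so its default shadows the mixin's tier
```

Inspecting `ReportProbe.__mro__` shows the lookup order directly; a probe that silently falls back to the default tier would be treated differently by qual_review.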

leondz, Apr 23 '25 12:04