[DRAFT] FEAT: Scorer Evaluations
Description
This PR introduces an evaluation framework for PyRIT Scorers. The goals of this PR are:
- Setting up the process of retrieving baseline metrics for Scorers based on manually generated and (AIRT) human-graded datasets
- Using the metrics for thresholding of Scorer integration tests
- Allowing users to run their own evaluations on Scorers using their own datasets (see the usage sketch below)
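For context, a minimal usage sketch of how such a user-run evaluation could look. `ScorerEvalConfig` and `run_evaluation` are the interfaces introduced in this PR, but the import location, constructor arguments, and call shape below are assumptions for illustration, not the final API:

```python
# Usage sketch only -- ScorerEvalConfig / run_evaluation come from this PR, but the
# import location, field names, and call shape here are assumptions, not the final API.
from pyrit.prompt_target import OpenAIChatTarget
from pyrit.score import LikertScalePaths, SelfAskLikertScorer
# from pyrit.score import ScorerEvalConfig, run_evaluation  # assumed import location

# Scorer under evaluation (constructor arguments may differ across PyRIT versions).
scorer = SelfAskLikertScorer(
    chat_target=OpenAIChatTarget(),
    likert_scale_path=LikertScalePaths.HARM_SCALE.value,
)

# Assumed config: a user-supplied, human-graded dataset plus an output directory.
config = ScorerEvalConfig(
    dataset_path="my_datasets/human_graded_harm.csv",
    output_dir="scorer_evals_results",
)

# Assumed entry point: scores every row, compares the model's scores against the
# human labels, and returns aggregate metrics (per-row scores are written as CSV).
metrics = run_evaluation(scorer=scorer, config=config)
print(metrics)
```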
Some things to focus on for feedback:
- Extensibility of the `ScorerEvalConfig` class and `run_evaluation` method (a rough sketch follows after this list). We want evaluations to be as easy as possible to implement for other Scorers and scales as we gather new datasets!
- How the metrics are retrieved through the `eval_stats_to_json` `Scorer` class method
- How the metrics and the CSV of scores are saved (via the `scorer_evals_results` directory right now)
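To make these feedback points concrete, here is a rough sketch of what a dataclass-style `ScorerEvalConfig` and a JSON metrics export could look like. The field names, metric layout, and file naming under `scorer_evals_results` are assumptions, not the PR's actual schema, and `eval_stats_to_json` (described above as a `Scorer` class method) is sketched as a standalone function for brevity:

```python
# Rough sketch only -- field names, metrics, and file layout are assumptions.
import json
from dataclasses import dataclass
from pathlib import Path


@dataclass
class ScorerEvalConfig:
    """Assumed shape of an evaluation config: where the labeled data lives and where results go."""
    dataset_path: str
    output_dir: str = "scorer_evals_results"
    score_column: str = "human_score"            # column holding the human-assigned grade
    response_column: str = "assistant_response"  # column holding the text to score


def eval_stats_to_json(metrics: dict, config: ScorerEvalConfig, scorer_name: str) -> Path:
    """Assumed behavior of the metrics export: write aggregate stats next to the per-row score CSV."""
    out_dir = Path(config.output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"{scorer_name}_metrics.json"
    out_path.write_text(json.dumps(metrics, indent=2))
    return out_path
```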
Please note that the full datasets used for generating the official metrics for integration tests (in `datasets/score/scorer_evals/metrics`) are not included in this PR as they contain harmful content.
Tests and Documentation
- Integration test written for the Likert Scorer (see the thresholding sketch below)
- TODO: unit tests for the evaluation methods
- TODO: move the Refusal Scorer evaluation implementation into the `run_evaluation()` method
- TODO: decide where to put the AIRT-generated datasets
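For the thresholding idea, a hypothetical shape for such an integration test; the baseline file name, dataset path, metric key, and tolerance are all placeholders, and `ScorerEvalConfig` / `run_evaluation` are used with an assumed call shape:

```python
# Hypothetical integration-test sketch -- the file names, metric key, and 0.05 tolerance
# are placeholders, not values from this PR.
import json
from pathlib import Path

from pyrit.prompt_target import OpenAIChatTarget
from pyrit.score import LikertScalePaths, SelfAskLikertScorer

# Baseline metrics previously generated from the AIRT human-graded dataset (placeholder file name).
BASELINE_METRICS = Path("datasets/score/scorer_evals/metrics/likert_harm_metrics.json")


def test_likert_scorer_meets_baseline():
    baseline = json.loads(BASELINE_METRICS.read_text())

    scorer = SelfAskLikertScorer(
        chat_target=OpenAIChatTarget(),
        likert_scale_path=LikertScalePaths.HARM_SCALE.value,
    )

    # Re-run the evaluation on the labeled dataset; ScorerEvalConfig / run_evaluation are
    # the interfaces proposed in this PR, with an assumed call shape and dataset path.
    config = ScorerEvalConfig(dataset_path="datasets/score/scorer_evals/likert_harm.csv")
    current = run_evaluation(scorer=scorer, config=config)

    # Fail the test if agreement with the human grades regresses beyond a small tolerance.
    assert current["mean_absolute_error"] <= baseline["mean_absolute_error"] + 0.05
```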