[DRAFT] FEAT: Scorer Evaluations
Description
This PR introduces an evaluation framework for PyRIT Scorers. The goals of this PR are:
- Setting up the process of retrieving baseline metrics for Scorers based on manually generated and (AIRT) human-graded datasets
- Using the metrics for thresholding of Scorer integration tests
- Allowing users to run their own evaluations on Scorers using their own datasets (see the usage sketch below)
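For context, a minimal usage sketch of how such a user-run evaluation could look. `ScorerEvalConfig` and `run_evaluation` are the interfaces introduced in this PR, but the import location, constructor arguments, and call shape below are assumptions for illustration, not the final API:

```python
# Usage sketch only -- ScorerEvalConfig / run_evaluation come from this PR, but the
# import location, field names, and call shape here are assumptions, not the final API.
from pyrit.prompt_target import OpenAIChatTarget
from pyrit.score import LikertScalePaths, SelfAskLikertScorer
# from pyrit.score import ScorerEvalConfig, run_evaluation  # assumed import location

# Scorer under evaluation (constructor arguments may differ across PyRIT versions).
scorer = SelfAskLikertScorer(
    chat_target=OpenAIChatTarget(),
    likert_scale_path=LikertScalePaths.HARM_SCALE.value,
)

# Assumed config: a user-supplied, human-graded dataset plus an output directory.
config = ScorerEvalConfig(
    dataset_path="my_datasets/human_graded_harm.csv",
    output_dir="scorer_evals_results",
)

# Assumed entry point: scores every row, compares the model's scores against the
# human labels, and returns aggregate metrics (per-row scores are written as CSV).
metrics = run_evaluation(scorer=scorer, config=config)
print(metrics)
```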
Some things to focus on for feedback:
- Extensibility of the `ScorerEvalConfig` class and `run_evaluation` method (a rough sketch follows after this list). We want evaluations to be as easy as possible to implement for other Scorers and scales as we gather new datasets!
- How the metrics are retrieved through the `eval_stats_to_json` `Scorer` class method
- How the metrics and the CSV of scores are saved (via the `scorer_evals_results` directory right now)
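To make these feedback points concrete, here is a rough sketch of what a dataclass-style `ScorerEvalConfig` and a JSON metrics export could look like. The field names, metric layout, and file naming under `scorer_evals_results` are assumptions, not the PR's actual schema, and `eval_stats_to_json` (described above as a `Scorer` class method) is sketched as a standalone function for brevity:

```python
# Rough sketch only -- field names, metrics, and file layout are assumptions.
import json
from dataclasses import dataclass
from pathlib import Path


@dataclass
class ScorerEvalConfig:
    """Assumed shape of an evaluation config: where the labeled data lives and where results go."""
    dataset_path: str
    output_dir: str = "scorer_evals_results"
    score_column: str = "human_score"            # column holding the human-assigned grade
    response_column: str = "assistant_response"  # column holding the text to score


def eval_stats_to_json(metrics: dict, config: ScorerEvalConfig, scorer_name: str) -> Path:
    """Assumed behavior of the metrics export: write aggregate stats next to the per-row score CSV."""
    out_dir = Path(config.output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"{scorer_name}_metrics.json"
    out_path.write_text(json.dumps(metrics, indent=2))
    return out_path
```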
Please note that the full datasets used for generating the official metrics for integration tests (in `datasets/score/scorer_evals/metrics`) are not included in this PR as they contain harmful content.
Tests and Documentation
- Integration test written for the Likert Scorer (see the thresholding sketch below)
- TODO: unit tests for the evaluation methods
- TODO: move the Refusal Scorer evaluation implementation into the `run_evaluation()` method
- TODO: decide where to put the AIRT-generated datasets
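For the thresholding idea, a hypothetical shape for such an integration test; the baseline file name, dataset path, metric key, and tolerance are all placeholders, and `ScorerEvalConfig` / `run_evaluation` are used with an assumed call shape:

```python
# Hypothetical integration-test sketch -- the file names, metric key, and 0.05 tolerance
# are placeholders, not values from this PR.
import json
from pathlib import Path

from pyrit.prompt_target import OpenAIChatTarget
from pyrit.score import LikertScalePaths, SelfAskLikertScorer

# Baseline metrics previously generated from the AIRT human-graded dataset (placeholder file name).
BASELINE_METRICS = Path("datasets/score/scorer_evals/metrics/likert_harm_metrics.json")


def test_likert_scorer_meets_baseline():
    baseline = json.loads(BASELINE_METRICS.read_text())

    scorer = SelfAskLikertScorer(
        chat_target=OpenAIChatTarget(),
        likert_scale_path=LikertScalePaths.HARM_SCALE.value,
    )

    # Re-run the evaluation on the labeled dataset; ScorerEvalConfig / run_evaluation are
    # the interfaces proposed in this PR, with an assumed call shape and dataset path.
    config = ScorerEvalConfig(dataset_path="datasets/score/scorer_evals/likert_harm.csv")
    current = run_evaluation(scorer=scorer, config=config)

    # Fail the test if agreement with the human grades regresses beyond a small tolerance.
    assert current["mean_absolute_error"] <= baseline["mean_absolute_error"] + 0.05
```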