
[DRAFT] FEAT: Scorer Evaluations

jsong468 opened this issue 7 months ago

Description

This PR introduces an evaluation framework for PyRIT Scorers. Its goals are:

  • Setting up a process for retrieving baseline metrics for Scorers from manually generated, (AIRT) human-graded datasets
  • Using those metrics to set thresholds for Scorer integration tests
  • Allowing users to run their own evaluations of Scorers on their own datasets (see the sketch after this list)
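
To make the last goal concrete, here is a minimal sketch of what a user-run evaluation might look like. ScorerEvalConfig and run_evaluation are introduced by this draft PR, so the import location, config fields, and method signature below are assumptions, as are the dataset path and column name; the scorer construction follows the existing PyRIT pattern.

```python
# Sketch only: ScorerEvalConfig / run_evaluation come from this draft PR, so the
# import path, fields, and signature below are assumptions, not the final API.
from pyrit.prompt_target import OpenAIChatTarget
from pyrit.score import SelfAskLikertScorer, LikertScalePaths
from pyrit.score import ScorerEvalConfig  # hypothetical import location

scorer = SelfAskLikertScorer(
    chat_target=OpenAIChatTarget(),
    likert_scale_path=LikertScalePaths.HARM_SCALE.value,
)

# A user-supplied, human-graded dataset: each row pairs a model response with the
# score a human grader assigned on the same scale (path and column name are made up).
config = ScorerEvalConfig(
    dataset_path="my_datasets/harm_likert_human_graded.csv",
    human_label_column="human_score",
)

# Assumed to score every row, compare against the human labels, and return
# baseline metrics for this scorer.
metrics = scorer.run_evaluation(config)
print(metrics)
```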

Some things to focus on for feedback:

  • Extensibility of the ScorerEvalConfig class and run_evaluation method. We want evaluations to be as easy as possible to implement for other Scorers and scales as we gather new datasets!
  • How the metrics are retrieved through the eval_stats_to_json Scorer class method
  • How the metrics and CSV of scores are saved (currently via the scorer_evals_results directory)
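
For the last two points, a rough sketch of how metrics might be pulled out and written to disk; eval_stats_to_json is the Scorer class method added in this PR, but its return shape, the metric key, and the file name below are assumptions for discussion.

```python
# Sketch only: return shape and on-disk layout are assumptions, not the final design.
import json
from pathlib import Path

stats_json = scorer.eval_stats_to_json()  # scorer evaluated as in the sketch above

results_dir = Path("scorer_evals_results")  # directory named in this PR
results_dir.mkdir(exist_ok=True)
(results_dir / "likert_scorer_metrics.json").write_text(stats_json)

stats = json.loads(stats_json)
print(stats.get("mean_absolute_error"))  # hypothetical metric key
```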

Please note that the full datasets used to generate the official metrics for integration tests (in datasets/score/scorer_evals/metrics) are not included in this PR, as they contain harmful content.

Tests and Documentation

An integration test has been written for the Likert Scorer.

TODO:

  • Unit tests for the evaluation methods
  • Move the Refusal Scorer evaluation implementation into the run_evaluation() method
  • Figure out where to put the AIRT-generated datasets
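
One way the thresholding mentioned above could look in an integration test; the baseline file name, metric key, and margin below are placeholders, and the Likert Scorer evaluation is assumed to run as in the earlier sketch.

```python
# Sketch only: metric key, file name, and margin are placeholders, not what the PR uses.
import json
from pathlib import Path

BASELINE_PATH = Path("datasets/score/scorer_evals/metrics/likert_scorer_metrics.json")

def test_likert_scorer_meets_baseline():
    baseline = json.loads(BASELINE_PATH.read_text())
    # `scorer` evaluated against the official dataset as in the earlier sketch.
    current = json.loads(scorer.eval_stats_to_json())
    # Fail the integration test if accuracy regresses past an allowed margin.
    assert current["accuracy"] >= baseline["accuracy"] - 0.05
```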

jsong468 · May 07 '25 17:05