verifiers
Difficulty filtering
It would help if vf_eval.make_dataset supported an extra average_accuracy column per prompt (averaged over rollouts_per_example). That would make difficulty filtering very easy.
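To illustrate what I mean, here is a rough sketch in plain Python, not the actual verifiers API. It assumes each rollout is a dict with a "prompt" and a "reward" field; those field names, the helper functions, and the 0.2/0.8 thresholds are all made up for the example.

```python
from collections import defaultdict
from statistics import mean


def add_average_accuracy(rows):
    """Group rollout rows by prompt and compute the mean reward per prompt.

    `rows` is a hypothetical list of {"prompt": ..., "reward": ...} dicts,
    one per rollout (so rollouts_per_example rows share each prompt).
    """
    rewards_by_prompt = defaultdict(list)
    for row in rows:
        rewards_by_prompt[row["prompt"]].append(row["reward"])
    return [
        {"prompt": prompt, "average_accuracy": mean(rewards)}
        for prompt, rewards in rewards_by_prompt.items()
    ]


def filter_by_difficulty(prompt_rows, low=0.2, high=0.8):
    """Keep prompts whose average accuracy lies in [low, high]."""
    return [r for r in prompt_rows if low <= r["average_accuracy"] <= high]


# Toy data: two prompts, two rollouts each.
rollouts = [
    {"prompt": "easy", "reward": 1.0},
    {"prompt": "easy", "reward": 1.0},
    {"prompt": "medium", "reward": 0.0},
    {"prompt": "medium", "reward": 1.0},
]

per_prompt = add_average_accuracy(rollouts)
kept = filter_by_difficulty(per_prompt)
print(kept)
```

With the column already present in the dataset, the second step would reduce to a single filter call on the user's side.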