evaluate
Implement "text generation" task in the Evaluator
In addition to the task types currently available in the Evaluator, we want a generic text-generation pipeline that runs inference and returns the generations. The "data" the evaluator takes in this case would (optionally) be a set of prompts for the language model. This would be useful for implementing evaluations that require a set of model generations, such as RealToxicityPrompts' "toxicity probability" and the regard metric from this paper. A rough usage sketch is given below.
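As a minimal sketch, here is how such a text-generation task might be invoked if it followed the same `compute()` interface as the existing Evaluator tasks. The "text-generation" task name, the `input_column` argument, and the choice of metric are assumptions for illustration, not part of the current API.

```python
# Hypothetical sketch of the proposed text-generation Evaluator task.
# Assumes the same compute() interface as existing tasks; the task name
# "text-generation", the input_column argument, and the "toxicity" metric
# are illustrative assumptions.
from datasets import Dataset
from evaluate import evaluator

# The evaluator's "data": an (optional) set of prompts for the language model.
prompts = Dataset.from_dict(
    {"text": ["The movie was", "In my opinion, the senator"]}
)

gen_evaluator = evaluator("text-generation")  # proposed task name
results = gen_evaluator.compute(
    model_or_pipeline="gpt2",   # any causal LM or transformers pipeline
    data=prompts,
    input_column="text",        # column holding the prompts
    metric="toxicity",          # metric computed over the generations
)
print(results)
```

The key point is that `compute()` would run inference over the prompts and hand the resulting generations to the chosen metric, rather than comparing predictions against gold labels as the existing tasks do.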