lm-evaluation-harness
feat: add CoT trace response handling to evaluator and model classes
- Added support for storing raw generations in HFLM and VLLM models.
- Updated the evaluator to log a warning when the number of raw generations does not match the number of processed responses.
- Modified response collection to include raw responses when available.
This gives evaluation results access to the original generated outputs, including any CoT trace, alongside the processed responses.
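
The collection behavior described above can be sketched roughly as follows. This is an illustrative sketch only, not the code from this commit: the function name `collect_responses`, the keys `resp` and `raw_resp`, and the choice to drop raw generations on a length mismatch are all assumptions made for the example.

```python
import logging
from typing import Dict, List, Optional

logger = logging.getLogger(__name__)


def collect_responses(
    processed: List[str], raw: Optional[List[str]] = None
) -> List[Dict[str, str]]:
    """Pair each processed response with its raw generation when available.

    Hypothetical helper: names and record layout are assumptions for illustration.
    """
    # No raw generations stored by the model: return processed responses only.
    if raw is None:
        return [{"resp": p} for p in processed]

    # Lengths disagree, so the pairing would be unreliable: warn and skip raw output.
    if len(raw) != len(processed):
        logger.warning(
            "Raw generations (%d) do not match processed responses (%d); "
            "omitting raw responses.",
            len(raw),
            len(processed),
        )
        return [{"resp": p} for p in processed]

    # Lengths agree: attach the raw generation to each processed response.
    return [{"resp": p, "raw_resp": r} for p, r in zip(processed, raw)]
```

In this sketch a mismatch degrades to the previous behavior (processed responses only) instead of raising, so an evaluation run is not interrupted by a model that returns an unexpected number of raw generations.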