evaluation-metrics topic
ctc-gen-eval
EMNLP 2021 - CTC: A Unified Framework for Evaluating Natural Language Generation
StreamingRec
A news recommendation evaluation framework
agentops
Python SDK for AI agent monitoring, LLM cost tracking, benchmarking, and more. Integrates with most LLMs and agent frameworks including CrewAI, Agno, OpenAI Agents SDK, Langchain, Autogen, AG2, and Ca...
TaPR
Time-series Aware Precision and Recall for Evaluating Anomaly Detection Methods
CERberus
CERberus -- guardian against character errors 🐕🐕🐕
chatgpt_as_nlg_evaluator
Technical Report: Is ChatGPT a Good NLG Evaluator? A Preliminary Study
ErrorAnalysis_Prompt
🎁 [ChatGPT4MTevaluation] ErrorAnalysis Prompt for MT Evaluation in ChatGPT
continuous-eval
Data-Driven Evaluation for LLM-Powered Applications
summarization-eval
📝 Reference-free automatic summarization evaluation with potential hallucination detection
summary-workbench
Framework for unified summarisation and evaluation of English documents using state-of-the-art models and measures.