llm-evaluation-toolkit topic
llm-evaluation-toolkit repositories
langtest
496 stars, 39 forks
Deliver safe & effective language models
athina-evals
210 stars, 12 forks
Python SDK for running evaluations on LLM-generated responses
just-eval
74 stars, 6 forks
A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs.
parea-sdk-py
74 stars, 6 forks
Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23)
KIEval
32 stars, 2 forks
[ACL'24] A Knowledge-grounded Interactive Evaluation Framework for Large Language Models