llm-evaluation-toolkit topic

Repositories tagged llm-evaluation-toolkit

langtest

496 stars, 39 forks

Deliver safe & effective language models

athina-evals

210 stars, 12 forks

Python SDK for running evaluations on LLM-generated responses

just-eval

74 stars, 6 forks

A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs.

parea-sdk-py

74 stars, 6 forks

Python SDK for experimenting with, testing, evaluating, and monitoring LLM-powered applications - Parea AI (YC S23)

KIEval

32 stars, 2 forks

[ACL'24] A Knowledge-grounded Interactive Evaluation Framework for Large Language Models