llm-evaluation-toolkit topic

Repositories tagged with this topic:

langtest (549 stars, 50 forks, 549 watchers)

Deliver safe & effective language models

athina-evals (210 stars, 12 forks)

Python SDK for running evaluations on LLM-generated responses
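
SDKs in this category generally expose evaluator functions that score a query/response pair and return a structured result. Below is a minimal, library-agnostic sketch of that pattern; every name in it (EvalResult, answers_query, run_evals) is hypothetical and not athina-evals' actual API.

```python
# Library-agnostic sketch of the "run evaluators over LLM responses" pattern.
# Every name here (EvalResult, answers_query, run_evals) is hypothetical.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class EvalResult:
    name: str
    passed: bool
    detail: str

def answers_query(query: str, response: str) -> EvalResult:
    # Toy heuristic standing in for a real "does the response answer the query" check.
    overlap = set(query.lower().split()) & set(response.lower().split())
    return EvalResult("answers_query", bool(overlap), f"shared terms: {sorted(overlap)}")

def run_evals(query: str, response: str,
              evaluators: List[Callable[[str, str], EvalResult]]) -> List[EvalResult]:
    # Apply each evaluator to the (query, response) pair and collect the results.
    return [evaluate(query, response) for evaluate in evaluators]

if __name__ == "__main__":
    for result in run_evals("What is the capital of France?",
                            "The capital of France is Paris.",
                            [answers_query]):
        print(result)
```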

just-eval (74 stars, 6 forks)

A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs.
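
Multi-aspect, GPT-based assessment of the kind just-eval describes is commonly implemented by prompting a judge model to score a response on several named aspects and return structured output. A rough sketch of that pattern, with a hypothetical judge_model callable standing in for a real GPT API call:

```python
# Sketch of multi-aspect LLM-as-judge scoring. `judge_model` is a hypothetical
# stand-in for a real chat-completion call; the aspect names follow common practice.
import json
from typing import Callable, Dict

ASPECTS = ["helpfulness", "clarity", "factuality", "depth", "engagement"]

PROMPT_TEMPLATE = """Rate the response to the instruction on each aspect from 1 to 5,
with a brief justification per aspect. Reply with JSON of the form
{{"scores": {{"aspect": int}}, "reasons": {{"aspect": "why"}}}}.

Instruction: {instruction}
Response: {response}
Aspects: {aspects}"""

def multi_aspect_judge(instruction: str, response: str,
                       judge_model: Callable[[str], str]) -> Dict:
    # Build the judging prompt and parse the judge's structured reply.
    prompt = PROMPT_TEMPLATE.format(instruction=instruction,
                                    response=response,
                                    aspects=", ".join(ASPECTS))
    return json.loads(judge_model(prompt))

if __name__ == "__main__":
    # Fake judge for demonstration; a real run would call a GPT-style API here.
    fake_judge = lambda _prompt: json.dumps(
        {"scores": {a: 4 for a in ASPECTS}, "reasons": {a: "adequate" for a in ASPECTS}}
    )
    print(multi_aspect_judge("Explain recursion.", "Recursion is ...", fake_judge))
```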

parea-sdk-py (74 stars, 6 forks)

Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23)

KIEval (38 stars, 2 forks, 38 watchers)

[ACL'24] A Knowledge-grounded Interactive Evaluation Framework for Large Language Models
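
Knowledge-grounded interactive evaluation, as the title suggests, replaces one-shot scoring with a multi-turn exchange: an evaluator model asks follow-up questions grounded in reference knowledge and then judges the dialogue. A hypothetical sketch of such a loop, not KIEval's actual code:

```python
# Hypothetical sketch of a knowledge-grounded, interactive evaluation loop.
# `candidate` answers questions; `interactor` writes grounded follow-ups and the
# final judgment. Both are stand-ins for real LLM calls; nothing here is KIEval's API.
from typing import Callable, List, Tuple

def interactive_eval(knowledge: str, seed_question: str,
                     candidate: Callable[[str], str],
                     interactor: Callable[[str], str],
                     rounds: int = 3) -> Tuple[List[Tuple[str, str]], str]:
    transcript: List[Tuple[str, str]] = []
    question = seed_question
    for _ in range(rounds):
        answer = candidate(question)
        transcript.append((question, answer))
        # Ask the interactor for a follow-up question grounded in the reference knowledge.
        question = interactor(f"Knowledge: {knowledge}\nDialogue so far: {transcript}\n"
                              "Ask one probing follow-up question.")
    verdict = interactor(f"Knowledge: {knowledge}\nDialogue: {transcript}\n"
                         "Judge whether the answers stay grounded in the knowledge. "
                         "Reply PASS or FAIL with a reason.")
    return transcript, verdict

if __name__ == "__main__":
    stub = lambda _prompt: "stub reply"   # placeholder for both models
    print(interactive_eval("Paris is the capital of France.",
                           "What is the capital of France?", stub, stub, rounds=1))
```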

qa_metrics (59 stars, 7 forks, 59 watchers)

An easy Python package to run quick, basic QA evaluations. This package includes standardized QA evaluation metrics and semantic evaluation metrics: Black-box and Open-Source large language model promp...
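
"Standardized QA evaluation metrics" of this kind usually means normalized exact match and token-level F1 in the SQuAD style. A self-contained sketch of those two metrics; the function names are illustrative rather than the qa_metrics package's API.

```python
# SQuAD-style QA metrics: normalized exact match and token-level F1.
# Function names are illustrative, not the qa_metrics package's actual API.
import re
import string
from collections import Counter

def normalize(text: str) -> str:
    # Lowercase, strip punctuation and articles, collapse whitespace.
    text = "".join(ch for ch in text.lower() if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction: str, reference: str) -> bool:
    return normalize(prediction) == normalize(reference)

def token_f1(prediction: str, reference: str) -> float:
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    common = Counter(pred_tokens) & Counter(ref_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

if __name__ == "__main__":
    print(exact_match("The Paris", "Paris"))                    # True after normalization
    print(round(token_f1("It is Paris, France", "Paris"), 2))   # 0.4
```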