llm-evaluation-toolkit topic
langtest
Deliver safe & effective language models
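A minimal sketch of driving langtest, assuming the `Harness` pattern from the project's quick-start (the task name, the `model`/`hub` spec dict, and the chained generate/run/report calls follow that documented pattern; the example model is an arbitrary choice):

```python
# Minimal langtest sketch, assuming the documented Harness pattern.
# pip install langtest
from langtest import Harness

# Wrap a Hugging Face NER model in a test harness; the "model" and
# "hub" keys follow langtest's model-spec convention (assumption).
harness = Harness(
    task="ner",
    model={"model": "dslim/bert-base-NER", "hub": "huggingface"},
)

harness.generate()        # generate robustness/bias test cases
harness.run()             # run the model against the generated cases
print(harness.report())   # pass/fail summary per test category
```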
athina-evals
Python SDK for running evaluations on LLM generated responses
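A hedged sketch of athina-evals' preset-evaluator pattern; the class name, key-setting helper, and `run()` keywords below are recalled from the project's quick-start and should be treated as assumptions:

```python
# Hedged athina-evals sketch; class and method names are assumptions
# based on the project's quick-start, not a verified API reference.
import os
from athina.keys import OpenAiApiKey
from athina.evals import DoesResponseAnswerQuery

OpenAiApiKey.set_key(os.environ["OPENAI_API_KEY"])

# Run a single preset evaluation on one query/response pair.
result = DoesResponseAnswerQuery().run(
    query="Where is the Eiffel Tower?",
    response="The Eiffel Tower is in Paris.",
)
print(result)
```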
just-eval
A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs.
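just-eval's own interface isn't reproduced here; the following is a generic sketch of the underlying technique (multi-aspect GPT judging) using the OpenAI client, with a hypothetical aspect list and an illustrative scoring prompt:

```python
# Generic multi-aspect GPT-judge sketch (NOT just-eval's API); the
# aspect names and prompt wording are illustrative assumptions.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
ASPECTS = ["helpfulness", "clarity", "factuality", "depth"]  # hypothetical

def judge(query: str, response: str) -> dict:
    prompt = (
        "Rate the response to the query on each aspect from 1 to 5 and "
        "give a one-sentence reason per aspect. Respond as JSON mapping "
        f"each of {ASPECTS} to {{'score': int, 'reason': str}}.\n\n"
        f"Query: {query}\nResponse: {response}"
    )
    out = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
    )
    # Per-aspect scores plus reasons make the verdict interpretable.
    return json.loads(out.choices[0].message.content)
```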
parea-sdk-py
Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23)
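A hedged sketch of parea-sdk instrumentation; the `Parea` client and the `trace` decorator follow the SDK's quick-start as recalled, so treat the names and arguments as assumptions:

```python
# Hedged parea-sdk sketch; Parea() and @trace are assumptions based on
# the SDK's quick-start, not a verified API reference.
import os
from parea import Parea, trace

p = Parea(api_key=os.environ["PAREA_API_KEY"])

@trace  # records inputs, outputs, and latency for this call
def generate_answer(question: str) -> str:
    # Call your LLM here; a stub keeps the sketch self-contained.
    return "stub answer to: " + question

print(generate_answer("What does Parea monitor?"))
```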
KIEval
[ACL'24] A Knowledge-grounded Interactive Evaluation Framework for Large Language Models
qa_metrics
An easy Python package for running quick, basic QA evaluations. It includes standardized QA evaluation metrics and semantic evaluation metrics: black-box and open-source large language model promp...
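qa_metrics' own import paths aren't shown here; below is a self-contained sketch of the two standardized lexical metrics such a package covers, SQuAD-style exact match and token-level F1:

```python
# Standalone EM / token-F1 reference implementations (SQuAD-style);
# qa_metrics' own function names may differ -- this shows the metrics,
# not the package's API.
import re
import string
from collections import Counter

def normalize(text: str) -> str:
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction: str, reference: str) -> bool:
    return normalize(prediction) == normalize(reference)

def token_f1(prediction: str, reference: str) -> float:
    pred = normalize(prediction).split()
    ref = normalize(reference).split()
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

print(exact_match("The Eiffel Tower", "eiffel tower"))   # True
print(round(token_f1("in Paris, France", "Paris"), 2))   # 0.5
```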