llm-evaluation topic

List llm-evaluation repositories

giskard

3.3k
Stars
215
Forks

🐢 Open-Source Evaluation & Testing for LLMs and ML models

Awesome-LLM-Eval

305
Stars
33
Forks

Awesome-LLM-Eval: a curated list of tools, datasets/benchmarks, demos, leaderboards, papers, docs, and models, mainly for evaluating LLMs. (Translated from Chinese: a curated list of tools, benchmarks/data, demos, leaderboards, large models, and more...)

llms-tools

68
Stars
13
Forks

A list of LLM tools & projects

langfuse

4.5k
Stars
407
Forks
18
Watchers

🪢 Open source LLM engineering platform: Observability, metrics, evals, prompt management, playground, datasets. Integrates with LlamaIndex, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23

superpipe

99
Stars
1
Forks

Superpipe - optimized LLM pipelines for structured data

raga-llm-hub

75
Stars
6
Forks

Framework for LLM evaluation, guardrails and security

parea-sdk-py

41
Stars
4
Forks

Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23)

hallucination-index

47
Stars
4
Forks

Initiative to evaluate and rank the most popular LLMs across common task types based on their propensity to hallucinate.

CONNER

27
Stars
1
Forks

The implementation for the EMNLP 2023 paper “Beyond Factuality: A Comprehensive Evaluation of Large Language Models as Knowledge Generators”

pratical-llms

31
Stars
7
Forks

A collection of hands-on notebooks for LLM practitioners