llm-evaluation topic
giskard
🐢 Open-Source Evaluation & Testing for LLMs and ML models
Awesome-LLM-Eval
Awesome-LLM-Eval: a curated list of tools, datasets/benchmarks, demos, leaderboards, papers, docs, and models, mainly for the evaluation of LLMs.
llms-tools
A list of LLM tools & projects
langfuse
🪢 Open source LLM engineering platform: Observability, metrics, evals, prompt management, playground, datasets. Integrates with LlamaIndex, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23
superpipe
Superpipe - optimized LLM pipelines for structured data
raga-llm-hub
Framework for LLM evaluation, guardrails and security
parea-sdk-py
Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23)
hallucination-index
Initiative to evaluate and rank the most popular LLMs across common task types based on their propensity to hallucinate.
CONNER
The implementation for the EMNLP 2023 paper “Beyond Factuality: A Comprehensive Evaluation of Large Language Models as Knowledge Generators”
pratical-llms
A collection of hands-on notebooks for LLM practitioners
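Despite their different scopes, most of the evaluation tools above share one core loop: run a model over a labeled dataset, score each output, and aggregate a metric. A minimal sketch of that loop in plain Python follows; the model and dataset here are hypothetical stand-ins for illustration, not the API of any listed project.

```python
def exact_match(prediction: str, reference: str) -> bool:
    """Normalized exact-match scoring, the simplest eval metric."""
    return prediction.strip().lower() == reference.strip().lower()

def evaluate(model, dataset):
    """Return the fraction of examples the model answers correctly."""
    hits = sum(exact_match(model(ex["input"]), ex["expected"]) for ex in dataset)
    return hits / len(dataset)

if __name__ == "__main__":
    # Hypothetical dataset and model, used only to exercise the loop.
    dataset = [
        {"input": "Capital of France?", "expected": "Paris"},
        {"input": "2 + 2 = ?", "expected": "4"},
    ]
    fake_model = lambda prompt: "Paris" if "France" in prompt else "5"
    print(evaluate(fake_model, dataset))  # one hit out of two -> 0.5
```

Frameworks like the ones listed here layer richer scorers (LLM-as-judge, hallucination checks, guardrails), dataset management, and result tracking on top of this same pattern.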