llm-evaluation topic
giskard
🐢 Open-Source Evaluation & Testing for LLMs and ML models
Awesome-LLM-Eval
Awesome-LLM-Eval: a curated list of tools, datasets/benchmarks, demos, leaderboards, papers, docs, and models, mainly for the evaluation of LLMs.
llms-tools
A list of LLM tools & projects
langfuse
🪢 Open source LLM engineering platform: Observability, metrics, evals, prompt management, playground, datasets. Integrates with LlamaIndex, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23
superpipe
Superpipe - optimized LLM pipelines for structured data
raga-llm-hub
Framework for LLM evaluation, guardrails and security
parea-sdk-py
Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23)
hallucination-index
Initiative to evaluate and rank the most popular LLMs across common task types based on their propensity to hallucinate.
CONNER
The implementation for the EMNLP 2023 paper “Beyond Factuality: A Comprehensive Evaluation of Large Language Models as Knowledge Generators”
pratical-llms
A collection of hands-on notebooks for LLM practitioners
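Despite their different scopes, most of the evaluation tools above share one core loop: run a model over a labeled dataset, score each output, and aggregate a metric. A minimal sketch of that loop in plain Python follows; the model and dataset here are hypothetical stand-ins for illustration, not the API of any listed project.

```python
def exact_match(prediction: str, reference: str) -> bool:
    """Normalized exact-match scoring, the simplest eval metric."""
    return prediction.strip().lower() == reference.strip().lower()

def evaluate(model, dataset):
    """Return the fraction of examples the model answers correctly."""
    hits = sum(exact_match(model(ex["input"]), ex["expected"]) for ex in dataset)
    return hits / len(dataset)

if __name__ == "__main__":
    # Hypothetical dataset and model, used only to exercise the loop.
    dataset = [
        {"input": "Capital of France?", "expected": "Paris"},
        {"input": "2 + 2 = ?", "expected": "4"},
    ]
    fake_model = lambda prompt: "Paris" if "France" in prompt else "5"
    print(evaluate(fake_model, dataset))  # one hit out of two -> 0.5
```

Frameworks like the ones listed here layer richer scorers (LLM-as-judge, hallucination checks, guardrails), dataset management, and result tracking on top of this same pattern.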