llm-evaluation topic
Awesome-LLM-in-Social-Science
Awesome papers involving LLMs in Social Science.
promptfoo
Test your prompts, agents, and RAGs. Red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command...
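As a rough illustration of the declarative configs the description mentions, here is a minimal sketch of a promptfoo `promptfooconfig.yaml`. The prompt text, variable values, and model IDs below are made-up placeholders; the `prompts`/`providers`/`tests` layout follows promptfoo's documented config format, but check the project's docs for the fields supported by your version.

```yaml
# promptfooconfig.yaml — minimal sketch (placeholder prompt, vars, and models)
prompts:
  - "Summarize the following text in one sentence: {{text}}"

providers:
  - openai:gpt-4o-mini          # models to compare; IDs are examples
  - anthropic:messages:claude-3-5-sonnet-20241022

tests:
  - vars:
      text: "Large language models are evaluated with automated metrics and human review."
    assert:
      - type: contains          # simple string check on the model output
        value: "evaluated"
```

Running `promptfoo eval` against such a config executes each prompt/provider pair and reports the assertion results side by side.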
continuous-eval
Data-Driven Evaluation for LLM-Powered Applications
DCR-consistency
DCR-Consistency: Divide-Conquer-Reasoning for Consistency Evaluation and Improvement of Large Language Models
CommonGen-Eval
Evaluating LLMs with CommonGen-Lite
deepeval
The LLM Evaluation Framework
athina-evals
Python SDK for running evaluations on LLM generated responses
just-eval
A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs.
agenta
The all-in-one LLM developer platform: prompt management, evaluation, human feedback, and deployment.
leaf-playground
A framework for building scenario simulation projects in which both human and LLM-based agents can participate, with a user-friendly web UI to visualize simulations; supports automatic evaluation of agent ac...