llm-benchmarking topic

List llm-benchmarking repositories

llm4regression

156
Stars
21
Forks
156
Watchers

Examining how large language models (LLMs) perform across various synthetic regression tasks when given (input, output) examples in their context, without any parameter update

LLM-Research

59
Stars
9
Forks
59
Watchers

A collection of LLM related papers, thesis, tools, datasets, courses, open source models, benchmarks

pint-benchmark

148
Stars
18
Forks
148
Watchers

A benchmark for prompt injection detection systems.

LLMEvaluation

152
Stars
12
Forks
152
Watchers

A comprehensive guide to LLM evaluation methods designed to assist in identifying the most suitable evaluation techniques for various use cases, promote the adoption of best practices in LLM assessmen...

fm-leaderboarder

19
Stars
5
Forks
19
Watchers

FM-Leaderboard-er allows you to create leaderboard to find the best LLM/prompt for your own business use case based on your data, task, prompts

MJ-Bench

49
Stars
5
Forks
49
Watchers

Official implementation for "MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?"

Awesome-Code-Benchmark

155
Stars
13
Forks
155
Watchers

A comprehensive code domain benchmark review of LLM researches.

enterprise-deep-research

948
Stars
143
Forks
948
Watchers

Salesforce Enterprise Deep Research

confabulations

236
Stars
7
Forks
236
Watchers

Hallucinations (Confabulations) Document-Based Benchmark for RAG. Includes human-verified questions and answers.

BizFinBench

209
Stars
7
Forks
209
Watchers

A Business-Driven Real-World Financial Benchmark for Evaluating LLMs