llm-benchmarking topic
llm4regression
Examining how large language models (LLMs) perform across various synthetic regression tasks when given (input, output) examples in their context, without any parameter updates.
LLM-Research
A collection of LLM-related papers, theses, tools, datasets, courses, open-source models, and benchmarks.
pint-benchmark
A benchmark for prompt injection detection systems.
LLMEvaluation
A comprehensive guide to LLM evaluation methods, designed to help identify the most suitable evaluation techniques for various use cases and to promote the adoption of best practices in LLM assessment.
fm-leaderboarder
FM-Leaderboard-er lets you create a leaderboard to find the best LLM/prompt for your own business use case, based on your data, tasks, and prompts.
MJ-Bench
Official implementation for "MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?"
Awesome-Code-Benchmark
A comprehensive review of code-domain benchmarks for LLM research.
enterprise-deep-research
Salesforce Enterprise Deep Research
confabulations
A document-based benchmark for hallucinations (confabulations) in RAG, including human-verified questions and answers.
BizFinBench
A business-driven, real-world financial benchmark for evaluating LLMs.