llm-benchmarking topics

llm4regression

116

Stars

17

Forks

Examining how large language models (LLMs) perform across various synthetic regression tasks when given (input, output) examples in their context, without any parameter updates

robertvacareanu

large-language-models

linear-regression

llm

llm-benchmarking

LLM-Research

34

Stars

6

Forks

A collection of LLM-related papers, theses, tools, datasets, courses, open-source models, and benchmarks

asimsinan

arxiv-papers

buyuk-dil-modelleri

large-language-models

llm

pint-benchmark

82

Stars

9

Forks

A benchmark for prompt injection detection systems.

lakeraai

benchmark

llm

llm-benchmarking

llm-security

fm-leaderboarder

18

Stars

5

Forks

FM-Leaderboard-er lets you create a leaderboard to find the best LLM/prompt for your own business use case, based on your data, task, and prompts

aws-samples

llm-benchmarking

llm-evaluation

llm-evaluation-framework

MJ-Bench

48

Stars

5

Forks

Official implementation for "MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?"

MJ-Bench

llm-as-a-judge

llm-benchmarking

multimodal-foundation-model

multimodal-judge