llm-benchmarking topic


llm4regression

116 stars · 17 forks

Examines how large language models (LLMs) perform on various synthetic regression tasks when given (input, output) examples in their context, without any parameter updates.

LLM-Research

34 stars · 6 forks

A collection of LLM-related papers, theses, tools, datasets, courses, open-source models, and benchmarks.

pint-benchmark

82 stars · 9 forks

A benchmark for prompt injection detection systems.

fm-leaderboarder

18 stars · 5 forks

FM-Leaderboard-er lets you create a leaderboard to find the best LLM or prompt for your own business use case, based on your data, task, and prompts.

MJ-Bench

48 stars · 5 forks

Official implementation of "MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?"