llm-benchmarking topic
llm-benchmarking repositories
llm4regression
116 stars, 17 forks
Examines how large language models (LLMs) perform on various synthetic regression tasks when given (input, output) examples in their context, without any parameter updates.
LLM-Research
34 stars, 6 forks
A collection of LLM-related papers, theses, tools, datasets, courses, open-source models, and benchmarks.
pint-benchmark
82 stars, 9 forks
A benchmark for prompt injection detection systems.
fm-leaderboarder
18 stars, 5 forks
FM-Leaderboard-er lets you create a leaderboard to find the best LLM/prompt for your own business use case, based on your data, tasks, and prompts.
MJ-Bench
48 stars, 5 forks
Official implementation for "MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?"