llms-benchmarking topic
cc_flows
The data and implementation for the experiments in the paper "Flows: Building Blocks of Reasoning and Collaborating AI".
ChemLLMBench
What can Large Language Models do in chemistry? A comprehensive benchmark on eight tasks
resta
Restore safety in fine-tuned language models through task arithmetic
parea-sdk-py
Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23)
CompBench
CompBench evaluates the comparative reasoning of multimodal large language models (MLLMs) with 40K image pairs and questions across 8 dimensions of relative comparison: visual attribute, existence, st...
cobbler
Code and data for Koo et al.'s ACL 2024 paper "Benchmarking Cognitive Biases in Large Language Models as Evaluators"
XMainframe
Language Model for Mainframe Modernization
BackdoorLLM
BackdoorLLM: A Comprehensive Benchmark for Backdoor Attacks on Large Language Models
text-embedding-evaluation