evaluation-llms topic

AttrScore

52 stars, 2 forks

Code, datasets, and models for the paper "Automatic Evaluation of Attribution by Large Language Models"

CompBench

30 stars, 1 fork

CompBench evaluates the comparative reasoning of multimodal large language models (MLLMs) with 40K image pairs and questions across 8 dimensions of relative comparison: visual attribute, existence, st...