ACL 2024 topic
UHGEval
[ACL 2024] User-friendly hallucination evaluation framework: Eval Suite and benchmarks including UHGEval, HaluEval, and HalluQA
Scientific-Inspiration-Machines-Optimized-for-Novelty
Official implementation of the ACL 2024 paper "Scientific Inspiration Machines Optimized for Novelty"
langsuite
Official repository of LangSuitE
LooGLE
ACL 2024 | LooGLE: a benchmark for evaluating the long-context understanding of long-context language models
raid
RAID is the largest and most challenging benchmark for machine-generated text detectors. (ACL 2024)
timechara
🧙🏻 Code and benchmark for our Findings of ACL 2024 paper "TimeChara: Evaluating Point-in-Time Character Hallucination of Role-Playing Large Language Models"
Cotempqa
Code and data for "Living in the Moment: Can Large Language Models Grasp Co-Temporal Reasoning?" (ACL 2024)
NewsBench
[ACL 2024 Main] NewsBench: A Systematic Evaluation Framework for Assessing Editorial Capabilities of Large Language Models in Chinese Journalism
KIEval
[ACL 2024] A knowledge-grounded interactive evaluation framework for large language models
camera
Multimodal dataset for ad text generation in Japanese [Mita+, ACL 2024]