multi-choice topic

List multi-choice repositories

CValues

469
Stars
20
Forks
Watchers

Research on evaluating and aligning the values of Chinese large language models

ORQA

40
Stars
1
Forks
40
Watchers

[AAAI 2025] ORQA is a new QA benchmark designed to assess the reasoning capabilities of LLMs in a specialized technical domain of Operations Research. The benchmark evaluates whether LLMs can emulate...