Research on evaluating and aligning the values of Chinese large language models
X-PLUG
[AAAI 2025] ORQA is a new QA benchmark designed to assess the reasoning capabilities of LLMs in a specialized technical domain of Operations Research. The benchmark evaluates whether LLMs can emulate...
nl4opt