LongBench
Do models really answer directly in the non-CoT setting?
I noticed that there is not a big difference between non-CoT and CoT performance. I'm curious whether the models are in fact answering directly, without any intermediate reasoning, in the "non-CoT" setting.
I think many models will generate intermediate reasoning anyway, given the prompt template from the paper (a sketch of filling it programmatically follows the template):
Please read the following text and answer the question below.
<text>
{Long Context}
</text>
What is the correct answer to this question: {Question}
Choices:
(A) {Choice A}
(B) {Choice B}
(C) {Choice C}
(D) {Choice D}
Format your response as follows: “The correct answer is (insert answer here)”.
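For concreteness, here is a minimal sketch of how this template could be filled per example. The template string is transcribed from the prompt above, but the field names (`context`, `question`, `choice_a`, ...) and the placeholder values are hypothetical:

```python
# Sketch of filling the no-CoT template above. The field names are
# hypothetical, not from the paper.
NO_COT_TEMPLATE = (
    "Please read the following text and answer the question below.\n\n"
    "<text>\n{context}\n</text>\n\n"
    "What is the correct answer to this question: {question}\n"
    "Choices:\n"
    "(A) {choice_a}\n"
    "(B) {choice_b}\n"
    "(C) {choice_c}\n"
    "(D) {choice_d}\n\n"
    'Format your response as follows: "The correct answer is (insert answer here)".'
)

prompt = NO_COT_TEMPLATE.format(
    context="(long context here)",
    question="(question here)",
    choice_a="(choice A)",
    choice_b="(choice B)",
    choice_c="(choice C)",
    choice_d="(choice D)",
)
print(prompt)
```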
Do you have any insights on this, or statistics on the output lengths of various models?
Thanks
Testing on Gemma 3 shows that the no-CoT prompt elicits a direct answer, followed by an explanation.
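For anyone who wants to reproduce this check or gather output-length statistics, here is a minimal sketch using the Hugging Face transformers chat pipeline. The checkpoint (google/gemma-3-1b-it), the decoding settings, and the toy example are my assumptions, not necessarily what was used above:

```python
# Sketch: does the no-CoT prompt elicit a direct answer, and how long
# are the outputs? Model ID, decoding settings, and the toy prompt are
# assumptions.
import re
from statistics import mean, median

from transformers import pipeline

generator = pipeline("text-generation", model="google/gemma-3-1b-it")

prompts = [
    # In practice, fill the no-CoT template above with each benchmark example.
    "Please read the following text and answer the question below.\n\n"
    "<text>\nParis has been the capital of France since 987.\n</text>\n\n"
    "What is the correct answer to this question: What is the capital of France?\n"
    "Choices:\n(A) Berlin\n(B) Paris\n(C) Rome\n(D) Madrid\n\n"
    'Format your response as follows: "The correct answer is (insert answer here)".'
]

lengths, direct = [], 0
for p in prompts:
    out = generator([{"role": "user", "content": p}],
                    max_new_tokens=512, do_sample=False)
    reply = out[0]["generated_text"][-1]["content"]
    lengths.append(len(reply.split()))  # word count as a rough length proxy
    # Count a response as "direct" only if it *opens* with the answer string;
    # anything before it would be intermediate reasoning.
    if re.match(r"\s*The correct answer is", reply):
        direct += 1

print(f"direct answers: {direct}/{len(prompts)}")
print(f"output length (words): mean={mean(lengths):.1f}, median={median(lengths)}")
```

Run over the full benchmark, the mean/median lengths would give a rough answer to the output-length question, and matching only at the start of the reply distinguishes "answer first, explanation after" from "reasoning first".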