CoT is not as effective as direct answering in MLLMs (from the InternVL2.5-MPO paper)

Open EchoDreamer opened this issue 10 months ago • 0 comments

In the InternVL2.5-MPO paper, the authors mention that chain-of-thought (CoT) prompting is not as effective as direct answering for MLLMs. Why does CoT perform so much worse for MLLMs than for LLMs? In my own recent experiments with Qwen2.5-VL, the CoT reasoning itself was quite good, but the model had difficulty fully following format instructions such as outputting only "yes" or "no". This makes it hard to extract answers with an automated evaluation framework. Do the authors have any further explanation for this issue?
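To make the extraction problem concrete, here is a minimal sketch of the kind of post-processing I mean. It is not taken from the paper or from any evaluation framework; the function name `extract_yes_no` and the marker patterns are my own assumptions. It first looks for an explicit "answer: yes/no" style marker and only falls back to the last standalone "yes" or "no" in the response.

```python
import re
from typing import Optional

# Hypothetical patterns (assumptions, not from any framework):
# an explicit marker such as "Final answer: Yes" or "the answer is **no**".
_ANSWER_MARKER = re.compile(
    r"(?:final\s+answer|answer)\s*(?:is|:)?\s*\**\s*(yes|no)\b",
    re.IGNORECASE,
)
# Fallback: any standalone "yes" or "no" word.
_YES_NO = re.compile(r"\b(yes|no)\b", re.IGNORECASE)


def extract_yes_no(response: str) -> Optional[str]:
    """Return 'yes' or 'no' from a CoT-style response, or None if absent."""
    # Prefer the last explicitly marked answer, since CoT text often
    # contains incidental "no" or "yes" words inside the reasoning.
    marked = _ANSWER_MARKER.findall(response)
    if marked:
        return marked[-1].lower()
    # Otherwise take the last standalone yes/no, assuming the
    # conclusion comes at the end of the reasoning.
    bare = _YES_NO.findall(response)
    if bare:
        return bare[-1].lower()
    return None
```

The fallback is a heuristic and can still pick up a stray "no" from the reasoning when the model never states a conclusion, which is exactly why strict format following matters for automated evaluation.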

Feb 24 '25 07:02 EchoDreamer