LLaVA-NeXT
Different Reported Results on NeXT-QA and EgoSchema
Hi Team,
I saw that LLaVA-NeXT-Video-32B-Qwen obtains 77.31% and 63% accuracy on NeXT-QA and EgoSchema, respectively, as reported here: https://huggingface.co/lmms-lab/LLaVA-NeXT-Video-32B-Qwen.
On the other hand, LLaVA-NeXT-Video-DPO (34B) achieves only 27.30% accuracy on the NeXT-QA dataset.
Why does the accuracy differ so much? Did LLaVA-NeXT-Video-32B-Qwen use a separate LLM to solve the questions, while LLaVA-NeXT-Video-DPO (34B) answered them within the VLM itself?
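For context, my (possibly wrong) understanding is that NeXT-QA is scored as plain multiple-choice accuracy, roughly like the sketch below; the file name and the "prediction"/"answer" keys are placeholders I made up, not the repo's actual evaluation code:

```python
# Rough sketch of how I assume NeXT-QA multiple-choice accuracy is computed.
# "prediction" and "answer" are placeholder keys, not the repo's real output format.
import json

def multiple_choice_accuracy(results_path: str) -> float:
    """Fraction of questions where the predicted option letter matches the ground truth."""
    with open(results_path) as f:
        results = json.load(f)  # e.g. a list of {"prediction": "A", "answer": "B", ...}
    correct = sum(
        1 for r in results
        if r["prediction"].strip().upper() == r["answer"].strip().upper()
    )
    return correct / len(results) if results else 0.0

print(f"accuracy: {multiple_choice_accuracy('nextqa_results.json'):.2%}")
```

If the two models were evaluated with different protocols (e.g. one with an extra LLM in the loop), that might explain part of the gap, which is what I am trying to understand.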
Thank you in advance for your answer.