LLaVA-NeXT
Different Reported Results on NeXT-QA and EgoSchema
Hi Team,
I saw that LLaVA-NeXT-Video-32B-Qwen obtains 77.31% and 63% accuracy on NeXT-QA and EgoSchema, respectively, as reported here: https://huggingface.co/lmms-lab/LLaVA-NeXT-Video-32B-Qwen.
On the other hand, LLaVA-NeXT-Video-DPO (34B) achieves only 27.30% accuracy on the NeXT-QA dataset.
Why does the accuracy differ so much? Did LLaVA-NeXT-Video-32B-Qwen use a separate LLM to solve the questions, while LLaVA-NeXT-Video-DPO (34B) answered them within the VLM itself?
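For context, my (possibly wrong) understanding is that NeXT-QA is scored as plain multiple-choice accuracy, roughly like the sketch below; the file name and the "prediction"/"answer" keys are placeholders I made up, not the repo's actual evaluation code:

```python
# Rough sketch of how I assume NeXT-QA multiple-choice accuracy is computed.
# "prediction" and "answer" are placeholder keys, not the repo's real output format.
import json

def multiple_choice_accuracy(results_path: str) -> float:
    """Fraction of questions where the predicted option letter matches the ground truth."""
    with open(results_path) as f:
        results = json.load(f)  # e.g. a list of {"prediction": "A", "answer": "B", ...}
    correct = sum(
        1 for r in results
        if r["prediction"].strip().upper() == r["answer"].strip().upper()
    )
    return correct / len(results) if results else 0.0

print(f"accuracy: {multiple_choice_accuracy('nextqa_results.json'):.2%}")
```

If the two models were evaluated with different protocols (e.g. one with an extra LLM in the loop), that might explain part of the gap, which is what I am trying to understand.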
Thank you in advance for your answer.