LLaVA-NeXT
                        Request for NExTQA Dataset Evaluation Prompt and More Results on Challenging Datasets for Fair Comparison
To my knowledge, the videos in the NExTQA dataset are relatively short, with an average length of 44 seconds, and a static bias has been noted [1] in the ActivityNet QA dataset. Could you present further results on more demanding datasets, such as EgoSchema [2], for a fairer comparison? Additionally, could you supply the evaluation prompt used for the NExTQA dataset?
[1] Lei, Jie et al. "Revealing Single Frame Bias for Video-and-Language Learning." arXiv abs/2206.03428 (2022).
[2] Mangalam, Karttikeya et al. "EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding." arXiv abs/2308.09126 (2023).
Thanks for your advice. The evaluation on EgoSchema is ongoing.
The prompt for NExTQA is: "Answer the question using several words or phrase."
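For illustration, the prompt above is typically appended to each question before querying the model. The sketch below shows one minimal way this could be done; the function and variable names are assumptions for illustration, not the repository's actual API.

```python
# Hypothetical sketch of appending the NExTQA evaluation prompt to a question.
# The prompt string is the one quoted above; everything else is an assumption.

EVAL_PROMPT = "Answer the question using several words or phrase."

def build_nextqa_query(question: str) -> str:
    """Combine a NExTQA question with the evaluation prompt on a new line."""
    return f"{question.strip()}\n{EVAL_PROMPT}"

print(build_nextqa_query("What is the man doing?"))
# What is the man doing?
# Answer the question using several words or phrase.
```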