
Request for NExTQA Dataset Evaluation Prompt and More Results on Challenging Datasets for Fair Comparison

Open patrick-tssn opened this issue 1 year ago • 1 comments

To my knowledge, the videos in the NExTQA dataset are relatively short, with an average length of 44 seconds, and a static bias has been noted [1] in the ActivityNet QA dataset. Could you present further results on more demanding datasets, such as EgoSchema [2], for a fair comparison? Additionally, could you supply the evaluation prompt used for the NExTQA dataset?

[1] Lei, Jie et al. "Revealing Single Frame Bias for Video-and-Language Learning." ArXiv abs/2206.03428 (2022).
[2] Mangalam, Karttikeya et al. "EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding." ArXiv abs/2308.09126 (2023).

patrick-tssn avatar May 10 '24 14:05 patrick-tssn

Thanks for your advice. Evaluation on EgoSchema is ongoing.

The prompt for NExTQA is: "Answer the question using several words or phrase."
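As a rough illustration, the instruction above would typically be appended to each question when building the model input. The helper below is a hypothetical sketch of that step; the exact template and concatenation used by LLaVA-NeXT's evaluation code may differ.

```python
# The instruction string quoted by the maintainer above.
INSTRUCTION = "Answer the question using several words or phrase."

def build_nextqa_prompt(question: str) -> str:
    """Append the evaluation instruction to a NExTQA question.

    Hypothetical helper: the real evaluation pipeline may use a
    different separator or a conversation template.
    """
    return f"{question}\n{INSTRUCTION}"

print(build_nextqa_prompt("What is the man doing after he picks up the ball?"))
```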

ZhangYuanhan-AI avatar May 14 '24 05:05 ZhangYuanhan-AI