Can I Trust Your Answer? Visually Grounded Video Question Answering (CVPR'24, Highlight)
doc-doc
[IEEE TMM'25] Scene-Text Grounding for Text-Based Video Question Answering
zhousheng97