HQGA
Video as Conditional Graph Hierarchy for Multi-Granular Question Answering (AAAI'22, Oral)
In your paper, you said you extracted p=15 frames per second from each video in MSVD-QA. I found that HCRN uses all frames. I hope you can tell how to...
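The question above turns on resampling a video to a fixed rate of p frames per second rather than keeping every frame. Below is a minimal, generic sketch of choosing which frame indices to keep, assuming the video's native frame rate and frame count are already known. The function name `sample_frame_indices` is hypothetical. This is not HQGA's or HCRN's actual extraction code.

```python
import math


def sample_frame_indices(num_frames: int, native_fps: float, target_fps: float = 15.0) -> list[int]:
    """Hypothetical helper: pick frame indices so roughly target_fps frames
    are kept per second of video. Generic sketch, not the repo's pipeline."""
    if native_fps <= 0 or target_fps <= 0:
        raise ValueError("fps values must be positive")
    if target_fps >= native_fps:
        # Cannot upsample by index selection; keep every frame.
        return list(range(num_frames))
    # Number of native frames between two consecutive sampled frames (> 1 here).
    step = native_fps / target_fps
    # i * step < num_frames for every i in this range, so all indices are valid,
    # and because step > 1 the truncated indices are strictly increasing.
    return [int(i * step) for i in range(math.ceil(num_frames / step))]


if __name__ == "__main__":
    # A 2-second clip at 30 fps resampled to 15 fps keeps every second frame.
    print(sample_frame_indices(60, native_fps=30.0, target_fps=15.0))
```

Under this scheme, keeping "all frames" as HCRN reportedly does corresponds to setting `target_fps` at or above the native rate.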
Thank you for your excellent work! Should I extract images from the raw videos before I extract features with BUTD? Should I save the pre-proposed bbox and output feature in the...
This is excellent work, and I have learned a lot from it. But I am still confused about the BERT feature extraction; I can't extract the features and reproduce...
Thank you for sharing such great work! I am very interested in it! But I am having many difficulties with MSRVTT feature extraction, and my GPU is relatively weak. Therefore, I...
Much appreciated!
Thanks for your code! How can I obtain the extracted bbox region features for NExT-QA? Looking forward to your reply.
Hi, very interesting work. I wanted to know if there is a way to produce the graphs for a given video, like the ones shown in the README. Thank you!
Hi, thanks for sharing. I didn't find the ans_word.json file in the nextqa folder. How many high-frequency answers are used as the predefined answer set for NExT-QA? How do...
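The question above asks how a predefined answer set is built from the most frequent answers. A common approach is to count the training answers and keep the top K. The sketch below assumes the answers are plain strings and that the vocabulary is stored as a JSON list. The actual layout of `ans_word.json` and the value of K used for NExT-QA are not stated here. The helper names are hypothetical.

```python
import json
from collections import Counter
from typing import Iterable


def build_answer_vocab(answers: Iterable[str], top_k: int) -> list[str]:
    """Hypothetical helper: return the top_k most frequent answers.
    Answers are lowercased and stripped before counting (an assumption).
    Ties keep first-encountered order, per Counter.most_common."""
    counts = Counter(a.strip().lower() for a in answers)
    return [ans for ans, _ in counts.most_common(top_k)]


def save_answer_vocab(vocab: list[str], path: str) -> None:
    """Write the vocabulary as a plain JSON list (assumed format, not
    necessarily what the repo's ans_word.json contains)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(vocab, f)


if __name__ == "__main__":
    train_answers = ["dog", "Dog", "cat", "bird", "cat", "dog"]
    print(build_answer_vocab(train_answers, top_k=2))
```

Questions whose ground-truth answer falls outside the kept set are typically dropped or marked unanswerable during training. That choice also depends on K.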