MA-LMM
How to generate prediction result of a whole video in 'lvu_cls'?
Hi, thank you for your awesome work! I have a question about the final prediction results for lvu_cls. In your code, the evaluation appears to be based on per-image predictions, each identified by the 'image_id' key in the result file. How can I aggregate these per-image predictions into a single prediction for a whole video when there are multiple image predictions for the same video?
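
To make the question concrete, here is a minimal sketch of the kind of aggregation I have in mind: a simple majority vote over all predictions that belong to the same video. This is only my guess, not something taken from the repository. It assumes the result file is a list of dicts, that the predicted label is stored under a key I call `prediction` (hypothetical), and that the video can be recovered from `image_id` by stripping a trailing `_<clip index>` suffix (also hypothetical; the real mapping may differ).

```python
from collections import Counter, defaultdict


def aggregate_by_video(results, video_id_of):
    """Majority-vote per-image predictions into one prediction per video.

    results: list of dicts with 'image_id' and 'prediction' keys
             ('prediction' is an assumed key name, not confirmed by the repo).
    video_id_of: function mapping an image_id to its video id (assumed mapping).
    """
    votes = defaultdict(list)
    for r in results:
        votes[video_id_of(r["image_id"])].append(r["prediction"])
    # most_common(1) picks the most frequent label; on a tie, the label
    # that was encountered first wins.
    return {vid: Counter(preds).most_common(1)[0][0] for vid, preds in votes.items()}


# Hypothetical entries, made up purely for illustration.
results = [
    {"image_id": "vid1_0", "prediction": "comedy"},
    {"image_id": "vid1_1", "prediction": "drama"},
    {"image_id": "vid1_2", "prediction": "comedy"},
    {"image_id": "vid2_0", "prediction": "drama"},
]

# Assumed id scheme: "<video id>_<clip index>".
video_preds = aggregate_by_video(results, lambda image_id: image_id.rsplit("_", 1)[0])
print(video_preds)
```

Is something like this the intended way to get video-level results, or should the per-image scores be averaged instead?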