InternVideo

Where does the msrvtt_1k_test dataset annotation list come from?

Open mazhengyu8282 opened this issue 9 months ago • 3 comments

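Hello, I downloaded MSRVTT from the address given in the guide, and it contains many test_list files. Which one is used? After extracting MSRVTT, the directory contains: annotation, high-quality, structured-symlinks, videos. Which file in which folder is test_1k? Is it MSRVTT/structured-symlinks/val_list_jsfusion.txt?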

mazhengyu8282, Apr 09 '25 12:04

Hello, I used MSRVTT/high-quality/structured-symlinks/test_list_miech.txt as the test split and got the following results:

{"msrvtt_1k_test_sim":{"v2t_r1":4.38,"v2t_r5":20.24,"v2t_r10":37.02,"v2t_r_mean":20.55,"t2v_r1":3.1,"t2v_r5":15.16,"t2v_r10":31.2,"t2v_r_mean":16.49,"r_mean":18.52},"msrvtt_1k_test_dsl":{"v2t_r1":4.48,"v2t_r5":20.36,"v2t_r10":36.84,"v2t_r_mean":20.56,"t2v_r1":3.16,"t2v_r5":15.18,"t2v_r10":30.0,"t2v_r_mean":16.11,"r_mean":18.34},"msrvtt_1k_test_match":{"v2t_r1":4.58,"v2t_r5":21.36,"v2t_r10":39.44,"v2t_r_mean":21.79,"t2v_r1":3.12,"t2v_r5":17.16,"t2v_r10":33.76,"t2v_r_mean":18.01,"r_mean":19.9}}

These do not reproduce the results in the paper. Which step or steps might I have gotten wrong?

mazhengyu8282, Apr 10 '25 02:04
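For readers following the thread: the metric names in the result above are recall at rank 1, 5 and 10 in each retrieval direction, and the reported means are consistent with plain arithmetic averages (for example, (4.38 + 20.24 + 37.02) / 3 ≈ 20.55, and (20.55 + 16.49) / 2 = 18.52). A minimal Python sketch of such a computation is below. It assumes a square text-to-video similarity matrix in which query i is paired with candidate i, that is, exactly one caption per video. The function names `recall_at_k` and `retrieval_metrics` are illustrative and are not taken from the InternVideo code.

```python
import numpy as np


def recall_at_k(sim, ks=(1, 5, 10)):
    """Recall@k in percent for a square similarity matrix.

    Assumption: sim[i, j] scores query i against candidate j, and the
    ground-truth match for query i is candidate i (the diagonal).
    """
    sim = np.asarray(sim, dtype=float)
    gt = np.diag(sim)[:, None]
    # Rank of the ground truth = number of candidates scoring strictly higher.
    ranks = (sim > gt).sum(axis=1)
    out = {f"r{k}": float((ranks < k).mean() * 100) for k in ks}
    out["r_mean"] = float(np.mean([out[f"r{k}"] for k in ks]))
    return out


def retrieval_metrics(sim_t2v):
    """Build t2v_* / v2t_* metrics from a text-to-video similarity matrix."""
    sim_t2v = np.asarray(sim_t2v, dtype=float)
    t2v = recall_at_k(sim_t2v)
    v2t = recall_at_k(sim_t2v.T)  # swap roles: videos become the queries
    metrics = {f"v2t_{name}": value for name, value in v2t.items()}
    metrics.update({f"t2v_{name}": value for name, value in t2v.items()})
    # Overall mean is the average of the two per-direction means.
    metrics["r_mean"] = (v2t["r_mean"] + t2v["r_mean"]) / 2
    return metrics
```

Because the ground truth is read off the diagonal, this sketch only makes sense when the text and video lists are aligned one-to-one; a list with several captions per video would need a different ground-truth mapping.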

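I found that each video has 20 captions, whereas only one caption per video should be used. After correcting the dataset, I got the following results:

| | v2t_r1 | v2t_r5 | v2t_r10 | v2t_r_mean | t2v_r1 | t2v_r5 | t2v_r10 | t2v_r_mean | r_mean |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| msrvtt_1k_test_sim | 43.3 | 71.6 | 79.7 | 64.87 | 44.0 | 71.1 | 81.9 | 65.67 | 65.27 |
| msrvtt_1k_test_dsl | 43.8 | 71.6 | 79.8 | 65.07 | 43.8 | 72.6 | 81.7 | 66.03 | 65.55 |
| msrvtt_1k_test_match | 50.0 | 74.2 | 81.1 | 68.43 | 52.1 | 76.7 | 83.6 | 70.80 | 69.62 |

Which of these three do the metrics reported in the paper correspond to: msrvtt_1k_test_sim, msrvtt_1k_test_dsl, or msrvtt_1k_test_match?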

mazhengyu8282, Apr 10 '25 09:04
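The correction described above, reducing the annotation list to one caption per video, could be sketched roughly as follows. The record layout (a list of dicts with `video` and `caption` keys) and the helper name `keep_one_caption_per_video` are assumptions made for illustration. The thread does not say which of the 20 captions should be kept, so this sketch simply keeps the first one encountered for each video.

```python
def keep_one_caption_per_video(annotations, video_key="video"):
    """Return a new list with only the first record seen for each video.

    Assumption: `annotations` is a list of dicts, each holding a video
    identifier under `video_key` (a hypothetical field name).
    """
    seen = set()
    kept = []
    for item in annotations:
        video_id = item[video_key]
        if video_id in seen:
            continue  # a caption for this video was already kept
        seen.add(video_id)
        kept.append(item)
    return kept


# Usage with made-up records: two captions for video7010, one for video7020.
example = [
    {"video": "video7010.mp4", "caption": "a man is talking"},
    {"video": "video7010.mp4", "caption": "a person speaks to the camera"},
    {"video": "video7020.mp4", "caption": "a woman is cooking"},
]
filtered = keep_one_caption_per_video(example)
```

After this step the text list and the video list have the same length, which is what a one-to-one retrieval evaluation expects.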

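One more question: the original CLIP can only process images, so how is it used as the video encoder in the paper? Is only one frame sampled per video?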

mazhengyu8282, Apr 10 '25 09:04