Ask-Anything
Cannot reproduce VideoChatGPT video benchmark results
Dear authors, your paper reports very impressive performance on the VideoChatGPT video benchmark. However, I could not find any code for reproducing this experiment, so I modified the MVBench evaluation code to run inference on this task. I was unable to reproduce the reported results.
What I got is:
The model should be the right one, because I used the same model to reproduce the reported performance on MVBench.
So the only difference might be the prompt used when testing on this dataset. However, I find it hard to believe that a different prompt alone could lead to such a big gap.
Could you please share the prompt used for this benchmark, or the inference results? Extraordinary claims require extraordinary evidence.