InternVideo
Confusion about zero-shot setting on Video-Text Retrieval
Thank you for your interesting work and for sharing the code!
I'm confused about whether the zero-shot performance on MSRVTT reported here requires setting “--mergeclip=True”.
Below is the result I reproduced:
“--mergeclip=True”:
“--mergeclip=False”:
As the provided file defaults to "--mergeclip=True", I wonder whether something is wrong here.
It seems that with “merge=True”, the results are better than those presented in the paper?
It seems that with “merge=True”, the results are better than those presented in the paper?
Yes. It seems that the results reported in the paper were obtained with “merge=True” and without DSL.
It seems that with “merge=True”, the results are better than those presented in the paper?
Yes. It seems that the results reported in the paper were obtained with “merge=True” and without DSL.
I tested on ActivityNet and got better results with “merge=True” and DSL, but worse results with “merge=True” and no DSL (worse than those reported in the paper). The authors replied to someone else that they use the DSL results. I'm also confused about which setting they used.
Hi, were you able to resolve the confusion?
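For anyone comparing the "with DSL" and "without DSL" numbers, here is a minimal sketch of what DSL re-ranking usually means at inference time, assuming DSL stands for the dual softmax trick used in other CLIP-based retrieval code. This is not taken from the InternVideo repository: the function names, the temperature of 100, and the toy similarity matrix are all made up for illustration, so please check the actual evaluation script before relying on it.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dual_softmax_rerank(sim, temperature=100.0):
    # sim[i, j] = similarity of text query i and video j.
    # The prior is a softmax over all text queries for each video (axis=0),
    # so a video that scores high for many queries gets down-weighted.
    # temperature=100.0 is an assumed value, not the repository's setting.
    prior = softmax(sim * temperature, axis=0)
    return sim * prior

def recall_at_k(sim, k=1):
    # Assumes text i is paired with video i (ground truth on the diagonal).
    ranks = (-sim).argsort(axis=1)
    truth = np.arange(sim.shape[0])[:, None]
    return (ranks[:, :k] == truth).any(axis=1).mean()

# Toy example: video 0 is a "hub" that also attracts text 1.
sim = np.array([
    [0.60, 0.30, 0.20],
    [0.55, 0.50, 0.20],
    [0.30, 0.20, 0.50],
])

r1_raw = recall_at_k(sim, k=1)
r1_dsl = recall_at_k(dual_softmax_rerank(sim), k=1)
print("R@1 without DSL:", r1_raw)
print("R@1 with DSL:   ", r1_dsl)
```

In this toy case text 1 is retrieved correctly only after re-ranking. Note that the prior is computed over all test queries at once, so numbers with and without it are not directly comparable, which may be part of why the reported setting matters.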