Cap4Video

Low R1 performance in the 2nd stage

Open chenhao2345 opened this issue 2 years ago • 14 comments

Thanks for sharing your code. Is it normal to get R1=30 with train_titles.py? After running the score fusion, the title matrix does not improve the video matrix.

chenhao2345 avatar Jul 29 '23 13:07 chenhao2345

I am having the same issue. Incidentally, did you hit an error like the one below in the score fusion step? I was running it on MSVD.

Text-to-Video:
>>>  R@1: 30.4 - R@5: 59.7 - R@10: 70.7 - Median R: 3.0 - Mean R: 19.8
Video-to-Text:
>>>  V2T$R@1: 33.1 - V2T$R@5: 60.1 - V2T$R@10: 72.6 - V2T$Median R: 3.0 - V2T$Mean R: 18.4
video_matrix sim matrix size: (27763, 670), (27763, 670)
titles_shot_matrix sim matrix size: (27763, 670), (27763, 670)
Traceback (most recent call last):
  File "/local/Cap4Video/train_titles.py", line 723, in <module>
    fusion_scores()
  File "/local/Cap4Video/sim_matrix/fusion_scores.py", line 13, in fusion_scores
    tv_video_metrics = compute_metrics(video_matrix)
  File "/local/Cap4Video/metrics.py", line 13, in compute_metrics
    ind = sx - d
ValueError: operands could not be broadcast together with shapes (27763,670) (670,1) 
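The shapes in the traceback suggest the cause: the similarity matrix is 27763 x 670 (many captions per video on MSVD), while `ind = sx - d` subtracts a 670 x 1 column. That is what happens if `compute_metrics` takes the ground truth from the matrix diagonal, which only works for a square matrix with one caption per video. The snippet below is a minimal sketch under that assumption, not the repository's code. `diag_ranks`, `ranks_with_labels`, and the `gt_video` caption-to-video mapping are hypothetical names used for illustration.

```python
import numpy as np

def diag_ranks(sim):
    """Rank of the diagonal entry in each row; only valid for square matrices."""
    sx = np.sort(-sim, axis=1)
    d = np.diag(-sim)[:, np.newaxis]   # length is min(rows, cols)
    ind = sx - d                       # raises ValueError when sim is not square
    return np.where(ind == 0)[1]

def ranks_with_labels(sim, gt_video):
    """0-based rank of the ground-truth video for each caption row."""
    rows = np.arange(sim.shape[0])
    gt_scores = sim[rows, gt_video][:, np.newaxis]
    # Count the videos that score strictly higher than the ground truth.
    return (sim > gt_scores).sum(axis=1)

rng = np.random.default_rng(0)
sim = rng.standard_normal((12, 4))       # toy stand-in for the 27763 x 670 matrix
gt_video = np.repeat(np.arange(4), 3)    # assumed layout: 3 captions per video

try:
    diag_ranks(sim)
    failed = False
except ValueError as err:
    failed = True
    print("diagonal-based ranking failed:", err)

ranks = ranks_with_labels(sim, gt_video)
r_at_1 = float(np.mean(ranks == 0)) * 100
print(f"R@1 with explicit labels: {r_at_1:.1f}")
```

If this is indeed the cause, a multi-caption dataset needs either an explicit caption-to-video mapping like the one above or a metric routine written for multiple sentences per video.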

BishmoyPaul avatar Jul 31 '23 15:07 BishmoyPaul

@BishmoyPaul I'm running it on MSRVTT. I have not seen any problems with the fusion score on MSRVTT.

chenhao2345 avatar Aug 06 '23 09:08 chenhao2345

Same problem here. @whwu95 I got Rank-1 47.7 in the first stage (train_video.py) and Rank-1 around 30 in the second stage (train_titles.py).

JosephPai avatar Aug 16 '23 17:08 JosephPai

BTW, do you know the purpose of fusion_scores? @chenhao2345

JosephPai avatar Aug 16 '23 17:08 JosephPai

@JosephPai I got similar performance. ~47.5 in stage 1 and 30 in stage 2.

It seems to me that the authors get two similarity scores, one from stage 1 and one from stage 2, and then use fusion_scores to fuse them.
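To make the idea concrete, here is a rough sketch of what fusing two similarity matrices could look like. It is an illustration only: the weighted sum, the `alpha` weight, and the helper names `fuse` and `recall_at_k` are assumptions, and the actual fusion_scores in the repository may work differently. The matrices are synthetic, so the printed numbers say nothing about MSRVTT.

```python
import numpy as np

def recall_at_k(sim, k=1):
    """Text-to-video R@k for a square matrix with ground truth on the diagonal."""
    gt = np.diag(sim)[:, np.newaxis]
    ranks = (sim > gt).sum(axis=1)     # 0-based rank of the correct video
    return float(np.mean(ranks < k)) * 100

def fuse(video_sim, title_sim, alpha=0.5):
    """Weighted sum of two similarity matrices; alpha is a hypothetical weight."""
    return alpha * video_sim + (1 - alpha) * title_sim

rng = np.random.default_rng(0)
n = 50
signal = np.eye(n)
# Synthetic stand-ins for the stage 1 (video) and stage 2 (title) matrices.
video_sim = signal + 0.6 * rng.standard_normal((n, n))
title_sim = signal + 0.9 * rng.standard_normal((n, n))

for alpha in (1.0, 0.8, 0.5, 0.0):
    fused = fuse(video_sim, title_sim, alpha)
    print(f"alpha={alpha:.1f}  R@1={recall_at_k(fused):.1f}")
```

Under this reading, a weaker stage 2 matrix can still be useful if its errors differ from those of stage 1, but whether the fused score beats stage 1 alone depends on the weight and on the two matrices having comparable scales.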

chenhao2345 avatar Aug 23 '23 08:08 chenhao2345

I got R@1 45.3 in stage 1 and 29.6 in stage 2. It seems that the code only does global matching?

ASENNIU avatar Oct 23 '23 12:10 ASENNIU

I got R@1 45.3 in stage 1 and 29.6 in stage 2. It seems that the code only does global matching?

I think that's true.

shams2023 avatar Oct 23 '23 12:10 shams2023

Thanks for sharing your code. And how can I get the score 49 for R@1?

ASENNIU avatar Nov 02 '23 12:11 ASENNIU

@chenhao2345 @JosephPai @ASENNIU @BishmoyPaul Hi, could you share your batch size and the number of GPUs you used for training stage 1 and stage 2?

zef1611 avatar Nov 03 '23 07:11 zef1611

@zef1611 Did you find out the batch size, number of GPUs, and GPU type used in this project? Can anyone please answer this? @chenhao2345 @JosephPai @ASENNIU @BishmoyPaul

fazliimam avatar Dec 06 '23 09:12 fazliimam

@BishmoyPaul How did you train on the MSVD dataset? If you were using the co_train_msrvtt.sh script, what did you give for --data_path? Could you share the training script?

fazliimam avatar Dec 12 '23 19:12 fazliimam

@BishmoyPaul How did you train on the MSVD dataset? If you were using the co_train_msrvtt.sh script, what did you give for --data_path? Could you share the training script?

xpx-best avatar Feb 10 '25 13:02 xpx-best

@fazliimam Have you solved this problem?

xpx-best avatar Feb 10 '25 13:02 xpx-best

@xpx-best I ran it nearly 2 years ago and I no longer have access to my old lab server. I have forgotten what I gave for --data_path, but I do remember that I eventually moved to X-CLIP models instead of Cap4Video.

BishmoyPaul avatar Feb 15 '25 00:02 BishmoyPaul