
Question for contrastive loss weight in the paper

Open sukun1045 opened this issue 1 year ago • 3 comments

I have a question regarding the weights used in CAV-MAE. It seems that $\lambda_c$ could play an important role in the optimization. I understand this is due to the gradient scale, but it is surprising that the ablation study for CAV (contrastive loss only) still requires $\lambda_c$ to be $0.1$ or $0.01$. I am wondering what happens if $\lambda_c$ is set to $1$? Will it lead to an overfitting issue?

Best,

Kun

sukun1045 avatar Oct 26 '23 19:10 sukun1045

hi there,

could you point me to the table you are referring to?

$\lambda_c$ scales the loss, which interacts with the learning rate. We sometimes keep $\lambda_c$ the same for CAV and CAV-MAE to make a fair comparison. I think it can, and arguably should, be set to $1$ if you are solely interested in CAV, but you may then need to re-tune the learning rate.
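To make the coupling between $\lambda_c$ and the learning rate concrete, here is a minimal sketch (the function name and values are illustrative assumptions, not the CAV-MAE repo's actual code):

```python
def total_loss(loss_c, loss_mae, lam_c=0.01, lam_mae=1.0):
    """Weighted sum of a contrastive term and an MAE reconstruction term.

    Because the total is linear in loss_c, the gradient flowing back
    through the contrastive branch is multiplied by lam_c. For plain
    SGD, shrinking lam_c from 1.0 to 0.01 has the same effect on that
    branch as a 100x smaller learning rate, which is why changing
    lam_c usually means the learning rate should be re-tuned.
    """
    return lam_c * loss_c + lam_mae * loss_mae


# Example: contrastive loss 2.0, MAE loss 0.5
print(total_loss(2.0, 0.5))               # ~0.52 with the default lam_c=0.01
print(total_loss(2.0, 0.5, lam_c=1.0))    # ~2.5  when lam_c=1
```

The point being made above is exactly this linearity: the weight and the learning rate multiply the same per-term gradient, so they cannot be chosen independently.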

-Yuan

YuanGongND avatar Oct 27 '23 07:10 YuanGongND

Yeah, that's what I thought. I can tune the learning rate, but is there any particular reason that the contrastive loss needs a smaller learning rate?

In Table 3, "Audio-Visual Models with only Contrastive Loss" (screenshot attached).

sukun1045 avatar Oct 27 '23 17:10 sukun1045

I believe there are two things:

  1. Under the current learning-rate setting, $\lambda_c = 0.01$ or $\lambda_c = 0.1$ is simply a better hyperparameter choice than $\lambda_c = 1$ for the joint classification task. If I recall correctly, I did a search over $\lambda_c$: since I wanted to show that CAV-MAE is better than CAV, I had to find the best hyperparameters for CAV. On the other hand, if you set $\lambda_c = 1$, you would probably need to re-tune the learning rate, so the training settings of CAV and CAV-MAE would differ.

  2. In this table, the main point is that adding an MAE loss doesn't hurt retrieval performance, so I keep $\lambda_c$ the same for a fair comparison.

-Yuan

YuanGongND avatar Oct 27 '23 18:10 YuanGongND