
Pretraining data of released source models.

Open sarthaxxxxx opened this issue 1 year ago • 1 comment

Hi,

Amazing work! I'm curious about the pretraining dataset used for the Kinetics-50C experiments. Is the original CAV-MAE model fully fine-tuned on Kinetics-50, or are the CAV-MAE weights, initialized from VGGSound, kept frozen with only the classifier being fine-tuned?

I don't understand this statement in the Appendix (it appears for both datasets): "During the fine-tuning phase, we maintain the visual and audio encoders of the pre-trained model and add one randomly initialized classification head upon them." What are the pre-trained model weights here?

sarthaxxxxx avatar Dec 26 '24 19:12 sarthaxxxxx

Hi,

We simply followed the CAV-MAE fine-tuning pipeline on the VGGSound dataset to obtain cav_mae_ks50.pth. The main modification is that we replaced the label weight file (note: not the model weights) with the one from the KS50 dataset. For usage, you can directly load the fine-tuned checkpoint that has been uploaded.
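
For reference, here is a minimal sketch of loading the released checkpoint, assuming the interfaces of the public CAV-MAE repository (the `CAVMAEFT` class and its `label_dim` argument); the import path and file path are placeholders:

```python
import torch
from models import CAVMAEFT  # assumed import path from the CAV-MAE repository

# KS50 has 50 classes, so the classification head is sized accordingly.
model = CAVMAEFT(label_dim=50)

# Load the released fine-tuned checkpoint (placeholder path).
ckpt = torch.load('cav_mae_ks50.pth', map_location='cpu')
# Checkpoints saved under DataParallel carry a 'module.' key prefix; strip it.
state_dict = {k.replace('module.', '', 1): v for k, v in ckpt.items()}
missing, unexpected = model.load_state_dict(state_dict, strict=False)
model.eval()
```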

Therefore, following CAV-MAE, the pre-trained weights for both the VGGSound and Kinetics-50 fine-tuning are inherited from the CAV-MAE pretrained model "cav-mae-scale++". The fine-tuning dataset used for the Kinetics-50C experiments is detailed in Appendix B of our paper. You can find the json file for the dataset at the link.
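
To unpack the Appendix sentence quoted above, here is a minimal sketch of the initialization it describes, again assuming the repository's `CAVMAEFT` class; the checkpoint path is a placeholder, and the idea is that `strict=False` loading leaves the head randomly initialized:

```python
import torch
from models import CAVMAEFT  # assumed import path, as above

# Fine-tuning model: pretrained encoders plus one new 50-way classification head.
model = CAVMAEFT(label_dim=50)

# The "pre-trained model weights" are the self-supervised CAV-MAE checkpoint
# ("cav-mae-scale++"); it contains encoder weights but no classification head.
ckpt = torch.load('cav-mae-scale++.pth', map_location='cpu')  # placeholder path
ckpt = {k.replace('module.', '', 1): v for k, v in ckpt.items()}

# strict=False: the encoder weights are inherited from pretraining, while the
# classification head keeps its random initialization (no matching keys exist).
missing, unexpected = model.load_state_dict(ckpt, strict=False)
print('kept randomly initialized:', missing)  # expected: the head parameters
```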

mouxingyang avatar Jan 01 '25 04:01 mouxingyang