Two questions about the experimental results in Table 1 of the paper.
Hi, I would like to ask you two questions about the experimental results in the paper's Table 1.
I would like to ask where the 53.97 accuracy for full fine-tuning on SSv2 was obtained.
When I read the VideoMAE paper, I found that pre-training on SSv2 and then fine-tuning on SSv2 reaches 69.3. I know your paper uses the K400 pre-trained parameters, but I also ran experiments, and I can reach 65+ with 50 epochs of fine-tuning on SSv2:

- So my first question is: where does the 53.97 come from?
- My second question: I also could not find the data in the chart below in the table; is it a typo?

Hi,
Thanks for raising these questions.
- May I know your detailed configuration, including the command and the pre-trained weights?
- Good catch. We are sorry for that typo. We updated the table but missed the main text. Thanks again for pointing it out; we have fixed it in our camera-ready version.
Hi,
1. Pre-trained weights: the pre-trained weights I use are from
https://drive.google.com/file/d/1JfrhN144Hdg7we213H1WxwR3lGYOlmIn/view
(the location is shown in the picture in Annex 1).
2. Shell script: I basically did not change the shell script relative to VideoMAE; I only changed the batch size from 256 to 64. In effect, this uses the K400 800-epoch pre-trained parameters and fine-tunes for 50 epochs on SSv2. I compared the final results against the same setting with SSv2 1600-epoch pre-training; as the picture in Annex 2 shows, both exceed 65 top-1 accuracy. The shell script is in Annex 3.
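One thing worth double-checking when changing the batch size: MAE-style codebases (VideoMAE included, as far as I know) usually scale the learning rate linearly with the total batch size, so dropping from 256 to 64 changes the effective learning rate unless the base rate is adjusted. A minimal sketch of that scaling rule (the `base_lr` value below is a placeholder, not the paper's actual setting):

```python
# Linear learning-rate scaling rule common in MAE-style codebases:
#   actual_lr = base_lr * total_batch_size / 256
# The base_lr here is a hypothetical placeholder, not the paper's value.

def scaled_lr(base_lr: float, total_batch_size: int) -> float:
    """Return the effective learning rate under linear scaling."""
    return base_lr * total_batch_size / 256

base_lr = 1e-3                      # hypothetical base learning rate
lr_256 = scaled_lr(base_lr, 256)    # unchanged at the reference batch size
lr_64 = scaled_lr(base_lr, 64)      # 4x smaller at batch size 64
print(lr_256, lr_64)
```

If the script keeps the same base rate, the batch-64 run trains with a 4x smaller effective learning rate, which by itself can shift the final accuracy.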
Thanks for your reply. Did you experiment with the VideoMAE codebase?
I guess you experimented with strong data augmentation and a strong optimizer (e.g. AdamW). For a fair comparison with linear probing, we experiment with the same setting as the linear probe, which uses SGD and no strong data augmentation.
Please let me know if I missed something.
Thanks for your reply. I will run the experiment again to verify!
@ShoufaChen @yangzhen1997 I ran into the same problem. Even if you remove those augmentations and the AdamW optimizer, will your method still improve the results? In my experiments, adding augmentations and the AdamW optimizer did not improve (and sometimes degraded) performance. This is because in full fine-tuning they are used to reduce overfitting when tuning many parameters. In VPT and your method, however, since we tune only a small fraction of the parameters, they do not improve performance. Therefore, would it be fair to report the full fine-tuning results without any augmentations or sophisticated optimizers?
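The "small fraction of parameters" point can be made concrete with rough numbers. ViT-B/16 has about 86M parameters; a VPT-deep-style setup that tunes, say, 50 prompt tokens of dimension 768 at each of the 12 layers updates well under 1% of them. The prompt length here is hypothetical, chosen only for illustration:

```python
# Back-of-the-envelope trainable-parameter count for prompt tuning on ViT-B/16.
# The prompt length below is hypothetical, for illustration only.

total_params = 86_000_000   # ViT-B/16: roughly 86M parameters
prompt_len = 50             # hypothetical number of prompt tokens per layer
embed_dim = 768             # ViT-B hidden dimension
num_layers = 12             # ViT-B depth (deep prompting: prompts at every layer)

prompt_params = prompt_len * embed_dim * num_layers
fraction = prompt_params / total_params

print(f"{prompt_params:,} trainable params "
      f"({fraction:.2%} of full fine-tuning)")
```

With so few trainable parameters, there is far less capacity to overfit, which is why regularization tricks tuned for full fine-tuning may add little here.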