Error when finetuning InternVideo2 stage2 with num_frames 12

Open Eliza-and-black opened this issue 9 months ago • 0 comments

When I try to finetune stage2 of InternVideo2 with num_frames set to 12, I get the error below:

[rank0]:   File "/root/nginx/multi_modality/tasks/shared_utils.py", line 192, in setup_model
[rank0]:     msg = model_without_ddp.load_state_dict(state_dict, strict=False)
[rank0]:   File "/usr/local/Python3.8.12/lib/python3.8/site-packages/torch/nn/modules/module.py", line 2215, in load_state_dict
[rank0]:     raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
[rank0]: RuntimeError: Error(s) in loading state_dict for InternVideo2_Stage2:
[rank0]:        size mismatch for vision_encoder.pos_embed: copying a param with shape torch.Size([1, 1025, 1408]) from checkpoint, the shape in current model is torch.Size([1, 3073, 1408]).
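
For reference, the two token counts in the message can be decomposed as sketched below. This is only a rough check, not something taken from the repository: it assumes a 224x224 input, a patch size of 14 (so 16x16 = 256 patches per frame), one temporal position per frame, and a single class token. The helper name `frames_in_pos_embed` is made up for illustration.

```python
def frames_in_pos_embed(num_tokens, image_size=224, patch_size=14, num_cls_tokens=1):
    """Return the number of frames implied by a pos_embed token count.

    All defaults are assumptions (224x224 input, 14x14 patches, one class
    token); they are not read from the InternVideo2 config.
    """
    patches_per_side = image_size // patch_size   # 16 under the assumed defaults
    patches_per_frame = patches_per_side ** 2     # 256 under the assumed defaults
    patch_tokens = num_tokens - num_cls_tokens    # drop the class token
    frames, remainder = divmod(patch_tokens, patches_per_frame)
    if remainder:
        raise ValueError(
            f"{patch_tokens} patch tokens is not a multiple of {patches_per_frame}"
        )
    return frames


# Token counts copied from the size-mismatch message above.
checkpoint_frames = frames_in_pos_embed(1025)  # shape stored in the checkpoint
model_frames = frames_in_pos_embed(3073)       # shape built by the current model
print(checkpoint_frames, model_frames)
```

Under these assumptions, 1025 = 1 + 4 x 256 and 3073 = 1 + 12 x 256, so the checkpoint's `vision_encoder.pos_embed` looks like it was saved for 4 frames while the current model is built for 12. If that reading is right, the mismatch comes from the frame count rather than from the spatial resolution or the embedding width (1408 on both sides).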

How can I solve this? Looking forward to your reply. Thanks.

Apr 08 '25 03:04 Eliza-and-black