InternVideo
Error when fine-tuning InternVideo2 stage2 with num_frames 12
When I try to fine-tune stage2 of InternVideo2 with num_frames set to 12, I get the error below:
[rank0]: File "/root/nginx/multi_modality/tasks/shared_utils.py", line 192, in setup_model
[rank0]: msg = model_without_ddp.load_state_dict(state_dict, strict=False)
[rank0]: File "/usr/local/Python3.8.12/lib/python3.8/site-packages/torch/nn/modules/module.py", line 2215, in load_state_dict
[rank0]: raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
[rank0]: RuntimeError: Error(s) in loading state_dict for InternVideo2_Stage2:
[rank0]: size mismatch for vision_encoder.pos_embed: copying a param with shape torch.Size([1, 1025, 1408]) from checkpoint, the shape in current model is torch.Size([1, 3073, 1408]).
How can I solve this? Looking forward to your reply. Thanks.
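For reference, the two shapes in the error can be read as one class token plus a fixed number of patch tokens per frame. The sketch below only does that arithmetic. It assumes 256 patch tokens per frame (for example a 224x224 input with 14x14 patches and no temporal downsampling), which is not stated in the traceback, and `frames_from_pos_embed` is a hypothetical helper, not part of the InternVideo code.

```python
def frames_from_pos_embed(num_tokens, tokens_per_frame=256, num_cls_tokens=1):
    """Infer the frame count implied by a positional-embedding length.

    Assumes the layout is [class tokens] + frames * tokens_per_frame.
    tokens_per_frame=256 is an assumption (16 x 16 patches per frame).
    """
    patch_tokens = num_tokens - num_cls_tokens
    if patch_tokens <= 0 or patch_tokens % tokens_per_frame != 0:
        raise ValueError(
            f"{num_tokens} tokens do not fit {num_cls_tokens} class token(s) "
            f"plus a multiple of {tokens_per_frame} patch tokens"
        )
    return patch_tokens // tokens_per_frame


# Second dimension of vision_encoder.pos_embed, taken from the error message.
checkpoint_frames = frames_from_pos_embed(1025)  # shape stored in the checkpoint
model_frames = frames_from_pos_embed(3073)       # shape built by the current model

print(checkpoint_frames, model_frames)  # prints "4 12"
```

Under that assumption the checkpoint's positional embedding covers 4 frames while the model built with num_frames 12 expects 12, so the tensor cannot be copied as-is even with strict=False. The embedding would need to be resized along the temporal axis before loading, or the model built with the checkpoint's frame count. Whether the repository already provides an option for this is not visible from the traceback.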