
checkpoint cannot be loaded as expected

Open boqian-li opened this issue 4 months ago • 6 comments

Hi, thanks for your great work! I set up the environment following the instructions in the README. When I run PYTHONPATH=. python ./src/lvd_templ/evaluation/evaluation_benchmark.py

I get this error:

Traceback (most recent call last):
  File "./src/lvd_templ/evaluation/evaluation_benchmark.py", line 280, in main
    run(cfg)
  File "./src/lvd_templ/evaluation/evaluation_benchmark.py", line 120, in run
    module, MD, train_data, cfg_model = get_model(chk)
  File "./src/lvd_templ/evaluation/evaluation_benchmark.py", line 70, in get_model
    module = model._load_model_state(checkpoint=old_checkpoint, metadata=MD).to(device)
  File "/root/anaconda3/envs/nsr/lib/python3.8/site-packages/pytorch_lightning/core/saving.py", line 204, in _load_model_state
    keys = model.load_state_dict(checkpoint["state_dict"], strict=strict)
  File "/root/anaconda3/envs/nsr/lib/python3.8/site-packages/torch/nn/modules/module.py", line 2215, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for LightUniversal:
        size mismatch for model.segm_list.0.5.bias: copying a param with shape torch.Size([93]) from checkpoint, the shape in current model is torch.Size([159]).
        size mismatch for model.segm_list.0.5.weight_g: copying a param with shape torch.Size([93, 1, 1]) from checkpoint, the shape in current model is torch.Size([159, 1, 1]).
        size mismatch for model.segm_list.0.5.weight_v: copying a param with shape torch.Size([93, 512, 1]) from checkpoint, the shape in current model is torch.Size([159, 512, 1]).
        size mismatch for model.segm_list.1.5.bias: copying a param with shape torch.Size([117]) from checkpoint, the shape in current model is torch.Size([93]).
        size mismatch for model.segm_list.1.5.weight_g: copying a param with shape torch.Size([117, 1, 1]) from checkpoint, the shape in current model is torch.Size([93, 1, 1]).
        size mismatch for model.segm_list.1.5.weight_v: copying a param with shape torch.Size([117, 512, 1]) from checkpoint, the shape in current model is torch.Size([93, 512, 1]).
        size mismatch for model.segm_list.2.5.bias: copying a param with shape torch.Size([270]) from checkpoint, the shape in current model is torch.Size([102]).
        size mismatch for model.segm_list.2.5.weight_g: copying a param with shape torch.Size([270, 1, 1]) from checkpoint, the shape in current model is torch.Size([102, 1, 1]).
        size mismatch for model.segm_list.2.5.weight_v: copying a param with shape torch.Size([270, 512, 1]) from checkpoint, the shape in current model is torch.Size([102, 512, 1]).
        size mismatch for model.segm_list.3.5.bias: copying a param with shape torch.Size([102]) from checkpoint, the shape in current model is torch.Size([270]).
        size mismatch for model.segm_list.3.5.weight_g: copying a param with shape torch.Size([102, 1, 1]) from checkpoint, the shape in current model is torch.Size([270, 1, 1]).
        size mismatch for model.segm_list.3.5.weight_v: copying a param with shape torch.Size([102, 512, 1]) from checkpoint, the shape in current model is torch.Size([270, 512, 1]).
        size mismatch for model.segm_list.4.5.bias: copying a param with shape torch.Size([162]) from checkpoint, the shape in current model is torch.Size([75]).
        size mismatch for model.segm_list.4.5.weight_g: copying a param with shape torch.Size([162, 1, 1]) from checkpoint, the shape in current model is torch.Size([75, 1, 1]).
        size mismatch for model.segm_list.4.5.weight_v: copying a param with shape torch.Size([162, 512, 1]) from checkpoint, the shape in current model is torch.Size([75, 512, 1]).
        size mismatch for model.segm_list.5.5.bias: copying a param with shape torch.Size([159]) from checkpoint, the shape in current model is torch.Size([114]).
        size mismatch for model.segm_list.5.5.weight_g: copying a param with shape torch.Size([159, 1, 1]) from checkpoint, the shape in current model is torch.Size([114, 1, 1]).
        size mismatch for model.segm_list.5.5.weight_v: copying a param with shape torch.Size([159, 512, 1]) from checkpoint, the shape in current model is torch.Size([114, 512, 1]).
        size mismatch for model.segm_list.6.5.bias: copying a param with shape torch.Size([114]) from checkpoint, the shape in current model is torch.Size([78]).
        size mismatch for model.segm_list.6.5.weight_g: copying a param with shape torch.Size([114, 1, 1]) from checkpoint, the shape in current model is torch.Size([78, 1, 1]).
        size mismatch for model.segm_list.6.5.weight_v: copying a param with shape torch.Size([114, 512, 1]) from checkpoint, the shape in current model is torch.Size([78, 512, 1]).
        size mismatch for model.segm_list.7.5.bias: copying a param with shape torch.Size([75]) from checkpoint, the shape in current model is torch.Size([120]).
        size mismatch for model.segm_list.7.5.weight_g: copying a param with shape torch.Size([75, 1, 1]) from checkpoint, the shape in current model is torch.Size([120, 1, 1]).
        size mismatch for model.segm_list.7.5.weight_v: copying a param with shape torch.Size([75, 512, 1]) from checkpoint, the shape in current model is torch.Size([120, 512, 1]).
        size mismatch for model.segm_list.8.5.bias: copying a param with shape torch.Size([159]) from checkpoint, the shape in current model is torch.Size([141]).
        size mismatch for model.segm_list.8.5.weight_g: copying a param with shape torch.Size([159, 1, 1]) from checkpoint, the shape in current model is torch.Size([141, 1, 1]).
        size mismatch for model.segm_list.8.5.weight_v: copying a param with shape torch.Size([159, 512, 1]) from checkpoint, the shape in current model is torch.Size([141, 512, 1]).
        size mismatch for model.segm_list.9.5.bias: copying a param with shape torch.Size([78]) from checkpoint, the shape in current model is torch.Size([141]).
        size mismatch for model.segm_list.9.5.weight_g: copying a param with shape torch.Size([78, 1, 1]) from checkpoint, the shape in current model is torch.Size([141, 1, 1]).
        size mismatch for model.segm_list.9.5.weight_v: copying a param with shape torch.Size([78, 512, 1]) from checkpoint, the shape in current model is torch.Size([141, 512, 1]).
        size mismatch for model.segm_list.10.5.bias: copying a param with shape torch.Size([120]) from checkpoint, the shape in current model is torch.Size([159]).
        size mismatch for model.segm_list.10.5.weight_g: copying a param with shape torch.Size([120, 1, 1]) from checkpoint, the shape in current model is torch.Size([159, 1, 1]).
        size mismatch for model.segm_list.10.5.weight_v: copying a param with shape torch.Size([120, 512, 1]) from checkpoint, the shape in current model is torch.Size([159, 512, 1]).
        size mismatch for model.segm_list.11.5.bias: copying a param with shape torch.Size([141]) from checkpoint, the shape in current model is torch.Size([114]).
        size mismatch for model.segm_list.11.5.weight_g: copying a param with shape torch.Size([141, 1, 1]) from checkpoint, the shape in current model is torch.Size([114, 1, 1]).
        size mismatch for model.segm_list.11.5.weight_v: copying a param with shape torch.Size([141, 512, 1]) from checkpoint, the shape in current model is torch.Size([114, 512, 1]).
        size mismatch for model.segm_list.12.5.bias: copying a param with shape torch.Size([114]) from checkpoint, the shape in current model is torch.Size([165]).
        size mismatch for model.segm_list.12.5.weight_g: copying a param with shape torch.Size([114, 1, 1]) from checkpoint, the shape in current model is torch.Size([165, 1, 1]).
        size mismatch for model.segm_list.12.5.weight_v: copying a param with shape torch.Size([114, 512, 1]) from checkpoint, the shape in current model is torch.Size([165, 512, 1]).
        size mismatch for model.segm_list.13.5.bias: copying a param with shape torch.Size([141]) from checkpoint, the shape in current model is torch.Size([99]).
        size mismatch for model.segm_list.13.5.weight_g: copying a param with shape torch.Size([141, 1, 1]) from checkpoint, the shape in current model is torch.Size([99, 1, 1]).
        size mismatch for model.segm_list.13.5.weight_v: copying a param with shape torch.Size([141, 512, 1]) from checkpoint, the shape in current model is torch.Size([99, 512, 1]).
        size mismatch for model.segm_list.14.5.bias: copying a param with shape torch.Size([99]) from checkpoint, the shape in current model is torch.Size([123]).
        size mismatch for model.segm_list.14.5.weight_g: copying a param with shape torch.Size([99, 1, 1]) from checkpoint, the shape in current model is torch.Size([123, 1, 1]).
        size mismatch for model.segm_list.14.5.weight_v: copying a param with shape torch.Size([99, 512, 1]) from checkpoint, the shape in current model is torch.Size([123, 512, 1]).
        size mismatch for model.segm_list.15.5.bias: copying a param with shape torch.Size([126]) from checkpoint, the shape in current model is torch.Size([117]).
        size mismatch for model.segm_list.15.5.weight_g: copying a param with shape torch.Size([126, 1, 1]) from checkpoint, the shape in current model is torch.Size([117, 1, 1]).
        size mismatch for model.segm_list.15.5.weight_v: copying a param with shape torch.Size([126, 512, 1]) from checkpoint, the shape in current model is torch.Size([117, 512, 1]).

After checking the error message, I suspect the checkpoint was not saved with the parameters needed to instantiate the model correctly (passed as a keyword dictionary), so the model is rebuilt with different layer sizes. The cause may also lie elsewhere.
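To narrow this down, it may help to list the mismatched parameters side by side instead of reading them out of the exception. The sketch below is only an illustration: `diff_state_dict_shapes` and `shape_of` are hypothetical helper names, not part of lvd_templ or PyTorch Lightning. It assumes both inputs are plain mappings from parameter name to an object with a `.shape` attribute, which is what `checkpoint["state_dict"]` (seen in the traceback) and `module.state_dict()` provide.

```python
from typing import Any, List, Mapping, Tuple

Shape = Tuple[int, ...]


def shape_of(value: Any) -> Shape:
    """Return the value's shape as a plain tuple (empty if it has no .shape)."""
    return tuple(getattr(value, "shape", ()))


def diff_state_dict_shapes(
    checkpoint_sd: Mapping[str, Any],
    model_sd: Mapping[str, Any],
) -> List[Tuple[str, Shape, Shape]]:
    """List (name, checkpoint_shape, model_shape) for every shared key whose shapes differ."""
    rows = []
    for name, value in checkpoint_sd.items():
        if name not in model_sd:
            # Keys missing on one side are a separate problem; skip them here.
            continue
        ckpt_shape = shape_of(value)
        model_shape = shape_of(model_sd[name])
        if ckpt_shape != model_shape:
            rows.append((name, ckpt_shape, model_shape))
    return rows


# Possible usage with real objects (assumes torch is installed and that
# `ckpt_path` and `module` exist in your session; both names are placeholders):
#
#   import torch
#   checkpoint = torch.load(ckpt_path, map_location="cpu")
#   for name, ckpt_shape, model_shape in diff_state_dict_shapes(
#       checkpoint["state_dict"], module.state_dict()
#   ):
#       print(f"{name}: checkpoint {ckpt_shape} vs model {model_shape}")
```

If the resulting table shows largely the same sizes on both sides but attached to different `segm_list` indices, that would point toward the model being built with a different segment ordering or configuration than the one used at training time, rather than toward a corrupted checkpoint file.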

boqian-li • Oct 13 '24 11:10