Fix size mismatch error
Without ignore_mismatched_sizes=True, the code will raise the following exception:
RuntimeError Traceback (most recent call last)
<ipython-input-4-1f3e5e48d5a5> in <module>()
4 num_labels=len(labels),
5 id2label={str(i): c for i, c in enumerate(labels)},
----> 6 label2id={c: str(i) for i, c in enumerate(labels)}
7 )
1 frames
/usr/local/lib/python3.7/dist-packages/transformers/modeling_utils.py in from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
2166 offload_folder=offload_folder,
2167 offload_state_dict=offload_state_dict,
-> 2168 dtype=torch_dtype,
2169 )
2170
/usr/local/lib/python3.7/dist-packages/transformers/modeling_utils.py in _load_pretrained_model(cls, model, state_dict, loaded_keys, resolved_archive_file, pretrained_model_name_or_path, ignore_mismatched_sizes, sharded_metadata, _fast_init, low_cpu_mem_usage, device_map, offload_folder, offload_state_dict, dtype)
2413 "\n\tYou may consider adding `ignore_mismatched_sizes=True` in the model `from_pretrained` method."
2414 )
-> 2415 raise RuntimeError(f"Error(s) in loading state_dict for {model.__class__.__name__}:\n\t{error_msg}")
2416
2417 if len(unexpected_keys) > 0:
RuntimeError: Error(s) in loading state_dict for ViTForImageClassification:
size mismatch for classifier.weight: copying a param with shape torch.Size([1000, 768]) from checkpoint, the shape in current model is torch.Size([2, 768]).
size mismatch for classifier.bias: copying a param with shape torch.Size([1000]) from checkpoint, the shape in current model is torch.Size([2]).
You may consider adding `ignore_mismatched_sizes=True` in the model `from_pretrained` method.
transformers version: 4.21.1.
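For readers who want to see what the two "size mismatch" lines are reporting, here is a minimal sketch in plain Python. The shapes are copied from the error message above. The dictionary layout and the helper name `find_mismatched` are assumptions made for illustration only, and this is not the actual transformers implementation.

```python
# Shapes copied from the error message; the dict layout is illustrative only.
checkpoint_shapes = {
    "classifier.weight": (1000, 768),  # head saved in the checkpoint (1000 classes)
    "classifier.bias": (1000,),
}
model_shapes = {
    "classifier.weight": (2, 768),  # head requested via num_labels=2
    "classifier.bias": (2,),
}


def find_mismatched(checkpoint, model):
    """Hypothetical helper: list keys present in both dicts whose shapes differ."""
    return [
        (key, checkpoint[key], model[key])
        for key in checkpoint
        if key in model and checkpoint[key] != model[key]
    ]


mismatched = find_mismatched(checkpoint_shapes, model_shapes)
for key, ckpt_shape, model_shape in mismatched:
    print(f"{key}: checkpoint {ckpt_shape} vs model {model_shape}")
```

Passing ignore_mismatched_sizes=True to from_pretrained tells the loader to skip parameters like these instead of raising, which leaves the classifier head newly initialized so it has to be trained on the new labels.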
I'm not running into this issue when running the Colab linked in the blog post, using the same transformers version (4.21.1). Did you change the model_id by any chance?
@nateraw OK, my fault. I used the model_id google/vit-base-patch16-224 instead of google/vit-base-patch16-224-in21k. Could you please tell me the difference between them? I read the docs and still can't tell them apart.