DeepSpeedExamples
I ran into this error when evaluating the model in the SFT step:

RuntimeError: Error(s) in loading state_dict for OPTForCausalLM:
size mismatch for model.decoder.embed_tokens.weight: copying a param with shape torch.Size([50272, 2048]) from checkpoint, the shape in current model is torch.Size([50265, 2048]).
size mismatch for lm_head.weight: copying a param with shape torch.Size([50272, 2048]) from checkpoint, the shape in current model is torch.Size([50265, 2048]).
You may consider adding `ignore_mismatched_sizes=True` in the model `from_pretrained` method.
The problem should be resolved in the latest branch.
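For anyone hitting the same message on an older checkout, the two row counts in the error are consistent with one common pattern: the embedding matrix was padded up to a multiple of 8 before the checkpoint was saved, while the model built at evaluation time still uses the unpadded vocabulary size. This is an assumption about the cause, not something confirmed in this thread. A minimal sketch of the arithmetic, where `pad_vocab_size` is a hypothetical helper and not part of DeepSpeedExamples:

```python
import math


def pad_vocab_size(vocab_size: int, multiple: int = 8) -> int:
    """Round vocab_size up to the nearest multiple of `multiple`.

    Hypothetical helper for illustration only.
    """
    return multiple * math.ceil(vocab_size / multiple)


# Row counts taken from the error message above.
model_rows = 50265       # shape in the current model
checkpoint_rows = 50272  # shape stored in the checkpoint

padded = pad_vocab_size(model_rows)
print(padded)                     # padded size of the current model's vocabulary
print(padded == checkpoint_rows)  # whether padding alone explains the mismatch
```

If the padding explanation holds for your checkpoint, resizing the freshly built model to the padded size before loading the weights (for example with the Hugging Face `model.resize_token_embeddings(padded)` call) should make the shapes agree. Passing `ignore_mismatched_sizes=True` would instead skip the mismatched tensors and leave them freshly initialized, which is usually not what you want for evaluation.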
Closed due to no follow-up.