
RuntimeError: Error(s) in loading state_dict for LlamaForCausalLM:

Open landerson85 opened this issue 1 year ago • 1 comment

Traceback (most recent call last):
  File "/data/app/FastChat/fastChat/train/train.py", line 335, in <module>
    train()
  File "/data/app/FastChat/fastChat/train/train.py", line 328, in train
    trainer.train(resume_from_checkpoint=True)
  File "/data/app/install/transformers/src/transformers/trainer.py", line 1651, in train
    self._load_from_checkpoint(resume_from_checkpoint)
  File "/data/app/install/transformers/src/transformers/trainer.py", line 2159, in _load_from_checkpoint
    load_result = model.load_state_dict(state_dict, False)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for LlamaForCausalLM:
    size mismatch for model.embed_tokens.weight: copying a param with shape torch.Size([32001, 4096]) from checkpoint, the shape in current model is torch.Size([32000, 4096]).
    size mismatch for lm_head.weight: copying a param with shape torch.Size([32001, 4096]) from checkpoint, the shape in current model is torch.Size([32000, 4096]).
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 6660) of binary: /usr/bin/python3
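Note the key detail in the traceback: the trainer calls `model.load_state_dict(state_dict, False)`, i.e. non-strict loading, and it still fails. `strict=False` only tolerates missing or unexpected keys; it does not tolerate shape mismatches. A toy reproduction with plain torch, using small sizes standing in for the 32000/32001 vocab rows:

```python
import torch.nn as nn

# Toy stand-ins: the checkpoint's embedding has one more row
# (an added token) than the freshly built model's embedding.
checkpoint_emb = nn.Embedding(6, 4)   # stands in for [32001, 4096]
current_emb = nn.Embedding(5, 4)      # stands in for [32000, 4096]

try:
    # strict=False ignores missing/unexpected keys, but a shape
    # mismatch on a shared key still raises -- exactly as in the report.
    current_emb.load_state_dict(checkpoint_emb.state_dict(), strict=False)
except RuntimeError as e:
    print("RuntimeError:", e)
```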

CUDA: 12.1, torch: 11.8, transformers: 4.28-dev

landerson85 avatar Apr 24 '23 05:04 landerson85

same here

Dankmank avatar May 01 '23 05:05 Dankmank

the same issue

elven2016 avatar Aug 24 '23 02:08 elven2016

I had this error with older versions of the libraries, and also when there was not enough GPU memory. Can you try with a clean new virtual environment and the latest versions of everything? Is it still an issue @landerson85 @Dankmank @elven2016 ?
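Beyond environment issues, the 32001-vs-32000 mismatch itself suggests the checkpoint was saved after a pad token was added and the embeddings were resized (FastChat's training script does this), so the base model must be resized the same way before resuming. A minimal sketch in plain torch, with tiny sizes standing in for 32000/32001; with transformers, the resize step is what `model.resize_token_embeddings(len(tokenizer))` does after adding the token to the tokenizer:

```python
import torch
import torch.nn as nn

# The checkpoint was saved with one extra row (the added pad token).
checkpoint_emb = nn.Embedding(6, 4)   # stands in for [32001, 4096]
current_emb = nn.Embedding(5, 4)      # stands in for [32000, 4096]

# Mimic resize_token_embeddings: allocate the larger matrix and keep
# the existing rows (the new row is then overwritten by the checkpoint).
resized = nn.Embedding(6, 4)
with torch.no_grad():
    resized.weight[:5] = current_emb.weight

# Loading the checkpoint into the resized embedding now succeeds.
resized.load_state_dict(checkpoint_emb.state_dict())
```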

surak avatar Oct 21 '23 16:10 surak

I'm hitting the same issue. Have you solved it yet? Thanks a lot!

jzssz avatar Dec 22 '23 11:12 jzssz