peft
RuntimeError: missing keys when resuming training, cannot load checkpoint
System Info
peft==0.8.1 accelerate==0.26.1 transformers==4.37.1 deepspeed==0.13.1
Who can help?
No response
Information
- [ ] The official example scripts
- [X] My own modified scripts
Tasks
- [ ] An officially supported task in the examples folder
- [X] My own task or dataset (give details below)
Reproduction
I trained a Mistral model with the DeepSpeed ZeRO-3 configuration. When I try to resume training from the checkpoint, I get this error:
RuntimeError                              Traceback (most recent call last)
Cell In[4], line 136
    131 trainer.save_metrics("test", test_metrics)
    135 if __name__ == "__main__":
--> 136     main(cfg)

Cell In[4], line 114
    110 model.state_dict = (
    111     lambda self, *_, **__: get_peft_model_state_dict(self, old_state_dict())
    112 ).__get__(model, type(model))
    113 # start training
--> 114 train_result = trainer.train(resume_from_checkpoint=True)
    115 trainer.save_model(training_dir_name)
    116 model.save_pretrained(training_dir_name)

File /opt/conda/envs/pytorch/lib/python3.10/site-packages/transformers/trainer.py:1539, in Trainer.train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
   1537     hf_hub_utils.enable_progress_bars()
   1538 else:
-> 1539     return inner_training_loop(
   1540         args=args,
   1541         resume_from_checkpoint=resume_from_checkpoint,
   1542         trial=trial,
   1543         ignore_keys_for_eval=ignore_keys_for_eval,
   1544     )

File /opt/conda/envs/pytorch/lib/python3.10/site-packages/transformers/trainer.py:1708, in Trainer._inner_training_loop(self, batch_size, args, resume_from_checkpoint, trial, ignore_keys_for_eval)
   1706 if resume_from_checkpoint is not None:
   1707     if self.is_deepspeed_enabled:
-> 1708         deepspeed_load_checkpoint(self.model_wrapped, resume_from_checkpoint)
   1709     elif is_sagemaker_mp_enabled() or self.is_fsdp_enabled:
   1710         self._load_from_checkpoint(resume_from_checkpoint, self.model_wrapped)

File /opt/conda/envs/pytorch/lib/python3.10/site-packages/transformers/integrations/deepspeed.py:402, in deepspeed_load_checkpoint(deepspeed_engine, checkpoint_path)
    400 logger.info(f"Attempting to resume from {checkpoint_path}")
    401 # this magically updates self.optimizer and self.lr_scheduler
--> 402 load_path, _ = deepspeed_engine.load_checkpoint(
    403     checkpoint_path, load_optimizer_states=True, load_lr_scheduler_states=True
    404 )
    405 if load_path is None:
    406     raise ValueError(f"[deepspeed] failed to resume from checkpoint {checkpoint_path}")

File /opt/conda/envs/pytorch/lib/python3.10/site-packages/deepspeed/runtime/engine.py:2740, in DeepSpeedEngine.load_checkpoint(self, load_dir, tag, load_module_strict, load_optimizer_states, load_lr_scheduler_states, load_module_only, custom_load_fn)
   2736 if self._optimizer_has_ckpt_event_prologue():
   2737     # Prepare for checkpoint load by ensuring all parameters are partitioned
   2738     self.optimizer.checkpoint_event_prologue()
-> 2740 load_path, client_states = self._load_checkpoint(load_dir,
   2741                                                  tag,
   2742                                                  load_module_strict=load_module_strict,
   2743                                                  load_optimizer_states=load_optimizer_states,
   2744                                                  load_lr_scheduler_states=load_lr_scheduler_states,
   2745                                                  load_module_only=load_module_only,
   2746                                                  custom_load_fn=custom_load_fn)
   2748 load_zero_checkpoint = load_path is not None and (self.zero_optimization() or self.bfloat16_enabled())
   2749 if load_zero_checkpoint:

File /opt/conda/envs/pytorch/lib/python3.10/site-packages/deepspeed/runtime/engine.py:2825, in DeepSpeedEngine._load_checkpoint(self, load_dir, tag, load_module_strict, load_optimizer_states, load_lr_scheduler_states, load_module_only, custom_load_fn)
   2816 DeepSpeedEngine.load_moe_state_dict(load_dir,
   2817                                     tag,
   2818                                     state_dict=checkpoint['module'],
   (...)
   2822                                     num_experts=self.num_experts,
   2823                                     checkpoint_engine=self.checkpoint_engine)
   2824 if not self.load_universal_checkpoint():
-> 2825     self.load_module_state_dict(checkpoint=checkpoint,
   2826                                 strict=load_module_strict,
   2827                                 custom_load_fn=custom_load_fn,
   2828                                 fetch_z3_params=fetch_z3_params)
   2830 self.loaded_checkpoint_dp_world_size = checkpoint['dp_world_size']
   2832 optim_checkpoint = None

File /opt/conda/envs/pytorch/lib/python3.10/site-packages/deepspeed/runtime/engine.py:2603, in DeepSpeedEngine.load_module_state_dict(self, checkpoint, strict, custom_load_fn, fetch_z3_params)
   2601     custom_load_fn(src=module_state_dict, dst=self.module)
   2602 else:
-> 2603     self.module.load_state_dict(
   2604         module_state_dict,  # TODO
   2605         strict=strict)
   2607 if checkpoint.get(FROZEN_PARAM_FRAGMENTS, None) is not None:
   2608     saved_frozen_params = checkpoint[FROZEN_PARAM_FRAGMENTS]

File /opt/conda/envs/pytorch/lib/python3.10/site-packages/torch/nn/modules/module.py:2152, in Module.load_state_dict(self, state_dict, strict, assign)
   2147     error_msgs.insert(
   2148         0, 'Missing key(s) in state_dict: {}. '.format(
   2149             ', '.join(f'"{k}"' for k in missing_keys)))
   2151 if len(error_msgs) > 0:
-> 2152     raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
   2153         self.__class__.__name__, "\n\t".join(error_msgs)))
   2154 return _IncompatibleKeys(missing_keys, unexpected_keys)

RuntimeError: Error(s) in loading state_dict for PeftModelForCausalLM:
Missing key(s) in state_dict: "base_model.model.model.embed_tokens.weight", "base_model.model.model.layers.0.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.0.self_attn.k_proj.weight", "base_model.model.model.layers.0.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.0.self_attn.o_proj.weight", "base_model.model.model.layers.0.mlp.gate_proj.weight", "base_model.model.model.layers.0.mlp.up_proj.weight", "base_model.model.model.layers.0.mlp.down_proj.weight", "base_model.model.model.layers.0.input_layernorm.weight", "base_model.model.model.layers.0.post_attention_layernorm.weight", "base_model.model.model.layers.1.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.1.self_attn.k_proj.weight", "base_model.model.model.layers.1.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.1.self_attn.o_proj.weight", "base_model.model.model.layers.1.mlp.gate_proj.weight", "base_model.model.model.layers.1.mlp.up_proj.weight", "base_model.model.model.layers.1.mlp.down_proj.weight", "base_model.model.model.layers.1.input_layernorm.weight", "base_model.model.model.layers.1.post_attention_layernorm.weight", "base_model.model.model.layers.2.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.2.self_attn.k_proj.weight", "base_model.model.model.layers.2.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.2.self_attn.o_proj.weight", "base_model.model.model.layers.2.mlp.gate_proj.weight", "base_model.model.model.layers.2.mlp.up_proj.weight", "base_model.model.model.layers.2.mlp.down_proj.weight", "base_model.model.model.layers.2.input_layernorm.weight", "base_model.model.model.layers.2.post_attention_layernorm.weight", "base_model.model.model.layers.3.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.3.self_attn.k_proj.weight", "base_model.model.model.layers.3.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.3.self_attn.o_proj.weight", 
"base_model.model.model.layers.3.mlp.gate_proj.weight", "base_model.model.model.layers.3.mlp.up_proj.weight", "base_model.model.model.layers.3.mlp.down_proj.weight", "base_model.model.model.layers.3.input_layernorm.weight", "base_model.model.model.layers.3.post_attention_layernorm.weight", "base_model.model.model.layers.4.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.4.self_attn.k_proj.weight", "base_model.model.model.layers.4.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.4.self_attn.o_proj.weight", "base_model.model.model.layers.4.mlp.gate_proj.weight", "base_model.model.model.layers.4.mlp.up_proj.weight", "base_model.model.model.layers.4.mlp.down_proj.weight", "base_model.model.model.layers.4.input_layernorm.weight", "base_model.model.model.layers.4.post_attention_layernorm.weight", "base_model.model.model.layers.5.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.5.self_attn.k_proj.weight", "base_model.model.model.layers.5.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.5.self_attn.o_proj.weight", "base_model.model.model.layers.5.mlp.gate_proj.weight", "base_model.model.model.layers.5.mlp.up_proj.weight", "base_model.model.model.layers.5.mlp.down_proj.weight", "base_model.model.model.layers.5.input_layernorm.weight", "base_model.model.model.layers.5.post_attention_layernorm.weight", "base_model.model.model.layers.6.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.6.self_attn.k_proj.weight", "base_model.model.model.layers.6.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.6.self_attn.o_proj.weight", "base_model.model.model.layers.6.mlp.gate_proj.weight", "base_model.model.model.layers.6.mlp.up_proj.weight", "base_model.model.model.layers.6.mlp.down_proj.weight", "base_model.model.model.layers.6.input_layernorm.weight", "base_model.model.model.layers.6.post_attention_layernorm.weight", 
"base_model.model.model.layers.7.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.7.self_attn.k_proj.weight", "base_model.model.model.layers.7.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.7.self_attn.o_proj.weight", "base_model.model.model.layers.7.mlp.gate_proj.weight", "base_model.model.model.layers.7.mlp.up_proj.weight", "base_model.model.model.layers.7.mlp.down_proj.weight", "base_model.model.model.layers.7.input_layernorm.weight", "base_model.model.model.layers.7.post_attention_layernorm.weight", "base_model.model.model.layers.8.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.8.self_attn.k_proj.weight", "base_model.model.model.layers.8.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.8.self_attn.o_proj.weight", "base_model.model.model.layers.8.mlp.gate_proj.weight", "base_model.model.model.layers.8.mlp.up_proj.weight", "base_model.model.model.layers.8.mlp.down_proj.weight", "base_model.model.model.layers.8.input_layernorm.weight", "base_model.model.model.layers.8.post_attention_layernorm.weight", "base_model.model.model.layers.9.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.9.self_attn.k_proj.weight", "base_model.model.model.layers.9.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.9.self_attn.o_proj.weight", "base_model.model.model.layers.9.mlp.gate_proj.weight", "base_model.model.model.layers.9.mlp.up_proj.weight", "base_model.model.model.layers.9.mlp.down_proj.weight", "base_model.model.model.layers.9.input_layernorm.weight", "base_model.model.model.layers.9.post_attention_layernorm.weight", "base_model.model.model.layers.10.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.10.self_attn.k_proj.weight", "base_model.model.model.layers.10.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.10.self_attn.o_proj.weight", "base_model.model.model.layers.10.mlp.gate_proj.weight", 
"base_model.model.model.layers.10.mlp.up_proj.weight", "base_model.model.model.layers.10.mlp.down_proj.weight", "base_model.model.model.layers.10.input_layernorm.weight", "base_model.model.model.layers.10.post_attention_layernorm.weight", "base_model.model.model.layers.11.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.11.self_attn.k_proj.weight", "base_model.model.model.layers.11.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.11.self_attn.o_proj.weight", "base_model.model.model.layers.11.mlp.gate_proj.weight", "base_model.model.model.layers.11.mlp.up_proj.weight", "base_model.model.model.layers.11.mlp.down_proj.weight", "base_model.model.model.layers.11.input_layernorm.weight", "base_model.model.model.layers.11.post_attention_layernorm.weight", "base_model.model.model.layers.12.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.12.self_attn.k_proj.weight", "base_model.model.model.layers.12.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.12.self_attn.o_proj.weight", "base_model.model.model.layers.12.mlp.gate_proj.weight", "base_model.model.model.layers.12.mlp.up_proj.weight", "base_model.model.model.layers.12.mlp.down_proj.weight", "base_model.model.model.layers.12.input_layernorm.weight", "base_model.model.model.layers.12.post_attention_layernorm.weight", "base_model.model.model.layers.13.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.13.self_attn.k_proj.weight", "base_model.model.model.layers.13.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.13.self_attn.o_proj.weight", "base_model.model.model.layers.13.mlp.gate_proj.weight", "base_model.model.model.layers.13.mlp.up_proj.weight", "base_model.model.model.layers.13.mlp.down_proj.weight", "base_model.model.model.layers.13.input_layernorm.weight", "base_model.model.model.layers.13.post_attention_layernorm.weight", "base_model.model.model.layers.14.self_attn.q_proj.base_layer.weight", 
"base_model.model.model.layers.14.self_attn.k_proj.weight", "base_model.model.model.layers.14.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.14.self_attn.o_proj.weight", "base_model.model.model.layers.14.mlp.gate_proj.weight", "base_model.model.model.layers.14.mlp.up_proj.weight", "base_model.model.model.layers.14.mlp.down_proj.weight", "base_model.model.model.layers.14.input_layernorm.weight", "base_model.model.model.layers.14.post_attention_layernorm.weight", "base_model.model.model.layers.15.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.15.self_attn.k_proj.weight", "base_model.model.model.layers.15.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.15.self_attn.o_proj.weight", "base_model.model.model.layers.15.mlp.gate_proj.weight", "base_model.model.model.layers.15.mlp.up_proj.weight", "base_model.model.model.layers.15.mlp.down_proj.weight", "base_model.model.model.layers.15.input_layernorm.weight", "base_model.model.model.layers.15.post_attention_layernorm.weight", "base_model.model.model.layers.16.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.16.self_attn.k_proj.weight", "base_model.model.model.layers.16.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.16.self_attn.o_proj.weight", "base_model.model.model.layers.16.mlp.gate_proj.weight", "base_model.model.model.layers.16.mlp.up_proj.weight", "base_model.model.model.layers.16.mlp.down_proj.weight", "base_model.model.model.layers.16.input_layernorm.weight", "base_model.model.model.layers.16.post_attention_layernorm.weight", "base_model.model.model.layers.17.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.17.self_attn.k_proj.weight", "base_model.model.model.layers.17.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.17.self_attn.o_proj.weight", "base_model.model.model.layers.17.mlp.gate_proj.weight", "base_model.model.model.layers.17.mlp.up_proj.weight", 
"base_model.model.model.layers.17.mlp.down_proj.weight", "base_model.model.model.layers.17.input_layernorm.weight", "base_model.model.model.layers.17.post_attention_layernorm.weight", "base_model.model.model.layers.18.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.18.self_attn.k_proj.weight", "base_model.model.model.layers.18.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.18.self_attn.o_proj.weight", "base_model.model.model.layers.18.mlp.gate_proj.weight", "base_model.model.model.layers.18.mlp.up_proj.weight", "base_model.model.model.layers.18.mlp.down_proj.weight", "base_model.model.model.layers.18.input_layernorm.weight", "base_model.model.model.layers.18.post_attention_layernorm.weight", "base_model.model.model.layers.19.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.19.self_attn.k_proj.weight", "base_model.model.model.layers.19.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.19.self_attn.o_proj.weight", "base_model.model.model.layers.19.mlp.gate_proj.weight", "base_model.model.model.layers.19.mlp.up_proj.weight", "base_model.model.model.layers.19.mlp.down_proj.weight", "base_model.model.model.layers.19.input_layernorm.weight", "base_model.model.model.layers.19.post_attention_layernorm.weight", "base_model.model.model.layers.20.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.20.self_attn.k_proj.weight", "base_model.model.model.layers.20.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.20.self_attn.o_proj.weight", "base_model.model.model.layers.20.mlp.gate_proj.weight", "base_model.model.model.layers.20.mlp.up_proj.weight", "base_model.model.model.layers.20.mlp.down_proj.weight", "base_model.model.model.layers.20.input_layernorm.weight", "base_model.model.model.layers.20.post_attention_layernorm.weight", "base_model.model.model.layers.21.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.21.self_attn.k_proj.weight", 
"base_model.model.model.layers.21.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.21.self_attn.o_proj.weight", "base_model.model.model.layers.21.mlp.gate_proj.weight", "base_model.model.model.layers.21.mlp.up_proj.weight", "base_model.model.model.layers.21.mlp.down_proj.weight", "base_model.model.model.layers.21.input_layernorm.weight", "base_model.model.model.layers.21.post_attention_layernorm.weight", "base_model.model.model.layers.22.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.22.self_attn.k_proj.weight", "base_model.model.model.layers.22.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.22.self_attn.o_proj.weight", "base_model.model.model.layers.22.mlp.gate_proj.weight", "base_model.model.model.layers.22.mlp.up_proj.weight", "base_model.model.model.layers.22.mlp.down_proj.weight", "base_model.model.model.layers.22.input_layernorm.weight", "base_model.model.model.layers.22.post_attention_layernorm.weight", "base_model.model.model.layers.23.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.23.self_attn.k_proj.weight", "base_model.model.model.layers.23.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.23.self_attn.o_proj.weight", "base_model.model.model.layers.23.mlp.gate_proj.weight", "base_model.model.model.layers.23.mlp.up_proj.weight", "base_model.model.model.layers.23.mlp.down_proj.weight", "base_model.model.model.layers.23.input_layernorm.weight", "base_model.model.model.layers.23.post_attention_layernorm.weight", "base_model.model.model.layers.24.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.24.self_attn.k_proj.weight", "base_model.model.model.layers.24.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.24.self_attn.o_proj.weight", "base_model.model.model.layers.24.mlp.gate_proj.weight", "base_model.model.model.layers.24.mlp.up_proj.weight", "base_model.model.model.layers.24.mlp.down_proj.weight", 
"base_model.model.model.layers.24.input_layernorm.weight", "base_model.model.model.layers.24.post_attention_layernorm.weight", "base_model.model.model.layers.25.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.25.self_attn.k_proj.weight", "base_model.model.model.layers.25.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.25.self_attn.o_proj.weight", "base_model.model.model.layers.25.mlp.gate_proj.weight", "base_model.model.model.layers.25.mlp.up_proj.weight", "base_model.model.model.layers.25.mlp.down_proj.weight", "base_model.model.model.layers.25.input_layernorm.weight", "base_model.model.model.layers.25.post_attention_layernorm.weight", "base_model.model.model.layers.26.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.26.self_attn.k_proj.weight", "base_model.model.model.layers.26.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.26.self_attn.o_proj.weight", "base_model.model.model.layers.26.mlp.gate_proj.weight", "base_model.model.model.layers.26.mlp.up_proj.weight", "base_model.model.model.layers.26.mlp.down_proj.weight", "base_model.model.model.layers.26.input_layernorm.weight", "base_model.model.model.layers.26.post_attention_layernorm.weight", "base_model.model.model.layers.27.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.27.self_attn.k_proj.weight", "base_model.model.model.layers.27.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.27.self_attn.o_proj.weight", "base_model.model.model.layers.27.mlp.gate_proj.weight", "base_model.model.model.layers.27.mlp.up_proj.weight", "base_model.model.model.layers.27.mlp.down_proj.weight", "base_model.model.model.layers.27.input_layernorm.weight", "base_model.model.model.layers.27.post_attention_layernorm.weight", "base_model.model.model.layers.28.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.28.self_attn.k_proj.weight", 
"base_model.model.model.layers.28.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.28.self_attn.o_proj.weight", "base_model.model.model.layers.28.mlp.gate_proj.weight", "base_model.model.model.layers.28.mlp.up_proj.weight", "base_model.model.model.layers.28.mlp.down_proj.weight", "base_model.model.model.layers.28.input_layernorm.weight", "base_model.model.model.layers.28.post_attention_layernorm.weight", "base_model.model.model.layers.29.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.29.self_attn.k_proj.weight", "base_model.model.model.layers.29.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.29.self_attn.o_proj.weight", "base_model.model.model.layers.29.mlp.gate_proj.weight", "base_model.model.model.layers.29.mlp.up_proj.weight", "base_model.model.model.layers.29.mlp.down_proj.weight", "base_model.model.model.layers.29.input_layernorm.weight", "base_model.model.model.layers.29.post_attention_layernorm.weight", "base_model.model.model.layers.30.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.30.self_attn.k_proj.weight", "base_model.model.model.layers.30.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.30.self_attn.o_proj.weight", "base_model.model.model.layers.30.mlp.gate_proj.weight", "base_model.model.model.layers.30.mlp.up_proj.weight", "base_model.model.model.layers.30.mlp.down_proj.weight", "base_model.model.model.layers.30.input_layernorm.weight", "base_model.model.model.layers.30.post_attention_layernorm.weight", "base_model.model.model.layers.31.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.31.self_attn.k_proj.weight", "base_model.model.model.layers.31.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.31.self_attn.o_proj.weight", "base_model.model.model.layers.31.mlp.gate_proj.weight", "base_model.model.model.layers.31.mlp.up_proj.weight", "base_model.model.model.layers.31.mlp.down_proj.weight", 
"base_model.model.model.layers.31.input_layernorm.weight", "base_model.model.model.layers.31.post_attention_layernorm.weight", "base_model.model.model.norm.weight", "base_model.model.lm_head.weight".
I also tried converting the weights with the zero_to_fp32.py script, but I still got the same error.
Expected behavior
Training should be resumed without error
I am also having this problem, but in my case it was raised by trainer.train()
Hi everyone, not sure if this is related, but this might be fixed on peft main; see this comment from @pacman100: https://github.com/huggingface/transformers/issues/28770#issuecomment-1935819776
This problem still persists even after updating to the latest version
The problem seems to arise when using Peft+Deepspeed, even when training on only 1 GPU. My code would have no problem when running without distributed training.
home/tung/anaconda3/envs/gpt/lib/python3.9/site-packages/torch/nn/modules/module.py:1879: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
warnings.warn(
Traceback (most recent call last):
File "/home/tung/development/llm-data-generator/llm_finetune.py", line 139, in <module>
trainer.train()
File "/home/tung/anaconda3/envs/gpt/lib/python3.9/site-packages/transformers/trainer.py", line 1539, in train
return inner_training_loop(
File "/home/tung/anaconda3/envs/gpt/lib/python3.9/site-packages/transformers/trainer.py", line 1972, in _inner_training_loop
self._load_best_model()
File "/home/tung/anaconda3/envs/gpt/lib/python3.9/site-packages/transformers/trainer.py", line 2168, in _load_best_model
deepspeed_load_checkpoint(self.model_wrapped, self.state.best_model_checkpoint)
File "/home/tung/anaconda3/envs/gpt/lib/python3.9/site-packages/transformers/integrations/deepspeed.py", line 402, in deepspeed_load_checkpoint
load_path, _ = deepspeed_engine.load_checkpoint(
File "/home/tung/anaconda3/envs/gpt/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2740, in load_checkpoint
load_path, client_states = self._load_checkpoint(load_dir,
File "/home/tung/anaconda3/envs/gpt/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2825, in _load_checkpoint
self.load_module_state_dict(checkpoint=checkpoint,
File "/home/tung/anaconda3/envs/gpt/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2603, in load_module_state_dict
self.module.load_state_dict(
File "/home/tung/anaconda3/envs/gpt/lib/python3.9/site-packages/torch/nn/modules/module.py", line 2152, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for PeftModelForCausalLM:
Missing key(s) in state_dict: "base_model.model.model.embed_tokens.weight", "base_model.model.model.layers.0.input_layernorm.weight", "base_model.model.model.layers.0.post_attention_layernorm.weight", "base_model.model.model.layers.1.input_layernorm.weight", "base_model.model.model.layers.1.post_attention_layernorm.weight", "base_model.model.model.layers.2.input_layernorm.weight", "base_model.model.model.layers.2.post_attention_layernorm.weight", "base_model.model.model.layers.3.input_layernorm.weight", "base_model.model.model.layers.3.post_attention_layernorm.weight", "base_model.model.model.layers.4.input_layernorm.weight", "base_model.model.model.layers.4.post_attention_layernorm.weight", "base_model.model.model.layers.5.input_layernorm.weight", "base_model.model.model.layers.5.post_attention_layernorm.weight", "base_model.model.model.layers.6.input_layernorm.weight", "base_model.model.model.layers.6.post_attention_layernorm.weight", "base_model.model.model.layers.7.input_layernorm.weight", "base_model.model.model.layers.7.post_attention_layernorm.weight", "base_model.model.model.layers.8.input_layernorm.weight", "base_model.model.model.layers.8.post_attention_layernorm.weight", "base_model.model.model.layers.9.input_layernorm.weight", "base_model.model.model.layers.9.post_attention_layernorm.weight", "base_model.model.model.layers.10.input_layernorm.weight", "base_model.model.model.layers.10.post_attention_layernorm.weight", "base_model.model.model.layers.11.input_layernorm.weight", "base_model.model.model.layers.11.post_attention_layernorm.weight", "base_model.model.model.layers.12.input_layernorm.weight", "base_model.model.model.layers.12.post_attention_layernorm.weight", "base_model.model.model.layers.13.input_layernorm.weight", "base_model.model.model.layers.13.post_attention_layernorm.weight", "base_model.model.model.layers.14.input_layernorm.weight", "base_model.model.model.layers.14.post_attention_layernorm.weight", 
"base_model.model.model.layers.15.input_layernorm.weight", "base_model.model.model.layers.15.post_attention_layernorm.weight", "base_model.model.model.layers.16.input_layernorm.weight", "base_model.model.model.layers.16.post_attention_layernorm.weight", "base_model.model.model.layers.17.input_layernorm.weight", "base_model.model.model.layers.17.post_attention_layernorm.weight", "base_model.model.model.layers.18.input_layernorm.weight", "base_model.model.model.layers.18.post_attention_layernorm.weight", "base_model.model.model.layers.19.input_layernorm.weight", "base_model.model.model.layers.19.post_attention_layernorm.weight", "base_model.model.model.layers.20.input_layernorm.weight", "base_model.model.model.layers.20.post_attention_layernorm.weight", "base_model.model.model.layers.21.input_layernorm.weight", "base_model.model.model.layers.21.post_attention_layernorm.weight", "base_model.model.model.layers.22.input_layernorm.weight", "base_model.model.model.layers.22.post_attention_layernorm.weight", "base_model.model.model.layers.23.input_layernorm.weight", "base_model.model.model.layers.23.post_attention_layernorm.weight", "base_model.model.model.layers.24.input_layernorm.weight", "base_model.model.model.layers.24.post_attention_layernorm.weight", "base_model.model.model.layers.25.input_layernorm.weight", "base_model.model.model.layers.25.post_attention_layernorm.weight", "base_model.model.model.layers.26.input_layernorm.weight", "base_model.model.model.layers.26.post_attention_layernorm.weight", "base_model.model.model.layers.27.input_layernorm.weight", "base_model.model.model.layers.27.post_attention_layernorm.weight", "base_model.model.model.layers.28.input_layernorm.weight", "base_model.model.model.layers.28.post_attention_layernorm.weight", "base_model.model.model.layers.29.input_layernorm.weight", "base_model.model.model.layers.29.post_attention_layernorm.weight", "base_model.model.model.layers.30.input_layernorm.weight", 
"base_model.model.model.layers.30.post_attention_layernorm.weight", "base_model.model.model.layers.31.input_layernorm.weight", "base_model.model.model.layers.31.post_attention_layernorm.weight",
"base_model.model.model.norm.weight",
"base_model.model.lm_head.weight".
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
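To see at a glance which parameters the loader is complaining about, the key list in the error message above can be grouped by kind. The sketch below uses a hypothetical helper, `summarize_missing_keys`, which is not part of peft, transformers, or DeepSpeed; it assumes the message has the `Missing key(s) in state_dict: "a", "b", ...` shape shown in the traceback. If no `lora_` keys are reported missing, that may suggest the adapter weights were found and only base-model weights are absent from the checkpoint, but that reading is an assumption, not a confirmed diagnosis.

```python
from collections import Counter
import re


def summarize_missing_keys(error_message: str) -> Counter:
    """Count the parameter names listed as missing, grouped by rough kind."""
    # Keep only the "Missing key(s)" part; drop any "Unexpected key(s)" part.
    section = error_message.split("Missing key(s) in state_dict:", 1)[-1]
    section = section.split("Unexpected key(s) in state_dict:", 1)[0]

    kinds = Counter()
    for key in re.findall(r'"([^"]+)"', section):
        if "lora_" in key:
            kinds["adapter (lora_*)"] += 1
        elif "layernorm" in key or key.endswith("norm.weight"):
            kinds["norm"] += 1
        elif "embed_tokens" in key or "lm_head" in key:
            kinds["embedding / lm_head"] += 1
        else:
            kinds["other base weight"] += 1
    return kinds


if __name__ == "__main__":
    # Shortened sample built from keys that appear in the traceback.
    sample = (
        "Missing key(s) in state_dict: "
        '"base_model.model.model.embed_tokens.weight", '
        '"base_model.model.model.layers.0.input_layernorm.weight", '
        '"base_model.model.model.norm.weight", '
        '"base_model.model.lm_head.weight".'
    )
    for kind, count in summarize_missing_keys(sample).items():
        print(f"{kind}: {count}")
```

Passing `str(exc)` from the caught `RuntimeError` instead of the sample string would summarize the full list.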
Was anyone able to find a solution to this problem? I'm also not able to resume from checkpoint, using deepspeed zero 3
Are you using DeepSpeed ZeRO-3 with 4-bit quantization? AFAIK they don't work together. I was able to fix this issue by going down to ZeRO stage 2 and updating transformers to version >= 4.38.2.
I use DeepSpeed ZeRO-3, but without any quantization.
Same here. If I use only ZeRO stage 2 instead of stage 3, I can load my model without missing keys. With stage 3, the model cannot be loaded because of missing keys. However, stage 3 seems to give a significant speed boost, so I hope this bug can be fixed soon!
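For anyone who wants to try the stage 2 workaround mentioned above, here is a minimal sketch of a ZeRO stage 2 config written from Python. The file name `ds_config_zero2.json` and the helper `write_ds_config` are hypothetical, and the values are illustrative rather than tuned; the `"auto"` entries assume the Hugging Face Trainer integration, which fills them in from `TrainingArguments`.

```python
import json

# Illustrative ZeRO stage 2 settings (assumed values, not taken from this thread).
ds_config_zero2 = {
    "zero_optimization": {
        "stage": 2,  # stage 2 instead of stage 3, as reported to work above
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
    "bf16": {"enabled": "auto"},
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
}


def write_ds_config(path: str = "ds_config_zero2.json") -> str:
    """Write the config to disk and return the path (hypothetical helper)."""
    with open(path, "w") as f:
        json.dump(ds_config_zero2, f, indent=2)
    return path


if __name__ == "__main__":
    print(write_ds_config())
```

The resulting path can then be passed to the Trainer via `TrainingArguments(deepspeed="ds_config_zero2.json", ...)`. Whether this avoids the missing-keys error in a given setup is something to verify, since the thread only reports it anecdotally.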
I solved this problem under DeepSpeed ZeRO-3 by changing the environment to:
transformers==4.38.2, pydantic==1.9.0, accelerate==0.27.2
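A quick way to compare a local environment against the versions reported above is sketched below. `REPORTED_WORKING` and `find_mismatches` are hypothetical names introduced for illustration; the pinned versions are only what one commenter reported, not an official compatibility matrix, and another reply in this thread reports a different error with the same versions.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions reported as working in this thread (anecdotal, not an official matrix).
REPORTED_WORKING = {
    "transformers": "4.38.2",
    "pydantic": "1.9.0",
    "accelerate": "0.27.2",
}


def find_mismatches(expected=REPORTED_WORKING, get_version=version):
    """Return {package: (installed_or_None, expected)} for packages that differ."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[name] = (installed, wanted)
    return mismatches


if __name__ == "__main__":
    for name, (installed, wanted) in find_mismatches().items():
        print(f"{name}: installed={installed}, thread reports {wanted}")
```

An empty result means the installed versions match the ones listed; it does not by itself guarantee that resuming from a ZeRO-3 checkpoint will work.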
@ktlKTL using your package versions, I got a new error: ValueError: Trying to set a tensor of shape torch.Size([32000, 4096]) in "weight" (which has shape torch.Size([0])), this look incorrect.
Any hints?
i solved this problem under deepspeed zero3 by changing the environment:
transformers==4.38.2, pydantic==1.9.0, accelerate==0.27.2
Great! It solves my problem perfectly! Thanks for providing the solution.