PEFT RuntimeError: missing keys when resuming training, cannot load checkpoint

System Info

peft==0.8.1 accelerate==0.26.1 transformers==4.37.1 deepspeed==0.13.1

Who can help?

No response

Information

[ ] The official example scripts
[X] My own modified scripts

Tasks

[ ] An officially supported task in the examples folder
[X] My own task or dataset (give details below)

Reproduction

I trained a Mistral model with the DeepSpeed ZeRO-3 configuration. When I try to resume training from the checkpoint, I get this error:

RuntimeError                              Traceback (most recent call last)
Cell In[4], line 136
    131     trainer.save_metrics("test", test_metrics)
    135 if __name__ == "__main__":
--> 136     main(cfg)

Cell In[4], line 114
    110 model.state_dict = (
    111     lambda self, *_, **__: get_peft_model_state_dict(self, old_state_dict())
    112 ).__get__(model, type(model))
    113 # start training
--> 114 train_result = trainer.train(resume_from_checkpoint=True)
    115 trainer.save_model(training_dir_name)
    116 model.save_pretrained(training_dir_name)

File /opt/conda/envs/pytorch/lib/python3.10/site-packages/transformers/trainer.py:1539, in Trainer.train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
   1537         hf_hub_utils.enable_progress_bars()
   1538 else:
-> 1539     return inner_training_loop(
   1540         args=args,
   1541         resume_from_checkpoint=resume_from_checkpoint,
   1542         trial=trial,
   1543         ignore_keys_for_eval=ignore_keys_for_eval,
   1544     )

File /opt/conda/envs/pytorch/lib/python3.10/site-packages/transformers/trainer.py:1708, in Trainer._inner_training_loop(self, batch_size, args, resume_from_checkpoint, trial, ignore_keys_for_eval)
   1706 if resume_from_checkpoint is not None:
   1707     if self.is_deepspeed_enabled:
-> 1708         deepspeed_load_checkpoint(self.model_wrapped, resume_from_checkpoint)
   1709     elif is_sagemaker_mp_enabled() or self.is_fsdp_enabled:
   1710         self._load_from_checkpoint(resume_from_checkpoint, self.model_wrapped)

File /opt/conda/envs/pytorch/lib/python3.10/site-packages/transformers/integrations/deepspeed.py:402, in deepspeed_load_checkpoint(deepspeed_engine, checkpoint_path)
    400 logger.info(f"Attempting to resume from {checkpoint_path}")
    401 # this magically updates self.optimizer and self.lr_scheduler
--> 402 load_path, _ = deepspeed_engine.load_checkpoint(
    403     checkpoint_path, load_optimizer_states=True, load_lr_scheduler_states=True
    404 )
    405 if load_path is None:
    406     raise ValueError(f"[deepspeed] failed to resume from checkpoint {checkpoint_path}")

File /opt/conda/envs/pytorch/lib/python3.10/site-packages/deepspeed/runtime/engine.py:2740, in DeepSpeedEngine.load_checkpoint(self, load_dir, tag, load_module_strict, load_optimizer_states, load_lr_scheduler_states, load_module_only, custom_load_fn)
   2736 if self._optimizer_has_ckpt_event_prologue():
   2737     # Prepare for checkpoint load by ensuring all parameters are partitioned
   2738     self.optimizer.checkpoint_event_prologue()
-> 2740 load_path, client_states = self._load_checkpoint(load_dir,
   2741                                                  tag,
   2742                                                  load_module_strict=load_module_strict,
   2743                                                  load_optimizer_states=load_optimizer_states,
   2744                                                  load_lr_scheduler_states=load_lr_scheduler_states,
   2745                                                  load_module_only=load_module_only,
   2746                                                  custom_load_fn=custom_load_fn)
   2748 load_zero_checkpoint = load_path is not None and (self.zero_optimization() or self.bfloat16_enabled())
   2749 if load_zero_checkpoint:

File /opt/conda/envs/pytorch/lib/python3.10/site-packages/deepspeed/runtime/engine.py:2825, in DeepSpeedEngine._load_checkpoint(self, load_dir, tag, load_module_strict, load_optimizer_states, load_lr_scheduler_states, load_module_only, custom_load_fn)
   2816     DeepSpeedEngine.load_moe_state_dict(load_dir,
   2817                                         tag,
   2818                                         state_dict=checkpoint['module'],
   (...)
   2822                                         num_experts=self.num_experts,
   2823                                         checkpoint_engine=self.checkpoint_engine)
   2824 if not self.load_universal_checkpoint():
-> 2825     self.load_module_state_dict(checkpoint=checkpoint,
   2826                                 strict=load_module_strict,
   2827                                 custom_load_fn=custom_load_fn,
   2828                                 fetch_z3_params=fetch_z3_params)
   2830 self.loaded_checkpoint_dp_world_size = checkpoint['dp_world_size']
   2832 optim_checkpoint = None

File /opt/conda/envs/pytorch/lib/python3.10/site-packages/deepspeed/runtime/engine.py:2603, in DeepSpeedEngine.load_module_state_dict(self, checkpoint, strict, custom_load_fn, fetch_z3_params)
   2601         custom_load_fn(src=module_state_dict, dst=self.module)
   2602     else:
-> 2603         self.module.load_state_dict(
   2604             module_state_dict,  # TODO
   2605             strict=strict)
   2607 if checkpoint.get(FROZEN_PARAM_FRAGMENTS, None) is not None:
   2608     saved_frozen_params = checkpoint[FROZEN_PARAM_FRAGMENTS]

File /opt/conda/envs/pytorch/lib/python3.10/site-packages/torch/nn/modules/module.py:2152, in Module.load_state_dict(self, state_dict, strict, assign)
   2147         error_msgs.insert(
   2148             0, 'Missing key(s) in state_dict: {}. '.format(
   2149                 ', '.join(f'"{k}"' for k in missing_keys)))
   2151 if len(error_msgs) > 0:
-> 2152     raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
   2153                        self.__class__.__name__, "\n\t".join(error_msgs)))
   2154 return _IncompatibleKeys(missing_keys, unexpected_keys)

RuntimeError: Error(s) in loading state_dict for PeftModelForCausalLM:
	Missing key(s) in state_dict: "base_model.model.model.embed_tokens.weight", "base_model.model.model.layers.0.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.0.self_attn.k_proj.weight", "base_model.model.model.layers.0.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.0.self_attn.o_proj.weight", "base_model.model.model.layers.0.mlp.gate_proj.weight", "base_model.model.model.layers.0.mlp.up_proj.weight", "base_model.model.model.layers.0.mlp.down_proj.weight", "base_model.model.model.layers.0.input_layernorm.weight", "base_model.model.model.layers.0.post_attention_layernorm.weight", "base_model.model.model.layers.1.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.1.self_attn.k_proj.weight", "base_model.model.model.layers.1.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.1.self_attn.o_proj.weight", "base_model.model.model.layers.1.mlp.gate_proj.weight", "base_model.model.model.layers.1.mlp.up_proj.weight", "base_model.model.model.layers.1.mlp.down_proj.weight", "base_model.model.model.layers.1.input_layernorm.weight", "base_model.model.model.layers.1.post_attention_layernorm.weight", "base_model.model.model.layers.2.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.2.self_attn.k_proj.weight", "base_model.model.model.layers.2.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.2.self_attn.o_proj.weight", "base_model.model.model.layers.2.mlp.gate_proj.weight", "base_model.model.model.layers.2.mlp.up_proj.weight", "base_model.model.model.layers.2.mlp.down_proj.weight", "base_model.model.model.layers.2.input_layernorm.weight", "base_model.model.model.layers.2.post_attention_layernorm.weight", "base_model.model.model.layers.3.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.3.self_attn.k_proj.weight", "base_model.model.model.layers.3.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.3.self_attn.o_proj.weight", 
"base_model.model.model.layers.3.mlp.gate_proj.weight", "base_model.model.model.layers.3.mlp.up_proj.weight", "base_model.model.model.layers.3.mlp.down_proj.weight", "base_model.model.model.layers.3.input_layernorm.weight", "base_model.model.model.layers.3.post_attention_layernorm.weight", "base_model.model.model.layers.4.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.4.self_attn.k_proj.weight", "base_model.model.model.layers.4.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.4.self_attn.o_proj.weight", "base_model.model.model.layers.4.mlp.gate_proj.weight", "base_model.model.model.layers.4.mlp.up_proj.weight", "base_model.model.model.layers.4.mlp.down_proj.weight", "base_model.model.model.layers.4.input_layernorm.weight", "base_model.model.model.layers.4.post_attention_layernorm.weight", "base_model.model.model.layers.5.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.5.self_attn.k_proj.weight", "base_model.model.model.layers.5.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.5.self_attn.o_proj.weight", "base_model.model.model.layers.5.mlp.gate_proj.weight", "base_model.model.model.layers.5.mlp.up_proj.weight", "base_model.model.model.layers.5.mlp.down_proj.weight", "base_model.model.model.layers.5.input_layernorm.weight", "base_model.model.model.layers.5.post_attention_layernorm.weight", "base_model.model.model.layers.6.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.6.self_attn.k_proj.weight", "base_model.model.model.layers.6.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.6.self_attn.o_proj.weight", "base_model.model.model.layers.6.mlp.gate_proj.weight", "base_model.model.model.layers.6.mlp.up_proj.weight", "base_model.model.model.layers.6.mlp.down_proj.weight", "base_model.model.model.layers.6.input_layernorm.weight", "base_model.model.model.layers.6.post_attention_layernorm.weight", 
"base_model.model.model.layers.7.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.7.self_attn.k_proj.weight", "base_model.model.model.layers.7.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.7.self_attn.o_proj.weight", "base_model.model.model.layers.7.mlp.gate_proj.weight", "base_model.model.model.layers.7.mlp.up_proj.weight", "base_model.model.model.layers.7.mlp.down_proj.weight", "base_model.model.model.layers.7.input_layernorm.weight", "base_model.model.model.layers.7.post_attention_layernorm.weight", "base_model.model.model.layers.8.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.8.self_attn.k_proj.weight", "base_model.model.model.layers.8.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.8.self_attn.o_proj.weight", "base_model.model.model.layers.8.mlp.gate_proj.weight", "base_model.model.model.layers.8.mlp.up_proj.weight", "base_model.model.model.layers.8.mlp.down_proj.weight", "base_model.model.model.layers.8.input_layernorm.weight", "base_model.model.model.layers.8.post_attention_layernorm.weight", "base_model.model.model.layers.9.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.9.self_attn.k_proj.weight", "base_model.model.model.layers.9.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.9.self_attn.o_proj.weight", "base_model.model.model.layers.9.mlp.gate_proj.weight", "base_model.model.model.layers.9.mlp.up_proj.weight", "base_model.model.model.layers.9.mlp.down_proj.weight", "base_model.model.model.layers.9.input_layernorm.weight", "base_model.model.model.layers.9.post_attention_layernorm.weight", "base_model.model.model.layers.10.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.10.self_attn.k_proj.weight", "base_model.model.model.layers.10.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.10.self_attn.o_proj.weight", "base_model.model.model.layers.10.mlp.gate_proj.weight", 
"base_model.model.model.layers.10.mlp.up_proj.weight", "base_model.model.model.layers.10.mlp.down_proj.weight", "base_model.model.model.layers.10.input_layernorm.weight", "base_model.model.model.layers.10.post_attention_layernorm.weight", "base_model.model.model.layers.11.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.11.self_attn.k_proj.weight", "base_model.model.model.layers.11.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.11.self_attn.o_proj.weight", "base_model.model.model.layers.11.mlp.gate_proj.weight", "base_model.model.model.layers.11.mlp.up_proj.weight", "base_model.model.model.layers.11.mlp.down_proj.weight", "base_model.model.model.layers.11.input_layernorm.weight", "base_model.model.model.layers.11.post_attention_layernorm.weight", "base_model.model.model.layers.12.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.12.self_attn.k_proj.weight", "base_model.model.model.layers.12.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.12.self_attn.o_proj.weight", "base_model.model.model.layers.12.mlp.gate_proj.weight", "base_model.model.model.layers.12.mlp.up_proj.weight", "base_model.model.model.layers.12.mlp.down_proj.weight", "base_model.model.model.layers.12.input_layernorm.weight", "base_model.model.model.layers.12.post_attention_layernorm.weight", "base_model.model.model.layers.13.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.13.self_attn.k_proj.weight", "base_model.model.model.layers.13.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.13.self_attn.o_proj.weight", "base_model.model.model.layers.13.mlp.gate_proj.weight", "base_model.model.model.layers.13.mlp.up_proj.weight", "base_model.model.model.layers.13.mlp.down_proj.weight", "base_model.model.model.layers.13.input_layernorm.weight", "base_model.model.model.layers.13.post_attention_layernorm.weight", "base_model.model.model.layers.14.self_attn.q_proj.base_layer.weight", 
"base_model.model.model.layers.14.self_attn.k_proj.weight", "base_model.model.model.layers.14.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.14.self_attn.o_proj.weight", "base_model.model.model.layers.14.mlp.gate_proj.weight", "base_model.model.model.layers.14.mlp.up_proj.weight", "base_model.model.model.layers.14.mlp.down_proj.weight", "base_model.model.model.layers.14.input_layernorm.weight", "base_model.model.model.layers.14.post_attention_layernorm.weight", "base_model.model.model.layers.15.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.15.self_attn.k_proj.weight", "base_model.model.model.layers.15.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.15.self_attn.o_proj.weight", "base_model.model.model.layers.15.mlp.gate_proj.weight", "base_model.model.model.layers.15.mlp.up_proj.weight", "base_model.model.model.layers.15.mlp.down_proj.weight", "base_model.model.model.layers.15.input_layernorm.weight", "base_model.model.model.layers.15.post_attention_layernorm.weight", "base_model.model.model.layers.16.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.16.self_attn.k_proj.weight", "base_model.model.model.layers.16.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.16.self_attn.o_proj.weight", "base_model.model.model.layers.16.mlp.gate_proj.weight", "base_model.model.model.layers.16.mlp.up_proj.weight", "base_model.model.model.layers.16.mlp.down_proj.weight", "base_model.model.model.layers.16.input_layernorm.weight", "base_model.model.model.layers.16.post_attention_layernorm.weight", "base_model.model.model.layers.17.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.17.self_attn.k_proj.weight", "base_model.model.model.layers.17.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.17.self_attn.o_proj.weight", "base_model.model.model.layers.17.mlp.gate_proj.weight", "base_model.model.model.layers.17.mlp.up_proj.weight", 
"base_model.model.model.layers.17.mlp.down_proj.weight", "base_model.model.model.layers.17.input_layernorm.weight", "base_model.model.model.layers.17.post_attention_layernorm.weight", "base_model.model.model.layers.18.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.18.self_attn.k_proj.weight", "base_model.model.model.layers.18.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.18.self_attn.o_proj.weight", "base_model.model.model.layers.18.mlp.gate_proj.weight", "base_model.model.model.layers.18.mlp.up_proj.weight", "base_model.model.model.layers.18.mlp.down_proj.weight", "base_model.model.model.layers.18.input_layernorm.weight", "base_model.model.model.layers.18.post_attention_layernorm.weight", "base_model.model.model.layers.19.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.19.self_attn.k_proj.weight", "base_model.model.model.layers.19.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.19.self_attn.o_proj.weight", "base_model.model.model.layers.19.mlp.gate_proj.weight", "base_model.model.model.layers.19.mlp.up_proj.weight", "base_model.model.model.layers.19.mlp.down_proj.weight", "base_model.model.model.layers.19.input_layernorm.weight", "base_model.model.model.layers.19.post_attention_layernorm.weight", "base_model.model.model.layers.20.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.20.self_attn.k_proj.weight", "base_model.model.model.layers.20.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.20.self_attn.o_proj.weight", "base_model.model.model.layers.20.mlp.gate_proj.weight", "base_model.model.model.layers.20.mlp.up_proj.weight", "base_model.model.model.layers.20.mlp.down_proj.weight", "base_model.model.model.layers.20.input_layernorm.weight", "base_model.model.model.layers.20.post_attention_layernorm.weight", "base_model.model.model.layers.21.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.21.self_attn.k_proj.weight", 
"base_model.model.model.layers.21.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.21.self_attn.o_proj.weight", "base_model.model.model.layers.21.mlp.gate_proj.weight", "base_model.model.model.layers.21.mlp.up_proj.weight", "base_model.model.model.layers.21.mlp.down_proj.weight", "base_model.model.model.layers.21.input_layernorm.weight", "base_model.model.model.layers.21.post_attention_layernorm.weight", "base_model.model.model.layers.22.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.22.self_attn.k_proj.weight", "base_model.model.model.layers.22.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.22.self_attn.o_proj.weight", "base_model.model.model.layers.22.mlp.gate_proj.weight", "base_model.model.model.layers.22.mlp.up_proj.weight", "base_model.model.model.layers.22.mlp.down_proj.weight", "base_model.model.model.layers.22.input_layernorm.weight", "base_model.model.model.layers.22.post_attention_layernorm.weight", "base_model.model.model.layers.23.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.23.self_attn.k_proj.weight", "base_model.model.model.layers.23.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.23.self_attn.o_proj.weight", "base_model.model.model.layers.23.mlp.gate_proj.weight", "base_model.model.model.layers.23.mlp.up_proj.weight", "base_model.model.model.layers.23.mlp.down_proj.weight", "base_model.model.model.layers.23.input_layernorm.weight", "base_model.model.model.layers.23.post_attention_layernorm.weight", "base_model.model.model.layers.24.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.24.self_attn.k_proj.weight", "base_model.model.model.layers.24.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.24.self_attn.o_proj.weight", "base_model.model.model.layers.24.mlp.gate_proj.weight", "base_model.model.model.layers.24.mlp.up_proj.weight", "base_model.model.model.layers.24.mlp.down_proj.weight", 
"base_model.model.model.layers.24.input_layernorm.weight", "base_model.model.model.layers.24.post_attention_layernorm.weight", "base_model.model.model.layers.25.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.25.self_attn.k_proj.weight", "base_model.model.model.layers.25.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.25.self_attn.o_proj.weight", "base_model.model.model.layers.25.mlp.gate_proj.weight", "base_model.model.model.layers.25.mlp.up_proj.weight", "base_model.model.model.layers.25.mlp.down_proj.weight", "base_model.model.model.layers.25.input_layernorm.weight", "base_model.model.model.layers.25.post_attention_layernorm.weight", "base_model.model.model.layers.26.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.26.self_attn.k_proj.weight", "base_model.model.model.layers.26.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.26.self_attn.o_proj.weight", "base_model.model.model.layers.26.mlp.gate_proj.weight", "base_model.model.model.layers.26.mlp.up_proj.weight", "base_model.model.model.layers.26.mlp.down_proj.weight", "base_model.model.model.layers.26.input_layernorm.weight", "base_model.model.model.layers.26.post_attention_layernorm.weight", "base_model.model.model.layers.27.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.27.self_attn.k_proj.weight", "base_model.model.model.layers.27.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.27.self_attn.o_proj.weight", "base_model.model.model.layers.27.mlp.gate_proj.weight", "base_model.model.model.layers.27.mlp.up_proj.weight", "base_model.model.model.layers.27.mlp.down_proj.weight", "base_model.model.model.layers.27.input_layernorm.weight", "base_model.model.model.layers.27.post_attention_layernorm.weight", "base_model.model.model.layers.28.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.28.self_attn.k_proj.weight", 
"base_model.model.model.layers.28.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.28.self_attn.o_proj.weight", "base_model.model.model.layers.28.mlp.gate_proj.weight", "base_model.model.model.layers.28.mlp.up_proj.weight", "base_model.model.model.layers.28.mlp.down_proj.weight", "base_model.model.model.layers.28.input_layernorm.weight", "base_model.model.model.layers.28.post_attention_layernorm.weight", "base_model.model.model.layers.29.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.29.self_attn.k_proj.weight", "base_model.model.model.layers.29.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.29.self_attn.o_proj.weight", "base_model.model.model.layers.29.mlp.gate_proj.weight", "base_model.model.model.layers.29.mlp.up_proj.weight", "base_model.model.model.layers.29.mlp.down_proj.weight", "base_model.model.model.layers.29.input_layernorm.weight", "base_model.model.model.layers.29.post_attention_layernorm.weight", "base_model.model.model.layers.30.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.30.self_attn.k_proj.weight", "base_model.model.model.layers.30.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.30.self_attn.o_proj.weight", "base_model.model.model.layers.30.mlp.gate_proj.weight", "base_model.model.model.layers.30.mlp.up_proj.weight", "base_model.model.model.layers.30.mlp.down_proj.weight", "base_model.model.model.layers.30.input_layernorm.weight", "base_model.model.model.layers.30.post_attention_layernorm.weight", "base_model.model.model.layers.31.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.31.self_attn.k_proj.weight", "base_model.model.model.layers.31.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.31.self_attn.o_proj.weight", "base_model.model.model.layers.31.mlp.gate_proj.weight", "base_model.model.model.layers.31.mlp.up_proj.weight", "base_model.model.model.layers.31.mlp.down_proj.weight", 
"base_model.model.model.layers.31.input_layernorm.weight", "base_model.model.model.layers.31.post_attention_layernorm.weight", "base_model.model.model.norm.weight", "base_model.model.lm_head.weight".

I also tried converting the weights with the zero_to_fp32.py script, but still got the same error.

Expected behavior

Training should resume without error.

Feb 11 '24 07:02 seanbenhur

I am also having this problem, but in my case it is raised by trainer.train().

Feb 13 '24 01:02 tungsontran

Hi everyone, not sure if this is related, but this might be fixed on peft main; see this comment from @pacman100: https://github.com/huggingface/transformers/issues/28770#issuecomment-1935819776

Feb 13 '24 01:02 younesbelkada

This problem still persists even after updating to the latest version.

Feb 13 '24 13:02 tungsontran

The problem seems to arise when using PEFT + DeepSpeed, even when training on only 1 GPU. My code has no problem when run without distributed training.

/home/tung/anaconda3/envs/gpt/lib/python3.9/site-packages/torch/nn/modules/module.py:1879: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
  warnings.warn(
Traceback (most recent call last):
  File "/home/tung/development/llm-data-generator/llm_finetune.py", line 139, in <module>
    trainer.train()
  File "/home/tung/anaconda3/envs/gpt/lib/python3.9/site-packages/transformers/trainer.py", line 1539, in train
    return inner_training_loop(
  File "/home/tung/anaconda3/envs/gpt/lib/python3.9/site-packages/transformers/trainer.py", line 1972, in _inner_training_loop
    self._load_best_model()
  File "/home/tung/anaconda3/envs/gpt/lib/python3.9/site-packages/transformers/trainer.py", line 2168, in _load_best_model
    deepspeed_load_checkpoint(self.model_wrapped, self.state.best_model_checkpoint)
  File "/home/tung/anaconda3/envs/gpt/lib/python3.9/site-packages/transformers/integrations/deepspeed.py", line 402, in deepspeed_load_checkpoint
    load_path, _ = deepspeed_engine.load_checkpoint(
  File "/home/tung/anaconda3/envs/gpt/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2740, in load_checkpoint
    load_path, client_states = self._load_checkpoint(load_dir,
  File "/home/tung/anaconda3/envs/gpt/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2825, in _load_checkpoint
    self.load_module_state_dict(checkpoint=checkpoint,
  File "/home/tung/anaconda3/envs/gpt/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2603, in load_module_state_dict
    self.module.load_state_dict(
  File "/home/tung/anaconda3/envs/gpt/lib/python3.9/site-packages/torch/nn/modules/module.py", line 2152, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for PeftModelForCausalLM:
        Missing key(s) in state_dict: "base_model.model.model.embed_tokens.weight", "base_model.model.model.layers.0.input_layernorm.weight", "base_model.model.model.layers.0.post_attention_layernorm.weight", "base_model.model.model.layers.1.input_layernorm.weight", "base_model.model.model.layers.1.post_attention_layernorm.weight", "base_model.model.model.layers.2.input_layernorm.weight", "base_model.model.model.layers.2.post_attention_layernorm.weight", "base_model.model.model.layers.3.input_layernorm.weight", "base_model.model.model.layers.3.post_attention_layernorm.weight", "base_model.model.model.layers.4.input_layernorm.weight", "base_model.model.model.layers.4.post_attention_layernorm.weight", "base_model.model.model.layers.5.input_layernorm.weight", "base_model.model.model.layers.5.post_attention_layernorm.weight", "base_model.model.model.layers.6.input_layernorm.weight", "base_model.model.model.layers.6.post_attention_layernorm.weight", "base_model.model.model.layers.7.input_layernorm.weight", "base_model.model.model.layers.7.post_attention_layernorm.weight", "base_model.model.model.layers.8.input_layernorm.weight", "base_model.model.model.layers.8.post_attention_layernorm.weight", "base_model.model.model.layers.9.input_layernorm.weight", "base_model.model.model.layers.9.post_attention_layernorm.weight", "base_model.model.model.layers.10.input_layernorm.weight", "base_model.model.model.layers.10.post_attention_layernorm.weight", "base_model.model.model.layers.11.input_layernorm.weight", "base_model.model.model.layers.11.post_attention_layernorm.weight", "base_model.model.model.layers.12.input_layernorm.weight", "base_model.model.model.layers.12.post_attention_layernorm.weight", "base_model.model.model.layers.13.input_layernorm.weight", "base_model.model.model.layers.13.post_attention_layernorm.weight", "base_model.model.model.layers.14.input_layernorm.weight", "base_model.model.model.layers.14.post_attention_layernorm.weight", 
"base_model.model.model.layers.15.input_layernorm.weight", "base_model.model.model.layers.15.post_attention_layernorm.weight", "base_model.model.model.layers.16.input_layernorm.weight", "base_model.model.model.layers.16.post_attention_layernorm.weight", "base_model.model.model.layers.17.input_layernorm.weight", "base_model.model.model.layers.17.post_attention_layernorm.weight", "base_model.model.model.layers.18.input_layernorm.weight", "base_model.model.model.layers.18.post_attention_layernorm.weight", "base_model.model.model.layers.19.input_layernorm.weight", "base_model.model.model.layers.19.post_attention_layernorm.weight", "base_model.model.model.layers.20.input_layernorm.weight", "base_model.model.model.layers.20.post_attention_layernorm.weight", "base_model.model.model.layers.21.input_layernorm.weight", "base_model.model.model.layers.21.post_attention_layernorm.weight", "base_model.model.model.layers.22.input_layernorm.weight", "base_model.model.model.layers.22.post_attention_layernorm.weight", "base_model.model.model.layers.23.input_layernorm.weight", "base_model.model.model.layers.23.post_attention_layernorm.weight", "base_model.model.model.layers.24.input_layernorm.weight", "base_model.model.model.layers.24.post_attention_layernorm.weight", "base_model.model.model.layers.25.input_layernorm.weight", "base_model.model.model.layers.25.post_attention_layernorm.weight", "base_model.model.model.layers.26.input_layernorm.weight", "base_model.model.model.layers.26.post_attention_layernorm.weight", "base_model.model.model.layers.27.input_layernorm.weight", "base_model.model.model.layers.27.post_attention_layernorm.weight", "base_model.model.model.layers.28.input_layernorm.weight", "base_model.model.model.layers.28.post_attention_layernorm.weight", "base_model.model.model.layers.29.input_layernorm.weight", "base_model.model.model.layers.29.post_attention_layernorm.weight", "base_model.model.model.layers.30.input_layernorm.weight", 
"base_model.model.model.layers.30.post_attention_layernorm.weight", "base_model.model.model.layers.31.input_layernorm.weight", "base_model.model.model.layers.31.post_attention_layernorm.weight",
"base_model.model.model.norm.weight", 
"base_model.model.lm_head.weight".
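
Every key in the list above lives under the base model; none of them is an adapter weight. This may indicate that the checkpoint only stores the adapter, but the thread does not confirm that. A minimal sketch for checking this on a captured error message, assuming a LoRA adapter whose parameter names contain `lora_` (`summarize_missing_keys` is a hypothetical helper, not part of peft or transformers):

```python
import re
from collections import Counter


def summarize_missing_keys(error_text: str) -> Counter:
    """Count the missing state_dict keys in an error message by coarse category."""
    # Keep only the text after the "Missing key(s)" marker ...
    _, _, tail = error_text.partition("Missing key(s) in state_dict:")
    # ... and stop before an "Unexpected key(s)" section, if there is one.
    tail, _, _ = tail.partition("Unexpected key(s) in state_dict:")
    # Key names appear as double-quoted strings.
    keys = re.findall(r'"([^"]+)"', tail)
    counts = Counter()
    for key in keys:
        # Assumption: LoRA adapter parameters carry "lora_" in their name.
        category = "adapter" if "lora_" in key else "base_model"
        counts[category] += 1
    return counts
```

If the `adapter` count is zero, the adapter weights themselves were found and only the base-model tensors are missing from the checkpoint.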

Feb 14 '24 11:02 tungsontran

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Mar 12 '24 15:03 github-actions[bot]

Was anyone able to find a solution to this problem? I'm also not able to resume from a checkpoint when using DeepSpeed ZeRO-3.

Mar 15 '24 09:03 iarbel84

Do you use DS ZeRO-3 with 4-bit quantization? AFAIK they don't work together. I was able to fix this issue by going down to DS ZeRO-2 and updating transformers to version >= 4.38.2.
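
A minimal sketch of the stage change described above, assuming the DeepSpeed config is available as a Python dict (for example, loaded from the JSON file). `to_zero2` is a hypothetical helper; it sets `zero_optimization.stage` to 2 and drops the `stage3_*` and `offload_param` entries, which only apply to stage 3:

```python
import copy

# Settings that only apply to ZeRO stage 3 (parameter partitioning).
STAGE3_ONLY_KEYS = ("offload_param",)


def to_zero2(ds_config: dict) -> dict:
    """Return a copy of a DeepSpeed config dict switched to ZeRO stage 2."""
    cfg = copy.deepcopy(ds_config)  # leave the caller's config untouched
    zero = cfg.setdefault("zero_optimization", {})
    zero["stage"] = 2
    for key in list(zero):
        if key.startswith("stage3_") or key in STAGE3_ONLY_KEYS:
            del zero[key]
    return cfg
```

The transformers upgrade mentioned in the same comment is a separate step and is not covered by this sketch.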

Mar 15 '24 23:03 tungsontran

DS ZeRO-3, but without any quantization.

Mar 16 '24 12:03 iarbel84

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Apr 09 '24 15:04 github-actions[bot]

Me too. If I use only stage 2 optimization instead of stage 3, I can load my model without missing keys. With stage 3, the model cannot be loaded because of missing keys. However, stage 3 seems to provide a substantial speed boost. I hope this bug can be solved soon!

Apr 10 '24 05:04 MagicianWu

I solved this problem under DeepSpeed ZeRO-3 by changing the environment:

transformers==4.38.2, pydantic==1.9.0, accelerate==0.27.2
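
A minimal sketch for checking whether the current environment matches these pins, using only the standard library. `find_mismatches` and `PINNED` are hypothetical names introduced for illustration:

```python
from importlib.metadata import PackageNotFoundError, version

# The versions reported in this comment.
PINNED = {"transformers": "4.38.2", "pydantic": "1.9.0", "accelerate": "0.27.2"}


def find_mismatches(pinned, installed=None):
    """Return {package: (wanted, found)} for every package whose version differs.

    If `installed` is None, versions are looked up in the current environment;
    a package that is not installed is reported with found=None.
    """
    mismatches = {}
    for name, wanted in pinned.items():
        if installed is not None:
            found = installed.get(name)
        else:
            try:
                found = version(name)
            except PackageNotFoundError:
                found = None
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches
```

Calling `find_mismatches(PINNED)` before resuming training lists any package that differs from the combination reported here.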

Apr 12 '24 08:04 ktlKTL

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

May 06 '24 15:05 github-actions[bot]

@ktlKTL Using your package versions, I got a new error: ValueError: Trying to set a tensor of shape torch.Size([32000, 4096]) in "weight" (which has shape torch.Size([0])), this look incorrect.

Any hints?

Jul 15 '24 20:07 Andcircle

i solved this problem under deepspeed zero3 by changing the environment:

transformers==4.38.2, pydantic==1.9.0, accelerate==0.27.2

Great! It solves my problem perfectly! Thanks for providing the solution.

Aug 03 '24 03:08 JunJieYa
