RuntimeError: missing keys when resuming training, cannot load checkpoint

Open seanbenhur opened this issue 1 year ago • 12 comments

System Info

peft==0.8.1 accelerate==0.26.1 transformers==4.37.1 deepspeed==0.13.1

Who can help?

No response

Information

  • [ ] The official example scripts
  • [X] My own modified scripts

Tasks

  • [ ] An officially supported task in the examples folder
  • [X] My own task or dataset (give details below)

Reproduction

I trained a Mistral model with a DeepSpeed ZeRO-3 configuration. When I try to resume from the checkpoint with trainer.train(resume_from_checkpoint=True), I get the RuntimeError shown in the traceback below.
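A note on reading the error: in the portion of the message shown below, the missing keys all look like base-model weights (embed_tokens, *_proj.weight, base_layer.weight, layernorm weights) and none look like adapter weights. That would be consistent with the checkpoint holding only adapter weights, which is what the state_dict override on lines 110-112 of the cell returns, though I have not confirmed this is the cause. A minimal sketch for checking that split on a pasted error message, assuming LoRA adapters whose parameter names contain ".lora_" (an assumption about the naming; split_missing_keys is a hypothetical helper, not part of PEFT or DeepSpeed):

```python
import re


def split_missing_keys(error_text):
    """Split quoted key names from a 'Missing key(s)' message into
    (adapter_keys, base_keys).

    Assumption: adapter parameters are LoRA weights whose names contain
    '.lora_' (e.g. '...q_proj.lora_A.default.weight').
    """
    keys = re.findall(r'"([^"]+)"', error_text)
    adapter_keys = [k for k in keys if ".lora_" in k]
    base_keys = [k for k in keys if ".lora_" not in k]
    return adapter_keys, base_keys


# First three keys copied from the error message in this report.
sample = (
    "Missing key(s) in state_dict: "
    '"base_model.model.model.embed_tokens.weight", '
    '"base_model.model.model.layers.0.self_attn.q_proj.base_layer.weight", '
    '"base_model.model.model.layers.0.self_attn.k_proj.weight", '
)

adapter_keys, base_keys = split_missing_keys(sample)
print(len(adapter_keys), len(base_keys))
```

If the adapter list comes back empty for the full message, only the frozen base weights are missing from the checkpoint's module state dict.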

RuntimeError                              Traceback (most recent call last)
Cell In[4], line 136
    131     trainer.save_metrics("test", test_metrics)
    135 if __name__ == "__main__":
--> 136     main(cfg)

Cell In[4], line 114
    110 model.state_dict = (
    111     lambda self, *_, **__: get_peft_model_state_dict(self, old_state_dict())
    112 ).__get__(model, type(model))
    113 # start training
--> 114 train_result = trainer.train(resume_from_checkpoint=True)
    115 trainer.save_model(training_dir_name)
    116 model.save_pretrained(training_dir_name)

File /opt/conda/envs/pytorch/lib/python3.10/site-packages/transformers/trainer.py:1539, in Trainer.train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
   1537         hf_hub_utils.enable_progress_bars()
   1538 else:
-> 1539     return inner_training_loop(
   1540         args=args,
   1541         resume_from_checkpoint=resume_from_checkpoint,
   1542         trial=trial,
   1543         ignore_keys_for_eval=ignore_keys_for_eval,
   1544     )

File /opt/conda/envs/pytorch/lib/python3.10/site-packages/transformers/trainer.py:1708, in Trainer._inner_training_loop(self, batch_size, args, resume_from_checkpoint, trial, ignore_keys_for_eval)
   1706 if resume_from_checkpoint is not None:
   1707     if self.is_deepspeed_enabled:
-> 1708         deepspeed_load_checkpoint(self.model_wrapped, resume_from_checkpoint)
   1709     elif is_sagemaker_mp_enabled() or self.is_fsdp_enabled:
   1710         self._load_from_checkpoint(resume_from_checkpoint, self.model_wrapped)

File /opt/conda/envs/pytorch/lib/python3.10/site-packages/transformers/integrations/deepspeed.py:402, in deepspeed_load_checkpoint(deepspeed_engine, checkpoint_path)
    400 logger.info(f"Attempting to resume from {checkpoint_path}")
    401 # this magically updates self.optimizer and self.lr_scheduler
--> 402 load_path, _ = deepspeed_engine.load_checkpoint(
    403     checkpoint_path, load_optimizer_states=True, load_lr_scheduler_states=True
    404 )
    405 if load_path is None:
    406     raise ValueError(f"[deepspeed] failed to resume from checkpoint {checkpoint_path}")

File /opt/conda/envs/pytorch/lib/python3.10/site-packages/deepspeed/runtime/engine.py:2740, in DeepSpeedEngine.load_checkpoint(self, load_dir, tag, load_module_strict, load_optimizer_states, load_lr_scheduler_states, load_module_only, custom_load_fn)
   2736 if self._optimizer_has_ckpt_event_prologue():
   2737     # Prepare for checkpoint load by ensuring all parameters are partitioned
   2738     self.optimizer.checkpoint_event_prologue()
-> 2740 load_path, client_states = self._load_checkpoint(load_dir,
   2741                                                  tag,
   2742                                                  load_module_strict=load_module_strict,
   2743                                                  load_optimizer_states=load_optimizer_states,
   2744                                                  load_lr_scheduler_states=load_lr_scheduler_states,
   2745                                                  load_module_only=load_module_only,
   2746                                                  custom_load_fn=custom_load_fn)
   2748 load_zero_checkpoint = load_path is not None and (self.zero_optimization() or self.bfloat16_enabled())
   2749 if load_zero_checkpoint:

File /opt/conda/envs/pytorch/lib/python3.10/site-packages/deepspeed/runtime/engine.py:2825, in DeepSpeedEngine._load_checkpoint(self, load_dir, tag, load_module_strict, load_optimizer_states, load_lr_scheduler_states, load_module_only, custom_load_fn)
   2816     DeepSpeedEngine.load_moe_state_dict(load_dir,
   2817                                         tag,
   2818                                         state_dict=checkpoint['module'],
   (...)
   2822                                         num_experts=self.num_experts,
   2823                                         checkpoint_engine=self.checkpoint_engine)
   2824 if not self.load_universal_checkpoint():
-> 2825     self.load_module_state_dict(checkpoint=checkpoint,
   2826                                 strict=load_module_strict,
   2827                                 custom_load_fn=custom_load_fn,
   2828                                 fetch_z3_params=fetch_z3_params)
   2830 self.loaded_checkpoint_dp_world_size = checkpoint['dp_world_size']
   2832 optim_checkpoint = None

File /opt/conda/envs/pytorch/lib/python3.10/site-packages/deepspeed/runtime/engine.py:2603, in DeepSpeedEngine.load_module_state_dict(self, checkpoint, strict, custom_load_fn, fetch_z3_params)
   2601         custom_load_fn(src=module_state_dict, dst=self.module)
   2602     else:
-> 2603         self.module.load_state_dict(
   2604             module_state_dict,  # TODO
   2605             strict=strict)
   2607 if checkpoint.get(FROZEN_PARAM_FRAGMENTS, None) is not None:
   2608     saved_frozen_params = checkpoint[FROZEN_PARAM_FRAGMENTS]

File /opt/conda/envs/pytorch/lib/python3.10/site-packages/torch/nn/modules/module.py:2152, in Module.load_state_dict(self, state_dict, strict, assign)
   2147         error_msgs.insert(
   2148             0, 'Missing key(s) in state_dict: {}. '.format(
   2149                 ', '.join(f'"{k}"' for k in missing_keys)))
   2151 if len(error_msgs) > 0:
-> 2152     raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
   2153                        self.__class__.__name__, "\n\t".join(error_msgs)))
   2154 return _IncompatibleKeys(missing_keys, unexpected_keys)

RuntimeError: Error(s) in loading state_dict for PeftModelForCausalLM:
	Missing key(s) in state_dict: "base_model.model.model.embed_tokens.weight", "base_model.model.model.layers.0.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.0.self_attn.k_proj.weight", "base_model.model.model.layers.0.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.0.self_attn.o_proj.weight", "base_model.model.model.layers.0.mlp.gate_proj.weight", "base_model.model.model.layers.0.mlp.up_proj.weight", "base_model.model.model.layers.0.mlp.down_proj.weight", "base_model.model.model.layers.0.input_layernorm.weight", "base_model.model.model.layers.0.post_attention_layernorm.weight", "base_model.model.model.layers.1.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.1.self_attn.k_proj.weight", "base_model.model.model.layers.1.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.1.self_attn.o_proj.weight", "base_model.model.model.layers.1.mlp.gate_proj.weight", "base_model.model.model.layers.1.mlp.up_proj.weight", "base_model.model.model.layers.1.mlp.down_proj.weight", "base_model.model.model.layers.1.input_layernorm.weight", "base_model.model.model.layers.1.post_attention_layernorm.weight", "base_model.model.model.layers.2.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.2.self_attn.k_proj.weight", "base_model.model.model.layers.2.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.2.self_attn.o_proj.weight", "base_model.model.model.layers.2.mlp.gate_proj.weight", "base_model.model.model.layers.2.mlp.up_proj.weight", "base_model.model.model.layers.2.mlp.down_proj.weight", "base_model.model.model.layers.2.input_layernorm.weight", "base_model.model.model.layers.2.post_attention_layernorm.weight", "base_model.model.model.layers.3.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.3.self_attn.k_proj.weight", "base_model.model.model.layers.3.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.3.self_attn.o_proj.weight", 
"base_model.model.model.layers.3.mlp.gate_proj.weight", "base_model.model.model.layers.3.mlp.up_proj.weight", "base_model.model.model.layers.3.mlp.down_proj.weight", "base_model.model.model.layers.3.input_layernorm.weight", "base_model.model.model.layers.3.post_attention_layernorm.weight", "base_model.model.model.layers.4.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.4.self_attn.k_proj.weight", "base_model.model.model.layers.4.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.4.self_attn.o_proj.weight", "base_model.model.model.layers.4.mlp.gate_proj.weight", "base_model.model.model.layers.4.mlp.up_proj.weight", "base_model.model.model.layers.4.mlp.down_proj.weight", "base_model.model.model.layers.4.input_layernorm.weight", "base_model.model.model.layers.4.post_attention_layernorm.weight", "base_model.model.model.layers.5.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.5.self_attn.k_proj.weight", "base_model.model.model.layers.5.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.5.self_attn.o_proj.weight", "base_model.model.model.layers.5.mlp.gate_proj.weight", "base_model.model.model.layers.5.mlp.up_proj.weight", "base_model.model.model.layers.5.mlp.down_proj.weight", "base_model.model.model.layers.5.input_layernorm.weight", "base_model.model.model.layers.5.post_attention_layernorm.weight", "base_model.model.model.layers.6.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.6.self_attn.k_proj.weight", "base_model.model.model.layers.6.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.6.self_attn.o_proj.weight", "base_model.model.model.layers.6.mlp.gate_proj.weight", "base_model.model.model.layers.6.mlp.up_proj.weight", "base_model.model.model.layers.6.mlp.down_proj.weight", "base_model.model.model.layers.6.input_layernorm.weight", "base_model.model.model.layers.6.post_attention_layernorm.weight", 
"base_model.model.model.layers.7.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.7.self_attn.k_proj.weight", "base_model.model.model.layers.7.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.7.self_attn.o_proj.weight", "base_model.model.model.layers.7.mlp.gate_proj.weight", "base_model.model.model.layers.7.mlp.up_proj.weight", "base_model.model.model.layers.7.mlp.down_proj.weight", "base_model.model.model.layers.7.input_layernorm.weight", "base_model.model.model.layers.7.post_attention_layernorm.weight", "base_model.model.model.layers.8.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.8.self_attn.k_proj.weight", "base_model.model.model.layers.8.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.8.self_attn.o_proj.weight", "base_model.model.model.layers.8.mlp.gate_proj.weight", "base_model.model.model.layers.8.mlp.up_proj.weight", "base_model.model.model.layers.8.mlp.down_proj.weight", "base_model.model.model.layers.8.input_layernorm.weight", "base_model.model.model.layers.8.post_attention_layernorm.weight", "base_model.model.model.layers.9.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.9.self_attn.k_proj.weight", "base_model.model.model.layers.9.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.9.self_attn.o_proj.weight", "base_model.model.model.layers.9.mlp.gate_proj.weight", "base_model.model.model.layers.9.mlp.up_proj.weight", "base_model.model.model.layers.9.mlp.down_proj.weight", "base_model.model.model.layers.9.input_layernorm.weight", "base_model.model.model.layers.9.post_attention_layernorm.weight", "base_model.model.model.layers.10.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.10.self_attn.k_proj.weight", "base_model.model.model.layers.10.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.10.self_attn.o_proj.weight", "base_model.model.model.layers.10.mlp.gate_proj.weight", 
"base_model.model.model.layers.10.mlp.up_proj.weight", "base_model.model.model.layers.10.mlp.down_proj.weight", "base_model.model.model.layers.10.input_layernorm.weight", "base_model.model.model.layers.10.post_attention_layernorm.weight", "base_model.model.model.layers.11.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.11.self_attn.k_proj.weight", "base_model.model.model.layers.11.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.11.self_attn.o_proj.weight", "base_model.model.model.layers.11.mlp.gate_proj.weight", "base_model.model.model.layers.11.mlp.up_proj.weight", "base_model.model.model.layers.11.mlp.down_proj.weight", "base_model.model.model.layers.11.input_layernorm.weight", "base_model.model.model.layers.11.post_attention_layernorm.weight", "base_model.model.model.layers.12.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.12.self_attn.k_proj.weight", "base_model.model.model.layers.12.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.12.self_attn.o_proj.weight", "base_model.model.model.layers.12.mlp.gate_proj.weight", "base_model.model.model.layers.12.mlp.up_proj.weight", "base_model.model.model.layers.12.mlp.down_proj.weight", "base_model.model.model.layers.12.input_layernorm.weight", "base_model.model.model.layers.12.post_attention_layernorm.weight", "base_model.model.model.layers.13.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.13.self_attn.k_proj.weight", "base_model.model.model.layers.13.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.13.self_attn.o_proj.weight", "base_model.model.model.layers.13.mlp.gate_proj.weight", "base_model.model.model.layers.13.mlp.up_proj.weight", "base_model.model.model.layers.13.mlp.down_proj.weight", "base_model.model.model.layers.13.input_layernorm.weight", "base_model.model.model.layers.13.post_attention_layernorm.weight", "base_model.model.model.layers.14.self_attn.q_proj.base_layer.weight", 
"base_model.model.model.layers.14.self_attn.k_proj.weight", "base_model.model.model.layers.14.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.14.self_attn.o_proj.weight", "base_model.model.model.layers.14.mlp.gate_proj.weight", "base_model.model.model.layers.14.mlp.up_proj.weight", "base_model.model.model.layers.14.mlp.down_proj.weight", "base_model.model.model.layers.14.input_layernorm.weight", "base_model.model.model.layers.14.post_attention_layernorm.weight", "base_model.model.model.layers.15.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.15.self_attn.k_proj.weight", "base_model.model.model.layers.15.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.15.self_attn.o_proj.weight", "base_model.model.model.layers.15.mlp.gate_proj.weight", "base_model.model.model.layers.15.mlp.up_proj.weight", "base_model.model.model.layers.15.mlp.down_proj.weight", "base_model.model.model.layers.15.input_layernorm.weight", "base_model.model.model.layers.15.post_attention_layernorm.weight", "base_model.model.model.layers.16.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.16.self_attn.k_proj.weight", "base_model.model.model.layers.16.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.16.self_attn.o_proj.weight", "base_model.model.model.layers.16.mlp.gate_proj.weight", "base_model.model.model.layers.16.mlp.up_proj.weight", "base_model.model.model.layers.16.mlp.down_proj.weight", "base_model.model.model.layers.16.input_layernorm.weight", "base_model.model.model.layers.16.post_attention_layernorm.weight", "base_model.model.model.layers.17.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.17.self_attn.k_proj.weight", "base_model.model.model.layers.17.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.17.self_attn.o_proj.weight", "base_model.model.model.layers.17.mlp.gate_proj.weight", "base_model.model.model.layers.17.mlp.up_proj.weight", 
"base_model.model.model.layers.17.mlp.down_proj.weight", "base_model.model.model.layers.17.input_layernorm.weight", "base_model.model.model.layers.17.post_attention_layernorm.weight", "base_model.model.model.layers.18.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.18.self_attn.k_proj.weight", "base_model.model.model.layers.18.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.18.self_attn.o_proj.weight", "base_model.model.model.layers.18.mlp.gate_proj.weight", "base_model.model.model.layers.18.mlp.up_proj.weight", "base_model.model.model.layers.18.mlp.down_proj.weight", "base_model.model.model.layers.18.input_layernorm.weight", "base_model.model.model.layers.18.post_attention_layernorm.weight", "base_model.model.model.layers.19.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.19.self_attn.k_proj.weight", "base_model.model.model.layers.19.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.19.self_attn.o_proj.weight", "base_model.model.model.layers.19.mlp.gate_proj.weight", "base_model.model.model.layers.19.mlp.up_proj.weight", "base_model.model.model.layers.19.mlp.down_proj.weight", "base_model.model.model.layers.19.input_layernorm.weight", "base_model.model.model.layers.19.post_attention_layernorm.weight", "base_model.model.model.layers.20.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.20.self_attn.k_proj.weight", "base_model.model.model.layers.20.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.20.self_attn.o_proj.weight", "base_model.model.model.layers.20.mlp.gate_proj.weight", "base_model.model.model.layers.20.mlp.up_proj.weight", "base_model.model.model.layers.20.mlp.down_proj.weight", "base_model.model.model.layers.20.input_layernorm.weight", "base_model.model.model.layers.20.post_attention_layernorm.weight", "base_model.model.model.layers.21.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.21.self_attn.k_proj.weight", 
"base_model.model.model.layers.21.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.21.self_attn.o_proj.weight", "base_model.model.model.layers.21.mlp.gate_proj.weight", "base_model.model.model.layers.21.mlp.up_proj.weight", "base_model.model.model.layers.21.mlp.down_proj.weight", "base_model.model.model.layers.21.input_layernorm.weight", "base_model.model.model.layers.21.post_attention_layernorm.weight", "base_model.model.model.layers.22.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.22.self_attn.k_proj.weight", "base_model.model.model.layers.22.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.22.self_attn.o_proj.weight", "base_model.model.model.layers.22.mlp.gate_proj.weight", "base_model.model.model.layers.22.mlp.up_proj.weight", "base_model.model.model.layers.22.mlp.down_proj.weight", "base_model.model.model.layers.22.input_layernorm.weight", "base_model.model.model.layers.22.post_attention_layernorm.weight", "base_model.model.model.layers.23.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.23.self_attn.k_proj.weight", "base_model.model.model.layers.23.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.23.self_attn.o_proj.weight", "base_model.model.model.layers.23.mlp.gate_proj.weight", "base_model.model.model.layers.23.mlp.up_proj.weight", "base_model.model.model.layers.23.mlp.down_proj.weight", "base_model.model.model.layers.23.input_layernorm.weight", "base_model.model.model.layers.23.post_attention_layernorm.weight", "base_model.model.model.layers.24.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.24.self_attn.k_proj.weight", "base_model.model.model.layers.24.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.24.self_attn.o_proj.weight", "base_model.model.model.layers.24.mlp.gate_proj.weight", "base_model.model.model.layers.24.mlp.up_proj.weight", "base_model.model.model.layers.24.mlp.down_proj.weight", 
"base_model.model.model.layers.24.input_layernorm.weight", "base_model.model.model.layers.24.post_attention_layernorm.weight", "base_model.model.model.layers.25.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.25.self_attn.k_proj.weight", "base_model.model.model.layers.25.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.25.self_attn.o_proj.weight", "base_model.model.model.layers.25.mlp.gate_proj.weight", "base_model.model.model.layers.25.mlp.up_proj.weight", "base_model.model.model.layers.25.mlp.down_proj.weight", "base_model.model.model.layers.25.input_layernorm.weight", "base_model.model.model.layers.25.post_attention_layernorm.weight", "base_model.model.model.layers.26.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.26.self_attn.k_proj.weight", "base_model.model.model.layers.26.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.26.self_attn.o_proj.weight", "base_model.model.model.layers.26.mlp.gate_proj.weight", "base_model.model.model.layers.26.mlp.up_proj.weight", "base_model.model.model.layers.26.mlp.down_proj.weight", "base_model.model.model.layers.26.input_layernorm.weight", "base_model.model.model.layers.26.post_attention_layernorm.weight", "base_model.model.model.layers.27.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.27.self_attn.k_proj.weight", "base_model.model.model.layers.27.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.27.self_attn.o_proj.weight", "base_model.model.model.layers.27.mlp.gate_proj.weight", "base_model.model.model.layers.27.mlp.up_proj.weight", "base_model.model.model.layers.27.mlp.down_proj.weight", "base_model.model.model.layers.27.input_layernorm.weight", "base_model.model.model.layers.27.post_attention_layernorm.weight", "base_model.model.model.layers.28.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.28.self_attn.k_proj.weight", 
"base_model.model.model.layers.28.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.28.self_attn.o_proj.weight", "base_model.model.model.layers.28.mlp.gate_proj.weight", "base_model.model.model.layers.28.mlp.up_proj.weight", "base_model.model.model.layers.28.mlp.down_proj.weight", "base_model.model.model.layers.28.input_layernorm.weight", "base_model.model.model.layers.28.post_attention_layernorm.weight", "base_model.model.model.layers.29.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.29.self_attn.k_proj.weight", "base_model.model.model.layers.29.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.29.self_attn.o_proj.weight", "base_model.model.model.layers.29.mlp.gate_proj.weight", "base_model.model.model.layers.29.mlp.up_proj.weight", "base_model.model.model.layers.29.mlp.down_proj.weight", "base_model.model.model.layers.29.input_layernorm.weight", "base_model.model.model.layers.29.post_attention_layernorm.weight", "base_model.model.model.layers.30.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.30.self_attn.k_proj.weight", "base_model.model.model.layers.30.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.30.self_attn.o_proj.weight", "base_model.model.model.layers.30.mlp.gate_proj.weight", "base_model.model.model.layers.30.mlp.up_proj.weight", "base_model.model.model.layers.30.mlp.down_proj.weight", "base_model.model.model.layers.30.input_layernorm.weight", "base_model.model.model.layers.30.post_attention_layernorm.weight", "base_model.model.model.layers.31.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.31.self_attn.k_proj.weight", "base_model.model.model.layers.31.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.31.self_attn.o_proj.weight", "base_model.model.model.layers.31.mlp.gate_proj.weight", "base_model.model.model.layers.31.mlp.up_proj.weight", "base_model.model.model.layers.31.mlp.down_proj.weight", 
"base_model.model.model.layers.31.input_layernorm.weight", "base_model.model.model.layers.31.post_attention_layernorm.weight", "base_model.model.model.norm.weight", "base_model.model.lm_head.weight".

I also tried converting the weights with the zero_to_fp32.py script, but I still got the same error.

Expected behavior

Training should be resumed without error

seanbenhur avatar Feb 11 '24 07:02 seanbenhur

I am also having this problem, but in my case it was triggered by trainer.train().

tungsontran avatar Feb 13 '24 01:02 tungsontran

Hi everyone, not sure if this is related but this might be fixed on peft main: https://github.com/huggingface/transformers/issues/28770#issuecomment-1935819776 see this comment from @pacman100

younesbelkada avatar Feb 13 '24 01:02 younesbelkada

This problem still persists even after updating to the latest version

Hi everyone, not sure if this is related but this might be fixed on peft main: huggingface/transformers#28770 (comment) see this comment from @pacman100

tungsontran avatar Feb 13 '24 13:02 tungsontran

The problem seems to arise when using PEFT + DeepSpeed, even when training on only 1 GPU. My code runs without problems when I do not use distributed training.

home/tung/anaconda3/envs/gpt/lib/python3.9/site-packages/torch/nn/modules/module.py:1879: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
  warnings.warn(
Traceback (most recent call last):
  File "/home/tung/development/llm-data-generator/llm_finetune.py", line 139, in <module>
    trainer.train()
  File "/home/tung/anaconda3/envs/gpt/lib/python3.9/site-packages/transformers/trainer.py", line 1539, in train
    return inner_training_loop(
  File "/home/tung/anaconda3/envs/gpt/lib/python3.9/site-packages/transformers/trainer.py", line 1972, in _inner_training_loop
    self._load_best_model()
  File "/home/tung/anaconda3/envs/gpt/lib/python3.9/site-packages/transformers/trainer.py", line 2168, in _load_best_model
    deepspeed_load_checkpoint(self.model_wrapped, self.state.best_model_checkpoint)
  File "/home/tung/anaconda3/envs/gpt/lib/python3.9/site-packages/transformers/integrations/deepspeed.py", line 402, in deepspeed_load_checkpoint
    load_path, _ = deepspeed_engine.load_checkpoint(
  File "/home/tung/anaconda3/envs/gpt/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2740, in load_checkpoint
    load_path, client_states = self._load_checkpoint(load_dir,
  File "/home/tung/anaconda3/envs/gpt/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2825, in _load_checkpoint
    self.load_module_state_dict(checkpoint=checkpoint,
  File "/home/tung/anaconda3/envs/gpt/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 2603, in load_module_state_dict
    self.module.load_state_dict(
  File "/home/tung/anaconda3/envs/gpt/lib/python3.9/site-packages/torch/nn/modules/module.py", line 2152, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for PeftModelForCausalLM:
        Missing key(s) in state_dict: "base_model.model.model.embed_tokens.weight", "base_model.model.model.layers.0.input_layernorm.weight", "base_model.model.model.layers.0.post_attention_layernorm.weight", "base_model.model.model.layers.1.input_layernorm.weight", "base_model.model.model.layers.1.post_attention_layernorm.weight", "base_model.model.model.layers.2.input_layernorm.weight", "base_model.model.model.layers.2.post_attention_layernorm.weight", "base_model.model.model.layers.3.input_layernorm.weight", "base_model.model.model.layers.3.post_attention_layernorm.weight", "base_model.model.model.layers.4.input_layernorm.weight", "base_model.model.model.layers.4.post_attention_layernorm.weight", "base_model.model.model.layers.5.input_layernorm.weight", "base_model.model.model.layers.5.post_attention_layernorm.weight", "base_model.model.model.layers.6.input_layernorm.weight", "base_model.model.model.layers.6.post_attention_layernorm.weight", "base_model.model.model.layers.7.input_layernorm.weight", "base_model.model.model.layers.7.post_attention_layernorm.weight", "base_model.model.model.layers.8.input_layernorm.weight", "base_model.model.model.layers.8.post_attention_layernorm.weight", "base_model.model.model.layers.9.input_layernorm.weight", "base_model.model.model.layers.9.post_attention_layernorm.weight", "base_model.model.model.layers.10.input_layernorm.weight", "base_model.model.model.layers.10.post_attention_layernorm.weight", "base_model.model.model.layers.11.input_layernorm.weight", "base_model.model.model.layers.11.post_attention_layernorm.weight", "base_model.model.model.layers.12.input_layernorm.weight", "base_model.model.model.layers.12.post_attention_layernorm.weight", "base_model.model.model.layers.13.input_layernorm.weight", "base_model.model.model.layers.13.post_attention_layernorm.weight", "base_model.model.model.layers.14.input_layernorm.weight", "base_model.model.model.layers.14.post_attention_layernorm.weight", 
"base_model.model.model.layers.15.input_layernorm.weight", "base_model.model.model.layers.15.post_attention_layernorm.weight", "base_model.model.model.layers.16.input_layernorm.weight", "base_model.model.model.layers.16.post_attention_layernorm.weight", "base_model.model.model.layers.17.input_layernorm.weight", "base_model.model.model.layers.17.post_attention_layernorm.weight", "base_model.model.model.layers.18.input_layernorm.weight", "base_model.model.model.layers.18.post_attention_layernorm.weight", "base_model.model.model.layers.19.input_layernorm.weight", "base_model.model.model.layers.19.post_attention_layernorm.weight", "base_model.model.model.layers.20.input_layernorm.weight", "base_model.model.model.layers.20.post_attention_layernorm.weight", "base_model.model.model.layers.21.input_layernorm.weight", "base_model.model.model.layers.21.post_attention_layernorm.weight", "base_model.model.model.layers.22.input_layernorm.weight", "base_model.model.model.layers.22.post_attention_layernorm.weight", "base_model.model.model.layers.23.input_layernorm.weight", "base_model.model.model.layers.23.post_attention_layernorm.weight", "base_model.model.model.layers.24.input_layernorm.weight", "base_model.model.model.layers.24.post_attention_layernorm.weight", "base_model.model.model.layers.25.input_layernorm.weight", "base_model.model.model.layers.25.post_attention_layernorm.weight", "base_model.model.model.layers.26.input_layernorm.weight", "base_model.model.model.layers.26.post_attention_layernorm.weight", "base_model.model.model.layers.27.input_layernorm.weight", "base_model.model.model.layers.27.post_attention_layernorm.weight", "base_model.model.model.layers.28.input_layernorm.weight", "base_model.model.model.layers.28.post_attention_layernorm.weight", "base_model.model.model.layers.29.input_layernorm.weight", "base_model.model.model.layers.29.post_attention_layernorm.weight", "base_model.model.model.layers.30.input_layernorm.weight", 
"base_model.model.model.layers.30.post_attention_layernorm.weight", "base_model.model.model.layers.31.input_layernorm.weight", "base_model.model.model.layers.31.post_attention_layernorm.weight",
"base_model.model.model.norm.weight", 
"base_model.model.lm_head.weight". 

tungsontran avatar Feb 14 '24 11:02 tungsontran
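A quick way to read the long key list in the traceback above is to count how many of the missing keys are adapter weights and how many are base-model weights. The sketch below is a minimal illustration and not part of any library: `summarize_missing_keys` is a hypothetical helper, and it assumes a LoRA adapter, whose parameter names contain `lora_`. If every missing key falls into the "base" bucket, as in the sample taken from the error, that would be consistent with a checkpoint that holds only adapter weights being loaded in strict mode.

```python
from collections import Counter


def summarize_missing_keys(keys):
    """Count missing keys by kind: adapter weights vs. base-model weights.

    Assumption: adapter parameters are LoRA weights and carry "lora_" in
    their names; everything else is treated as a base-model weight.
    """
    counts = Counter()
    for key in keys:
        if "lora_" in key:
            counts["adapter"] += 1
        else:
            counts["base"] += 1
    return counts


# A few keys copied from the error message above.
missing = [
    "base_model.model.model.embed_tokens.weight",
    "base_model.model.model.layers.0.input_layernorm.weight",
    "base_model.model.model.norm.weight",
    "base_model.model.lm_head.weight",
]

summary = summarize_missing_keys(missing)
print(dict(summary))
```

Pasting the full list from the RuntimeError into `missing` gives the same kind of summary for a real run.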

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

github-actions[bot] avatar Mar 12 '24 15:03 github-actions[bot]

Was anyone able to find a solution to this problem? I'm also not able to resume from checkpoint, using deepspeed zero 3

iarbel84 avatar Mar 15 '24 09:03 iarbel84

Was anyone able to find a solution to this problem? I'm also not able to resume from checkpoint, using deepspeed zero 3

Do you use DS zero 3 with 4-bit quantization? AFAIK they don't work together. I was able to fix this issue by going down to DS zero 2 and updating transformers to version >= 4.38.2.

tungsontran avatar Mar 15 '24 23:03 tungsontran
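The stage change described above could look roughly like the following. This is a sketch under assumptions, not the configuration anyone in this thread actually used: the dictionary is a hypothetical minimal DeepSpeed config in which only `zero_optimization.stage` matters, and the `"auto"` values rely on the Hugging Face Trainer integration filling them in.

```python
import json

# Hypothetical minimal DeepSpeed config. Only the ZeRO stage is the point
# here; the "auto" values are placeholders resolved by the Hugging Face
# Trainer integration from the TrainingArguments.
ds_config = {
    "zero_optimization": {"stage": 2},  # was 3 in the failing setup
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

config_text = json.dumps(ds_config, indent=2)
print(config_text)
```

The printed JSON can be saved to a file and its path passed as the `deepspeed` argument of `TrainingArguments`, which also accepts the dictionary directly.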

Was anyone able to find a solution to this problem? I'm also not able to resume from checkpoint, using deepspeed zero 3

Do you use DS zero 3 with 4-bit quantization? AFAIK they don't work together. I was able to fix this issue by going down to DS zero 2 and updating transformers to version >= 4.38.2.

I use DS zero 3, but without any quantization.

iarbel84 avatar Mar 16 '24 12:03 iarbel84

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

github-actions[bot] avatar Apr 09 '24 15:04 github-actions[bot]

Was anyone able to find a solution to this problem? I'm also not able to resume from checkpoint, using deepspeed zero 3

Do you use DS zero 3 with 4-bit quantization? AFAIK they don't work together. I was able to fix this issue by going down to DS zero 2 and updating transformers to version >= 4.38.2.

Me too. If I use only stage 2 optimization instead of stage 3, I can load my model without missing keys. With stage 3, the model cannot be loaded because of missing keys. However, stage 3 seems to provide a significant speed boost. I hope this bug can be solved soon!

MagicianWu avatar Apr 10 '24 05:04 MagicianWu

I solved this problem under deepspeed zero3 by changing the environment:

transformers==4.38.2, pydantic==1.9.0, accelerate==0.27.2

ktlKTL avatar Apr 12 '24 08:04 ktlKTL

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

github-actions[bot] avatar May 06 '24 15:05 github-actions[bot]

@ktlKTL using your package versions, I got a new error: ValueError: Trying to set a tensor of shape torch.Size([32000, 4096]) in "weight" (which has shape torch.Size([0])), this look incorrect.

Any hints?

Andcircle avatar Jul 15 '24 20:07 Andcircle

I solved this problem under deepspeed zero3 by changing the environment:

transformers==4.38.2, pydantic==1.9.0, accelerate==0.27.2

Great! It solves my problem perfectly! Thanks for providing the solution.

JunJieYa avatar Aug 03 '24 03:08 JunJieYa
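To compare a local environment with the versions reported as working in this thread (transformers==4.38.2, pydantic==1.9.0, accelerate==0.27.2), something like the sketch below may help. The helper names `installed_versions` and `report` are made up for illustration, and an exact match is no guarantee: one commenter above hit a different error with the same versions.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions reported in this thread as resolving the error under ZeRO stage 3.
REPORTED = {"transformers": "4.38.2", "accelerate": "0.27.2", "pydantic": "1.9.0"}


def installed_versions(packages):
    """Return {package: installed version, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def report(reported, installed):
    """Build one human-readable line per package comparing both versions."""
    lines = []
    for name, wanted in reported.items():
        have = installed.get(name)
        status = "match" if have == wanted else "differs"
        lines.append(f"{name}: installed={have} reported={wanted} ({status})")
    return lines


for line in report(REPORTED, installed_versions(REPORTED)):
    print(line)
```

The comparison is a plain string equality on purpose, since the thread reports exact pins rather than version ranges.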