Getting deepspeed error on training completion and failing to save. if self.deepspeed_config["zero_optimization"]["stage"] == 3: AttributeError: 'Accelerator' object has no attribute 'deepspeed_config'
Please check that this issue hasn't been reported before.
- [X] I searched previous Bug Reports and didn't find any similar reports.
Expected Behavior
Running on Windows 10 WSL2 Ubuntu, on 2x RTX 3090 24GB with NVLink and DeepSpeed ZeRO-2.
Expected behavior is for training to complete and save the final checkpoint normally, the same way checkpoints are saved between epochs. Saving between epochs and at the end of an epoch works; only the save at the very end of the training run fails.
Current behaviour
When the run tries to save at the end of training, it gets interrupted by this error and fails to save, but only when wandb_log_model: checkpoint is set.
The error below is from a run with wandb_log_model: end, where the last checkpoint does get saved but the same DeepSpeed-related error still appears.
100%|██████████████████████████████████████████████████████████████████████████████| 773/773 [11:41:18<00:00, 50.90s/it]/home/owen/miniconda3/envs/axolotl/lib/python3.10/site-packages/torch/nn/modules/module.py:1879: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
warnings.warn(
/home/owen/miniconda3/envs/axolotl/lib/python3.10/site-packages/torch/nn/modules/module.py:1879: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
warnings.warn(
{'train_runtime': 42088.2092, 'train_samples_per_second': 8.1, 'train_steps_per_second': 0.018, 'train_loss': 0.7072060618641152, 'epoch': 1.0}
100%|██████████████████████████████████████████████████████████████████████████████| 773/773 [11:41:20<00:00, 50.90s/it]Traceback (most recent call last):
File "/home/owen/miniconda3/envs/axolotl/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/owen/miniconda3/envs/axolotl/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/owen/axolotl/src/axolotl/cli/train.py", line 42, in <module>
fire.Fire(do_cli)
File "/home/owen/miniconda3/envs/axolotl/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/home/owen/miniconda3/envs/axolotl/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/home/owen/miniconda3/envs/axolotl/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/home/owen/axolotl/src/axolotl/cli/train.py", line 38, in do_cli
train(cfg=parsed_cfg, cli_args=parsed_cli_args, dataset_meta=dataset_meta)
File "/home/owen/axolotl/src/axolotl/train.py", line 142, in train
trainer.train(resume_from_checkpoint=resume_from_checkpoint)
File "/home/owen/miniconda3/envs/axolotl/lib/python3.10/site-packages/transformers/trainer.py", line 1543, in train
return inner_training_loop(
File "/home/owen/miniconda3/envs/axolotl/lib/python3.10/site-packages/transformers/trainer.py", line 1996, in _inner_training_loop
self.control = self.callback_handler.on_train_end(args, self.state, self.control)
File "/home/owen/miniconda3/envs/axolotl/lib/python3.10/site-packages/transformers/trainer_callback.py", line 373, in on_train_end
return self.call_event("on_train_end", args, state, control)
File "/home/owen/miniconda3/envs/axolotl/lib/python3.10/site-packages/transformers/trainer_callback.py", line 414, in call_event
result = getattr(callback, event)(
File "/home/owen/miniconda3/envs/axolotl/lib/python3.10/site-packages/transformers/integrations/integration_utils.py", line 777, in on_train_end
fake_trainer.save_model(temp_dir)
File "/home/owen/miniconda3/envs/axolotl/lib/python3.10/site-packages/transformers/trainer.py", line 2836, in save_model
state_dict = self.accelerator.get_state_dict(self.deepspeed)
File "/home/owen/miniconda3/envs/axolotl/lib/python3.10/site-packages/accelerate/accelerator.py", line 3085, in get_state_dict
if self.deepspeed_config["zero_optimization"]["stage"] == 3:
AttributeError: 'Accelerator' object has no attribute 'deepspeed_config'
wandb:
wandb: Run history:
wandb: eval/loss █▂▁▁
wandb: eval/runtime ▃█▁▁
wandb: eval/samples_per_second ▆▁██
wandb: eval/steps_per_second ▆▁██
wandb: train/epoch ▁▁▁▁▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
wandb: train/global_step ▁▁▁▂▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
wandb: train/learning_rate ▇██████▇▇▇▇▇▇▆▆▆▆▅▅▅▄▄▄▄▃▃▃▃▂▂▂▂▂▁▁▁▁▁▁▁
wandb: train/loss █▆▄▄▄▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▂▁▁▁▁▁▁▁▁▁▂▁▁▁
wandb: train/total_flos ▁
wandb: train/train_loss ▁
wandb: train/train_runtime ▁
wandb: train/train_samples_per_second ▁
wandb: train/train_steps_per_second ▁
wandb:
wandb: Run summary:
wandb: eval/loss 0.6376
wandb: eval/runtime 659.3455
wandb: eval/samples_per_second 27.213
wandb: eval/steps_per_second 3.402
wandb: train/epoch 1.0
wandb: train/global_step 773
wandb: train/learning_rate 0.0
wandb: train/loss 0.6224
wandb: train/total_flos 4.3481528198454313e+18
wandb: train/train_loss 0.70721
wandb: train/train_runtime 42088.2092
wandb: train/train_samples_per_second 8.1
wandb: train/train_steps_per_second 0.018
wandb:
wandb: ***
wandb: Synced 6 W&B file(s), 0 media file(s), 0 artifact file(s) and 1 other file(s)
wandb: Find logs at: ***
[2024-01-11 11:23:03,757] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 79792) of binary: /home/owen/miniconda3/envs/axolotl/bin/python
Traceback (most recent call last):
File "/home/owen/miniconda3/envs/axolotl/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/home/owen/miniconda3/envs/axolotl/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 47, in main
args.func(args)
File "/home/owen/miniconda3/envs/axolotl/lib/python3.10/site-packages/accelerate/commands/launch.py", line 979, in launch_command
deepspeed_launcher(args)
File "/home/owen/miniconda3/envs/axolotl/lib/python3.10/site-packages/accelerate/commands/launch.py", line 695, in deepspeed_launcher
distrib_run.run(args)
File "/home/owen/miniconda3/envs/axolotl/lib/python3.10/site-packages/torch/distributed/run.py", line 797, in run
elastic_launch(
File "/home/owen/miniconda3/envs/axolotl/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/owen/miniconda3/envs/axolotl/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
axolotl.cli.train FAILED
------------------------------------------------------------
Failures:
<NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2024-01-11_11:23:03
host : COMPUTE-PC.
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 79792)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
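For context on where this blows up: at on_train_end, the transformers wandb integration builds a temporary "fake_trainer" just to save and log the model (visible in the traceback above at fake_trainer.save_model), and that trainer's Accelerator never went through a DeepSpeed prepare step, so the deepspeed_config attribute was never set on it even though the launcher environment still reports DeepSpeed as active. A minimal sketch of the failure mode, assuming only that a bare Accelerator with no DeepSpeed plugin prepared lacks the attribute, as the traceback shows:

from accelerate import Accelerator

# An Accelerator that never prepared a model under DeepSpeed does not get a
# deepspeed_config attribute; get_state_dict() indexes into it unconditionally
# once it believes DeepSpeed is active, producing the AttributeError above.
acc = Accelerator()
print(hasattr(acc, "deepspeed_config"))  # False on the versions in this report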
Trying to continue the training from the last checkpoint also fails with a different error.
Loading extension module fused_adam...
Time to load fused_adam op: 0.06896162033081055 seconds
/home/owen/miniconda3/envs/axolotl/lib/python3.10/site-packages/deepspeed/ops/adam/fused_adam.py:96: UserWarning: The torch.cuda.*DtypeTensor constructors are no longer recommended. It's best to use methods such as torch.tensor(data, dtype=*, device='cuda') to create tensors. (Triggered internally at ../torch/csrc/tensor/python_tensor.cpp:83.)
self._dummy_overflow_buf = get_accelerator().IntTensor([0])
Loading extension module fused_adam...
Time to load fused_adam op: 0.10147833824157715 seconds
/home/owen/miniconda3/envs/axolotl/lib/python3.10/site-packages/deepspeed/ops/adam/fused_adam.py:96: UserWarning: The torch.cuda.*DtypeTensor constructors are no longer recommended. It's best to use methods such as torch.tensor(data, dtype=*, device='cuda') to create tensors. (Triggered internally at ../torch/csrc/tensor/python_tensor.cpp:83.)
self._dummy_overflow_buf = get_accelerator().IntTensor([0])
Traceback (most recent call last):
File "/home/owen/miniconda3/envs/axolotl/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/owen/miniconda3/envs/axolotl/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/owen/axolotl/src/axolotl/cli/train.py", line 42, in <module>
fire.Fire(do_cli)
File "/home/owen/miniconda3/envs/axolotl/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/home/owen/miniconda3/envs/axolotl/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/home/owen/miniconda3/envs/axolotl/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/home/owen/axolotl/src/axolotl/cli/train.py", line 38, in do_cli
train(cfg=parsed_cfg, cli_args=parsed_cli_args, dataset_meta=dataset_meta)
File "/home/owen/axolotl/src/axolotl/train.py", line 142, in train
trainer.train(resume_from_checkpoint=resume_from_checkpoint)
File "/home/owen/miniconda3/envs/axolotl/lib/python3.10/site-packages/transformers/trainer.py", line 1543, in train
return inner_training_loop(
File "/home/owen/miniconda3/envs/axolotl/lib/python3.10/site-packages/transformers/trainer.py", line 1699, in _inner_training_loop
deepspeed_load_checkpoint(self.model_wrapped, resume_from_checkpoint)
File "/home/owen/miniconda3/envs/axolotl/lib/python3.10/site-packages/transformers/integrations/deepspeed.py", line 402, in deepspeed_load_checkpoint
load_path, _ = deepspeed_engine.load_checkpoint(
File "/home/owen/miniconda3/envs/axolotl/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 2724, in load_checkpoint
load_path, client_states = self._load_checkpoint(load_dir,
File "/home/owen/miniconda3/envs/axolotl/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 2794, in _load_checkpoint
self.load_module_state_dict(checkpoint=checkpoint,
File "/home/owen/miniconda3/envs/axolotl/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 2587, in load_module_state_dict
self.module.load_state_dict(
File "/home/owen/miniconda3/envs/axolotl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2152, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for PeftModelForCausalLM:
Missing key(s) in state_dict: "base_model.model.model.embed_tokens.weight", "base_model.model.model.layers.0.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.0.self_attn.k_proj.base_layer.weight", "base_model.model.model.layers.0.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.0.self_attn.o_proj.base_layer.weight", "base_model.model.model.layers.0.mlp.gate_proj.base_layer.weight", "base_model.model.model.layers.0.mlp.up_proj.base_layer.weight", "base_model.model.model.layers.0.mlp.down_proj.base_layer.weight", "base_model.model.model.layers.0.input_layernorm.weight", "base_model.model.model.layers.0.post_attention_layernorm.weight", "base_model.model.model.layers.1.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.1.self_attn.k_proj.base_layer.weight", "base_model.model.model.layers.1.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.1.self_attn.o_proj.base_layer.weight", "base_model.model.model.layers.1.mlp.gate_proj.base_layer.weight", "base_model.model.model.layers.1.mlp.up_proj.base_layer.weight", "base_model.model.model.layers.1.mlp.down_proj.base_layer.weight", "base_model.model.model.layers.1.input_layernorm.weight", "base_model.model.model.layers.1.post_attention_layernorm.weight", "base_model.model.model.layers.2.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.2.self_attn.k_proj.base_layer.weight", "base_model.model.model.layers.2.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.2.self_attn.o_proj.base_layer.weight", "base_model.model.model.layers.2.mlp.gate_proj.base_layer.weight", "base_model.model.model.layers.2.mlp.up_proj.base_layer.weight", "base_model.model.model.layers.2.mlp.down_proj.base_layer.weight", "base_model.model.model.layers.2.input_layernorm.weight", "base_model.model.model.layers.2.post_attention_layernorm.weight", "base_model.model.model.layers.3.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.3.self_attn.k_proj.base_layer.weight", "base_model.model.model.layers.3.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.3.self_attn.o_proj.base_layer.weight", "base_model.model.model.layers.3.mlp.gate_proj.base_layer.weight", "base_model.model.model.layers.3.mlp.up_proj.base_layer.weight", "base_model.model.model.layers.3.mlp.down_proj.base_layer.weight", "base_model.model.model.layers.3.input_layernorm.weight", "base_model.model.model.layers.3.post_attention_layernorm.weight", "base_model.model.model.layers.4.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.4.self_attn.k_proj.base_layer.weight", "base_model.model.model.layers.4.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.4.self_attn.o_proj.base_layer.weight", "base_model.model.model.layers.4.mlp.gate_proj.base_layer.weight", "base_model.model.model.layers.4.mlp.up_proj.base_layer.weight", "base_model.model.model.layers.4.mlp.down_proj.base_layer.weight", "base_model.model.model.layers.4.input_layernorm.weight", "base_model.model.model.layers.4.post_attention_layernorm.weight", "base_model.model.model.layers.5.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.5.self_attn.k_proj.base_layer.weight", "base_model.model.model.layers.5.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.5.self_attn.o_proj.base_layer.weight", "base_model.model.model.layers.5.mlp.gate_proj.base_layer.weight", "base_model.model.model.layers.5.mlp.up_proj.base_layer.weight", 
"base_model.model.model.layers.5.mlp.down_proj.base_layer.weight", "base_model.model.model.layers.5.input_layernorm.weight", "base_model.model.model.layers.5.post_attention_layernorm.weight", "base_model.model.model.layers.6.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.6.self_attn.k_proj.base_layer.weight", "base_model.model.model.layers.6.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.6.self_attn.o_proj.base_layer.weight", "base_model.model.model.layers.6.mlp.gate_proj.base_layer.weight", "base_model.model.model.layers.6.mlp.up_proj.base_layer.weight", "base_model.model.model.layers.6.mlp.down_proj.base_layer.weight", "base_model.model.model.layers.6.input_layernorm.weight", "base_model.model.model.layers.6.post_attention_layernorm.weight", "base_model.model.model.layers.7.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.7.self_attn.k_proj.base_layer.weight", "base_model.model.model.layers.7.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.7.self_attn.o_proj.base_layer.weight", "base_model.model.model.layers.7.mlp.gate_proj.base_layer.weight", "base_model.model.model.layers.7.mlp.up_proj.base_layer.weight", "base_model.model.model.layers.7.mlp.down_proj.base_layer.weight", "base_model.model.model.layers.7.input_layernorm.weight", "base_model.model.model.layers.7.post_attention_layernorm.weight", "base_model.model.model.layers.8.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.8.self_attn.k_proj.base_layer.weight", "base_model.model.model.layers.8.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.8.self_attn.o_proj.base_layer.weight", "base_model.model.model.layers.8.mlp.gate_proj.base_layer.weight", "base_model.model.model.layers.8.mlp.up_proj.base_layer.weight", "base_model.model.model.layers.8.mlp.down_proj.base_layer.weight", "base_model.model.model.layers.8.input_layernorm.weight", "base_model.model.model.layers.8.post_attention_layernorm.weight", "base_model.model.model.layers.9.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.9.self_attn.k_proj.base_layer.weight", "base_model.model.model.layers.9.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.9.self_attn.o_proj.base_layer.weight", "base_model.model.model.layers.9.mlp.gate_proj.base_layer.weight", "base_model.model.model.layers.9.mlp.up_proj.base_layer.weight", "base_model.model.model.layers.9.mlp.down_proj.base_layer.weight", "base_model.model.model.layers.9.input_layernorm.weight", "base_model.model.model.layers.9.post_attention_layernorm.weight", "base_model.model.model.layers.10.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.10.self_attn.k_proj.base_layer.weight", "base_model.model.model.layers.10.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.10.self_attn.o_proj.base_layer.weight", "base_model.model.model.layers.10.mlp.gate_proj.base_layer.weight", "base_model.model.model.layers.10.mlp.up_proj.base_layer.weight", "base_model.model.model.layers.10.mlp.down_proj.base_layer.weight", "base_model.model.model.layers.10.input_layernorm.weight", "base_model.model.model.layers.10.post_attention_layernorm.weight", "base_model.model.model.layers.11.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.11.self_attn.k_proj.base_layer.weight", "base_model.model.model.layers.11.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.11.self_attn.o_proj.base_layer.weight", 
"base_model.model.model.layers.11.mlp.gate_proj.base_layer.weight", "base_model.model.model.layers.11.mlp.up_proj.base_layer.weight", "base_model.model.model.layers.11.mlp.down_proj.base_layer.weight", "base_model.model.model.layers.11.input_layernorm.weight", "base_model.model.model.layers.11.post_attention_layernorm.weight", "base_model.model.model.layers.12.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.12.self_attn.k_proj.base_layer.weight", "base_model.model.model.layers.12.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.12.self_attn.o_proj.base_layer.weight", "base_model.model.model.layers.12.mlp.gate_proj.base_layer.weight", "base_model.model.model.layers.12.mlp.up_proj.base_layer.weight", "base_model.model.model.layers.12.mlp.down_proj.base_layer.weight", "base_model.model.model.layers.12.input_layernorm.weight", "base_model.model.model.layers.12.post_attention_layernorm.weight", "base_model.model.model.layers.13.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.13.self_attn.k_proj.base_layer.weight", "base_model.model.model.layers.13.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.13.self_attn.o_proj.base_layer.weight", "base_model.model.model.layers.13.mlp.gate_proj.base_layer.weight", "base_model.model.model.layers.13.mlp.up_proj.base_layer.weight", "base_model.model.model.layers.13.mlp.down_proj.base_layer.weight", "base_model.model.model.layers.13.input_layernorm.weight", "base_model.model.model.layers.13.post_attention_layernorm.weight", "base_model.model.model.layers.14.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.14.self_attn.k_proj.base_layer.weight", "base_model.model.model.layers.14.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.14.self_attn.o_proj.base_layer.weight", "base_model.model.model.layers.14.mlp.gate_proj.base_layer.weight", "base_model.model.model.layers.14.mlp.up_proj.base_layer.weight", "base_model.model.model.layers.14.mlp.down_proj.base_layer.weight", "base_model.model.model.layers.14.input_layernorm.weight", "base_model.model.model.layers.14.post_attention_layernorm.weight", "base_model.model.model.layers.15.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.15.self_attn.k_proj.base_layer.weight", "base_model.model.model.layers.15.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.15.self_attn.o_proj.base_layer.weight", "base_model.model.model.layers.15.mlp.gate_proj.base_layer.weight", "base_model.model.model.layers.15.mlp.up_proj.base_layer.weight", "base_model.model.model.layers.15.mlp.down_proj.base_layer.weight", "base_model.model.model.layers.15.input_layernorm.weight", "base_model.model.model.layers.15.post_attention_layernorm.weight", "base_model.model.model.layers.16.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.16.self_attn.k_proj.base_layer.weight", "base_model.model.model.layers.16.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.16.self_attn.o_proj.base_layer.weight", "base_model.model.model.layers.16.mlp.gate_proj.base_layer.weight", "base_model.model.model.layers.16.mlp.up_proj.base_layer.weight", "base_model.model.model.layers.16.mlp.down_proj.base_layer.weight", "base_model.model.model.layers.16.input_layernorm.weight", "base_model.model.model.layers.16.post_attention_layernorm.weight", "base_model.model.model.layers.17.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.17.self_attn.k_proj.base_layer.weight", 
"base_model.model.model.layers.17.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.17.self_attn.o_proj.base_layer.weight", "base_model.model.model.layers.17.mlp.gate_proj.base_layer.weight", "base_model.model.model.layers.17.mlp.up_proj.base_layer.weight", "base_model.model.model.layers.17.mlp.down_proj.base_layer.weight", "base_model.model.model.layers.17.input_layernorm.weight", "base_model.model.model.layers.17.post_attention_layernorm.weight", "base_model.model.model.layers.18.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.18.self_attn.k_proj.base_layer.weight", "base_model.model.model.layers.18.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.18.self_attn.o_proj.base_layer.weight", "base_model.model.model.layers.18.mlp.gate_proj.base_layer.weight", "base_model.model.model.layers.18.mlp.up_proj.base_layer.weight", "base_model.model.model.layers.18.mlp.down_proj.base_layer.weight", "base_model.model.model.layers.18.input_layernorm.weight", "base_model.model.model.layers.18.post_attention_layernorm.weight", "base_model.model.model.layers.19.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.19.self_attn.k_proj.base_layer.weight", "base_model.model.model.layers.19.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.19.self_attn.o_proj.base_layer.weight", "base_model.model.model.layers.19.mlp.gate_proj.base_layer.weight", "base_model.model.model.layers.19.mlp.up_proj.base_layer.weight", "base_model.model.model.layers.19.mlp.down_proj.base_layer.weight", "base_model.model.model.layers.19.input_layernorm.weight", "base_model.model.model.layers.19.post_attention_layernorm.weight", "base_model.model.model.layers.20.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.20.self_attn.k_proj.base_layer.weight", "base_model.model.model.layers.20.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.20.self_attn.o_proj.base_layer.weight", "base_model.model.model.layers.20.mlp.gate_proj.base_layer.weight", "base_model.model.model.layers.20.mlp.up_proj.base_layer.weight", "base_model.model.model.layers.20.mlp.down_proj.base_layer.weight", "base_model.model.model.layers.20.input_layernorm.weight", "base_model.model.model.layers.20.post_attention_layernorm.weight", "base_model.model.model.layers.21.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.21.self_attn.k_proj.base_layer.weight", "base_model.model.model.layers.21.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.21.self_attn.o_proj.base_layer.weight", "base_model.model.model.layers.21.mlp.gate_proj.base_layer.weight", "base_model.model.model.layers.21.mlp.up_proj.base_layer.weight", "base_model.model.model.layers.21.mlp.down_proj.base_layer.weight", "base_model.model.model.layers.21.input_layernorm.weight", "base_model.model.model.layers.21.post_attention_layernorm.weight", "base_model.model.model.layers.22.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.22.self_attn.k_proj.base_layer.weight", "base_model.model.model.layers.22.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.22.self_attn.o_proj.base_layer.weight", "base_model.model.model.layers.22.mlp.gate_proj.base_layer.weight", "base_model.model.model.layers.22.mlp.up_proj.base_layer.weight", "base_model.model.model.layers.22.mlp.down_proj.base_layer.weight", "base_model.model.model.layers.22.input_layernorm.weight", "base_model.model.model.layers.22.post_attention_layernorm.weight", 
"base_model.model.model.layers.23.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.23.self_attn.k_proj.base_layer.weight", "base_model.model.model.layers.23.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.23.self_attn.o_proj.base_layer.weight", "base_model.model.model.layers.23.mlp.gate_proj.base_layer.weight", "base_model.model.model.layers.23.mlp.up_proj.base_layer.weight", "base_model.model.model.layers.23.mlp.down_proj.base_layer.weight", "base_model.model.model.layers.23.input_layernorm.weight", "base_model.model.model.layers.23.post_attention_layernorm.weight", "base_model.model.model.layers.24.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.24.self_attn.k_proj.base_layer.weight", "base_model.model.model.layers.24.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.24.self_attn.o_proj.base_layer.weight", "base_model.model.model.layers.24.mlp.gate_proj.base_layer.weight", "base_model.model.model.layers.24.mlp.up_proj.base_layer.weight", "base_model.model.model.layers.24.mlp.down_proj.base_layer.weight", "base_model.model.model.layers.24.input_layernorm.weight", "base_model.model.model.layers.24.post_attention_layernorm.weight", "base_model.model.model.layers.25.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.25.self_attn.k_proj.base_layer.weight", "base_model.model.model.layers.25.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.25.self_attn.o_proj.base_layer.weight", "base_model.model.model.layers.25.mlp.gate_proj.base_layer.weight", "base_model.model.model.layers.25.mlp.up_proj.base_layer.weight", "base_model.model.model.layers.25.mlp.down_proj.base_layer.weight", "base_model.model.model.layers.25.input_layernorm.weight", "base_model.model.model.layers.25.post_attention_layernorm.weight", "base_model.model.model.layers.26.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.26.self_attn.k_proj.base_layer.weight", "base_model.model.model.layers.26.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.26.self_attn.o_proj.base_layer.weight", "base_model.model.model.layers.26.mlp.gate_proj.base_layer.weight", "base_model.model.model.layers.26.mlp.up_proj.base_layer.weight", "base_model.model.model.layers.26.mlp.down_proj.base_layer.weight", "base_model.model.model.layers.26.input_layernorm.weight", "base_model.model.model.layers.26.post_attention_layernorm.weight", "base_model.model.model.layers.27.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.27.self_attn.k_proj.base_layer.weight", "base_model.model.model.layers.27.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.27.self_attn.o_proj.base_layer.weight", "base_model.model.model.layers.27.mlp.gate_proj.base_layer.weight", "base_model.model.model.layers.27.mlp.up_proj.base_layer.weight", "base_model.model.model.layers.27.mlp.down_proj.base_layer.weight", "base_model.model.model.layers.27.input_layernorm.weight", "base_model.model.model.layers.27.post_attention_layernorm.weight", "base_model.model.model.layers.28.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.28.self_attn.k_proj.base_layer.weight", "base_model.model.model.layers.28.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.28.self_attn.o_proj.base_layer.weight", "base_model.model.model.layers.28.mlp.gate_proj.base_layer.weight", "base_model.model.model.layers.28.mlp.up_proj.base_layer.weight", "base_model.model.model.layers.28.mlp.down_proj.base_layer.weight", 
"base_model.model.model.layers.28.input_layernorm.weight", "base_model.model.model.layers.28.post_attention_layernorm.weight", "base_model.model.model.layers.29.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.29.self_attn.k_proj.base_layer.weight", "base_model.model.model.layers.29.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.29.self_attn.o_proj.base_layer.weight", "base_model.model.model.layers.29.mlp.gate_proj.base_layer.weight", "base_model.model.model.layers.29.mlp.up_proj.base_layer.weight", "base_model.model.model.layers.29.mlp.down_proj.base_layer.weight", "base_model.model.model.layers.29.input_layernorm.weight", "base_model.model.model.layers.29.post_attention_layernorm.weight", "base_model.model.model.layers.30.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.30.self_attn.k_proj.base_layer.weight", "base_model.model.model.layers.30.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.30.self_attn.o_proj.base_layer.weight", "base_model.model.model.layers.30.mlp.gate_proj.base_layer.weight", "base_model.model.model.layers.30.mlp.up_proj.base_layer.weight", "base_model.model.model.layers.30.mlp.down_proj.base_layer.weight", "base_model.model.model.layers.30.input_layernorm.weight", "base_model.model.model.layers.30.post_attention_layernorm.weight", "base_model.model.model.layers.31.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.31.self_attn.k_proj.base_layer.weight", "base_model.model.model.layers.31.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.31.self_attn.o_proj.base_layer.weight", "base_model.model.model.layers.31.mlp.gate_proj.base_layer.weight", "base_model.model.model.layers.31.mlp.up_proj.base_layer.weight", "base_model.model.model.layers.31.mlp.down_proj.base_layer.weight", "base_model.model.model.layers.31.input_layernorm.weight", "base_model.model.model.layers.31.post_attention_layernorm.weight", "base_model.model.model.norm.weight", "base_model.model.lm_head.weight".
Traceback (most recent call last):
File "/home/owen/miniconda3/envs/axolotl/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/owen/miniconda3/envs/axolotl/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/owen/axolotl/src/axolotl/cli/train.py", line 42, in <module>
fire.Fire(do_cli)
File "/home/owen/miniconda3/envs/axolotl/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/home/owen/miniconda3/envs/axolotl/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/home/owen/miniconda3/envs/axolotl/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/home/owen/axolotl/src/axolotl/cli/train.py", line 38, in do_cli
train(cfg=parsed_cfg, cli_args=parsed_cli_args, dataset_meta=dataset_meta)
File "/home/owen/axolotl/src/axolotl/train.py", line 142, in train
trainer.train(resume_from_checkpoint=resume_from_checkpoint)
File "/home/owen/miniconda3/envs/axolotl/lib/python3.10/site-packages/transformers/trainer.py", line 1543, in train
return inner_training_loop(
File "/home/owen/miniconda3/envs/axolotl/lib/python3.10/site-packages/transformers/trainer.py", line 1699, in _inner_training_loop
deepspeed_load_checkpoint(self.model_wrapped, resume_from_checkpoint)
File "/home/owen/miniconda3/envs/axolotl/lib/python3.10/site-packages/transformers/integrations/deepspeed.py", line 402, in deepspeed_load_checkpoint
load_path, _ = deepspeed_engine.load_checkpoint(
File "/home/owen/miniconda3/envs/axolotl/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 2724, in load_checkpoint
load_path, client_states = self._load_checkpoint(load_dir,
File "/home/owen/miniconda3/envs/axolotl/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 2794, in _load_checkpoint
self.load_module_state_dict(checkpoint=checkpoint,
File "/home/owen/miniconda3/envs/axolotl/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 2587, in load_module_state_dict
self.module.load_state_dict(
File "/home/owen/miniconda3/envs/axolotl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2152, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for PeftModelForCausalLM:
Missing key(s) in state_dict: "base_model.model.model.embed_tokens.weight", "base_model.model.model.layers.0.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.0.self_attn.k_proj.base_layer.weight", "base_model.model.model.layers.0.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.0.self_attn.o_proj.base_layer.weight", "base_model.model.model.layers.0.mlp.gate_proj.base_layer.weight", "base_model.model.model.layers.0.mlp.up_proj.base_layer.weight", "base_model.model.model.layers.0.mlp.down_proj.base_layer.weight", "base_model.model.model.layers.0.input_layernorm.weight", "base_model.model.model.layers.0.post_attention_layernorm.weight", "base_model.model.model.layers.1.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.1.self_attn.k_proj.base_layer.weight", "base_model.model.model.layers.1.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.1.self_attn.o_proj.base_layer.weight", "base_model.model.model.layers.1.mlp.gate_proj.base_layer.weight", "base_model.model.model.layers.1.mlp.up_proj.base_layer.weight", "base_model.model.model.layers.1.mlp.down_proj.base_layer.weight", "base_model.model.model.layers.1.input_layernorm.weight", "base_model.model.model.layers.1.post_attention_layernorm.weight", "base_model.model.model.layers.2.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.2.self_attn.k_proj.base_layer.weight", "base_model.model.model.layers.2.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.2.self_attn.o_proj.base_layer.weight", "base_model.model.model.layers.2.mlp.gate_proj.base_layer.weight", "base_model.model.model.layers.2.mlp.up_proj.base_layer.weight", "base_model.model.model.layers.2.mlp.down_proj.base_layer.weight", "base_model.model.model.layers.2.input_layernorm.weight", "base_model.model.model.layers.2.post_attention_layernorm.weight", "base_model.model.model.layers.3.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.3.self_attn.k_proj.base_layer.weight", "base_model.model.model.layers.3.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.3.self_attn.o_proj.base_layer.weight", "base_model.model.model.layers.3.mlp.gate_proj.base_layer.weight", "base_model.model.model.layers.3.mlp.up_proj.base_layer.weight", "base_model.model.model.layers.3.mlp.down_proj.base_layer.weight", "base_model.model.model.layers.3.input_layernorm.weight", "base_model.model.model.layers.3.post_attention_layernorm.weight", "base_model.model.model.layers.4.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.4.self_attn.k_proj.base_layer.weight", "base_model.model.model.layers.4.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.4.self_attn.o_proj.base_layer.weight", "base_model.model.model.layers.4.mlp.gate_proj.base_layer.weight", "base_model.model.model.layers.4.mlp.up_proj.base_layer.weight", "base_model.model.model.layers.4.mlp.down_proj.base_layer.weight", "base_model.model.model.layers.4.input_layernorm.weight", "base_model.model.model.layers.4.post_attention_layernorm.weight", "base_model.model.model.layers.5.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.5.self_attn.k_proj.base_layer.weight", "base_model.model.model.layers.5.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.5.self_attn.o_proj.base_layer.weight", "base_model.model.model.layers.5.mlp.gate_proj.base_layer.weight", "base_model.model.model.layers.5.mlp.up_proj.base_layer.weight", 
"base_model.model.model.layers.5.mlp.down_proj.base_layer.weight", "base_model.model.model.layers.5.input_layernorm.weight", "base_model.model.model.layers.5.post_attention_layernorm.weight", "base_model.model.model.layers.6.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.6.self_attn.k_proj.base_layer.weight", "base_model.model.model.layers.6.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.6.self_attn.o_proj.base_layer.weight", "base_model.model.model.layers.6.mlp.gate_proj.base_layer.weight", "base_model.model.model.layers.6.mlp.up_proj.base_layer.weight", "base_model.model.model.layers.6.mlp.down_proj.base_layer.weight", "base_model.model.model.layers.6.input_layernorm.weight", "base_model.model.model.layers.6.post_attention_layernorm.weight", "base_model.model.model.layers.7.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.7.self_attn.k_proj.base_layer.weight", "base_model.model.model.layers.7.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.7.self_attn.o_proj.base_layer.weight", "base_model.model.model.layers.7.mlp.gate_proj.base_layer.weight", "base_model.model.model.layers.7.mlp.up_proj.base_layer.weight", "base_model.model.model.layers.7.mlp.down_proj.base_layer.weight", "base_model.model.model.layers.7.input_layernorm.weight", "base_model.model.model.layers.7.post_attention_layernorm.weight", "base_model.model.model.layers.8.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.8.self_attn.k_proj.base_layer.weight", "base_model.model.model.layers.8.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.8.self_attn.o_proj.base_layer.weight", "base_model.model.model.layers.8.mlp.gate_proj.base_layer.weight", "base_model.model.model.layers.8.mlp.up_proj.base_layer.weight", "base_model.model.model.layers.8.mlp.down_proj.base_layer.weight", "base_model.model.model.layers.8.input_layernorm.weight", "base_model.model.model.layers.8.post_attention_layernorm.weight", "base_model.model.model.layers.9.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.9.self_attn.k_proj.base_layer.weight", "base_model.model.model.layers.9.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.9.self_attn.o_proj.base_layer.weight", "base_model.model.model.layers.9.mlp.gate_proj.base_layer.weight", "base_model.model.model.layers.9.mlp.up_proj.base_layer.weight", "base_model.model.model.layers.9.mlp.down_proj.base_layer.weight", "base_model.model.model.layers.9.input_layernorm.weight", "base_model.model.model.layers.9.post_attention_layernorm.weight", "base_model.model.model.layers.10.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.10.self_attn.k_proj.base_layer.weight", "base_model.model.model.layers.10.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.10.self_attn.o_proj.base_layer.weight", "base_model.model.model.layers.10.mlp.gate_proj.base_layer.weight", "base_model.model.model.layers.10.mlp.up_proj.base_layer.weight", "base_model.model.model.layers.10.mlp.down_proj.base_layer.weight", "base_model.model.model.layers.10.input_layernorm.weight", "base_model.model.model.layers.10.post_attention_layernorm.weight", "base_model.model.model.layers.11.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.11.self_attn.k_proj.base_layer.weight", "base_model.model.model.layers.11.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.11.self_attn.o_proj.base_layer.weight", 
"base_model.model.model.layers.11.mlp.gate_proj.base_layer.weight", "base_model.model.model.layers.11.mlp.up_proj.base_layer.weight", "base_model.model.model.layers.11.mlp.down_proj.base_layer.weight", "base_model.model.model.layers.11.input_layernorm.weight", "base_model.model.model.layers.11.post_attention_layernorm.weight", "base_model.model.model.layers.12.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.12.self_attn.k_proj.base_layer.weight", "base_model.model.model.layers.12.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.12.self_attn.o_proj.base_layer.weight", "base_model.model.model.layers.12.mlp.gate_proj.base_layer.weight", "base_model.model.model.layers.12.mlp.up_proj.base_layer.weight", "base_model.model.model.layers.12.mlp.down_proj.base_layer.weight", "base_model.model.model.layers.12.input_layernorm.weight", "base_model.model.model.layers.12.post_attention_layernorm.weight", "base_model.model.model.layers.13.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.13.self_attn.k_proj.base_layer.weight", "base_model.model.model.layers.13.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.13.self_attn.o_proj.base_layer.weight", "base_model.model.model.layers.13.mlp.gate_proj.base_layer.weight", "base_model.model.model.layers.13.mlp.up_proj.base_layer.weight", "base_model.model.model.layers.13.mlp.down_proj.base_layer.weight", "base_model.model.model.layers.13.input_layernorm.weight", "base_model.model.model.layers.13.post_attention_layernorm.weight", "base_model.model.model.layers.14.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.14.self_attn.k_proj.base_layer.weight", "base_model.model.model.layers.14.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.14.self_attn.o_proj.base_layer.weight", "base_model.model.model.layers.14.mlp.gate_proj.base_layer.weight", "base_model.model.model.layers.14.mlp.up_proj.base_layer.weight", "base_model.model.model.layers.14.mlp.down_proj.base_layer.weight", "base_model.model.model.layers.14.input_layernorm.weight", "base_model.model.model.layers.14.post_attention_layernorm.weight", "base_model.model.model.layers.15.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.15.self_attn.k_proj.base_layer.weight", "base_model.model.model.layers.15.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.15.self_attn.o_proj.base_layer.weight", "base_model.model.model.layers.15.mlp.gate_proj.base_layer.weight", "base_model.model.model.layers.15.mlp.up_proj.base_layer.weight", "base_model.model.model.layers.15.mlp.down_proj.base_layer.weight", "base_model.model.model.layers.15.input_layernorm.weight", "base_model.model.model.layers.15.post_attention_layernorm.weight", "base_model.model.model.layers.16.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.16.self_attn.k_proj.base_layer.weight", "base_model.model.model.layers.16.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.16.self_attn.o_proj.base_layer.weight", "base_model.model.model.layers.16.mlp.gate_proj.base_layer.weight", "base_model.model.model.layers.16.mlp.up_proj.base_layer.weight", "base_model.model.model.layers.16.mlp.down_proj.base_layer.weight", "base_model.model.model.layers.16.input_layernorm.weight", "base_model.model.model.layers.16.post_attention_layernorm.weight", "base_model.model.model.layers.17.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.17.self_attn.k_proj.base_layer.weight", 
"base_model.model.model.layers.17.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.17.self_attn.o_proj.base_layer.weight", "base_model.model.model.layers.17.mlp.gate_proj.base_layer.weight", "base_model.model.model.layers.17.mlp.up_proj.base_layer.weight", "base_model.model.model.layers.17.mlp.down_proj.base_layer.weight", "base_model.model.model.layers.17.input_layernorm.weight", "base_model.model.model.layers.17.post_attention_layernorm.weight", "base_model.model.model.layers.18.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.18.self_attn.k_proj.base_layer.weight", "base_model.model.model.layers.18.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.18.self_attn.o_proj.base_layer.weight", "base_model.model.model.layers.18.mlp.gate_proj.base_layer.weight", "base_model.model.model.layers.18.mlp.up_proj.base_layer.weight", "base_model.model.model.layers.18.mlp.down_proj.base_layer.weight", "base_model.model.model.layers.18.input_layernorm.weight", "base_model.model.model.layers.18.post_attention_layernorm.weight", "base_model.model.model.layers.19.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.19.self_attn.k_proj.base_layer.weight", "base_model.model.model.layers.19.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.19.self_attn.o_proj.base_layer.weight", "base_model.model.model.layers.19.mlp.gate_proj.base_layer.weight", "base_model.model.model.layers.19.mlp.up_proj.base_layer.weight", "base_model.model.model.layers.19.mlp.down_proj.base_layer.weight", "base_model.model.model.layers.19.input_layernorm.weight", "base_model.model.model.layers.19.post_attention_layernorm.weight", "base_model.model.model.layers.20.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.20.self_attn.k_proj.base_layer.weight", "base_model.model.model.layers.20.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.20.self_attn.o_proj.base_layer.weight", "base_model.model.model.layers.20.mlp.gate_proj.base_layer.weight", "base_model.model.model.layers.20.mlp.up_proj.base_layer.weight", "base_model.model.model.layers.20.mlp.down_proj.base_layer.weight", "base_model.model.model.layers.20.input_layernorm.weight", "base_model.model.model.layers.20.post_attention_layernorm.weight", "base_model.model.model.layers.21.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.21.self_attn.k_proj.base_layer.weight", "base_model.model.model.layers.21.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.21.self_attn.o_proj.base_layer.weight", "base_model.model.model.layers.21.mlp.gate_proj.base_layer.weight", "base_model.model.model.layers.21.mlp.up_proj.base_layer.weight", "base_model.model.model.layers.21.mlp.down_proj.base_layer.weight", "base_model.model.model.layers.21.input_layernorm.weight", "base_model.model.model.layers.21.post_attention_layernorm.weight", "base_model.model.model.layers.22.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.22.self_attn.k_proj.base_layer.weight", "base_model.model.model.layers.22.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.22.self_attn.o_proj.base_layer.weight", "base_model.model.model.layers.22.mlp.gate_proj.base_layer.weight", "base_model.model.model.layers.22.mlp.up_proj.base_layer.weight", "base_model.model.model.layers.22.mlp.down_proj.base_layer.weight", "base_model.model.model.layers.22.input_layernorm.weight", "base_model.model.model.layers.22.post_attention_layernorm.weight", 
"base_model.model.model.layers.23.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.23.self_attn.k_proj.base_layer.weight", "base_model.model.model.layers.23.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.23.self_attn.o_proj.base_layer.weight", "base_model.model.model.layers.23.mlp.gate_proj.base_layer.weight", "base_model.model.model.layers.23.mlp.up_proj.base_layer.weight", "base_model.model.model.layers.23.mlp.down_proj.base_layer.weight", "base_model.model.model.layers.23.input_layernorm.weight", "base_model.model.model.layers.23.post_attention_layernorm.weight", "base_model.model.model.layers.24.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.24.self_attn.k_proj.base_layer.weight", "base_model.model.model.layers.24.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.24.self_attn.o_proj.base_layer.weight", "base_model.model.model.layers.24.mlp.gate_proj.base_layer.weight", "base_model.model.model.layers.24.mlp.up_proj.base_layer.weight", "base_model.model.model.layers.24.mlp.down_proj.base_layer.weight", "base_model.model.model.layers.24.input_layernorm.weight", "base_model.model.model.layers.24.post_attention_layernorm.weight", "base_model.model.model.layers.25.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.25.self_attn.k_proj.base_layer.weight", "base_model.model.model.layers.25.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.25.self_attn.o_proj.base_layer.weight", "base_model.model.model.layers.25.mlp.gate_proj.base_layer.weight", "base_model.model.model.layers.25.mlp.up_proj.base_layer.weight", "base_model.model.model.layers.25.mlp.down_proj.base_layer.weight", "base_model.model.model.layers.25.input_layernorm.weight", "base_model.model.model.layers.25.post_attention_layernorm.weight", "base_model.model.model.layers.26.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.26.self_attn.k_proj.base_layer.weight", "base_model.model.model.layers.26.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.26.self_attn.o_proj.base_layer.weight", "base_model.model.model.layers.26.mlp.gate_proj.base_layer.weight", "base_model.model.model.layers.26.mlp.up_proj.base_layer.weight", "base_model.model.model.layers.26.mlp.down_proj.base_layer.weight", "base_model.model.model.layers.26.input_layernorm.weight", "base_model.model.model.layers.26.post_attention_layernorm.weight", "base_model.model.model.layers.27.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.27.self_attn.k_proj.base_layer.weight", "base_model.model.model.layers.27.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.27.self_attn.o_proj.base_layer.weight", "base_model.model.model.layers.27.mlp.gate_proj.base_layer.weight", "base_model.model.model.layers.27.mlp.up_proj.base_layer.weight", "base_model.model.model.layers.27.mlp.down_proj.base_layer.weight", "base_model.model.model.layers.27.input_layernorm.weight", "base_model.model.model.layers.27.post_attention_layernorm.weight", "base_model.model.model.layers.28.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.28.self_attn.k_proj.base_layer.weight", "base_model.model.model.layers.28.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.28.self_attn.o_proj.base_layer.weight", "base_model.model.model.layers.28.mlp.gate_proj.base_layer.weight", "base_model.model.model.layers.28.mlp.up_proj.base_layer.weight", "base_model.model.model.layers.28.mlp.down_proj.base_layer.weight", 
"base_model.model.model.layers.28.input_layernorm.weight", "base_model.model.model.layers.28.post_attention_layernorm.weight", "base_model.model.model.layers.29.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.29.self_attn.k_proj.base_layer.weight", "base_model.model.model.layers.29.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.29.self_attn.o_proj.base_layer.weight", "base_model.model.model.layers.29.mlp.gate_proj.base_layer.weight", "base_model.model.model.layers.29.mlp.up_proj.base_layer.weight", "base_model.model.model.layers.29.mlp.down_proj.base_layer.weight", "base_model.model.model.layers.29.input_layernorm.weight", "base_model.model.model.layers.29.post_attention_layernorm.weight", "base_model.model.model.layers.30.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.30.self_attn.k_proj.base_layer.weight", "base_model.model.model.layers.30.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.30.self_attn.o_proj.base_layer.weight", "base_model.model.model.layers.30.mlp.gate_proj.base_layer.weight", "base_model.model.model.layers.30.mlp.up_proj.base_layer.weight", "base_model.model.model.layers.30.mlp.down_proj.base_layer.weight", "base_model.model.model.layers.30.input_layernorm.weight", "base_model.model.model.layers.30.post_attention_layernorm.weight", "base_model.model.model.layers.31.self_attn.q_proj.base_layer.weight", "base_model.model.model.layers.31.self_attn.k_proj.base_layer.weight", "base_model.model.model.layers.31.self_attn.v_proj.base_layer.weight", "base_model.model.model.layers.31.self_attn.o_proj.base_layer.weight", "base_model.model.model.layers.31.mlp.gate_proj.base_layer.weight", "base_model.model.model.layers.31.mlp.up_proj.base_layer.weight", "base_model.model.model.layers.31.mlp.down_proj.base_layer.weight", "base_model.model.model.layers.31.input_layernorm.weight", "base_model.model.model.layers.31.post_attention_layernorm.weight", "base_model.model.model.norm.weight", "base_model.model.lm_head.weight".
[2024-01-11 12:23:28,317] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 81560) of binary: /home/owen/miniconda3/envs/axolotl/bin/python
Traceback (most recent call last):
File "/home/owen/miniconda3/envs/axolotl/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/home/owen/miniconda3/envs/axolotl/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 47, in main
args.func(args)
File "/home/owen/miniconda3/envs/axolotl/lib/python3.10/site-packages/accelerate/commands/launch.py", line 979, in launch_command
deepspeed_launcher(args)
File "/home/owen/miniconda3/envs/axolotl/lib/python3.10/site-packages/accelerate/commands/launch.py", line 695, in deepspeed_launcher
distrib_run.run(args)
File "/home/owen/miniconda3/envs/axolotl/lib/python3.10/site-packages/torch/distributed/run.py", line 797, in run
elastic_launch(
File "/home/owen/miniconda3/envs/axolotl/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/owen/miniconda3/envs/axolotl/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
axolotl.cli.train FAILED
------------------------------------------------------------
Failures:
[1]:
time : 2024-01-11_12:23:28
host : COMPUTE-PC.
rank : 1 (local_rank: 1)
exitcode : 1 (pid: 81561)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2024-01-11_12:23:28
host : COMPUTE-PC.
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 81560)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
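The resume failure is a different mechanism: with a QLoRA adapter, the saved checkpoint holds only the trainable LoRA weights, yet DeepSpeed's load_module_state_dict performs a strict load into the full PeftModelForCausalLM, so every frozen base weight is reported as a missing key — exactly the list above. A minimal sketch of that mechanism with a hypothetical toy module in plain PyTorch (not the real Mistral weights or the actual DeepSpeed call):

import torch.nn as nn

class Tiny(nn.Module):
    def __init__(self):
        super().__init__()
        self.base = nn.Linear(4, 4)  # frozen base weight: absent from the checkpoint
        self.lora = nn.Linear(4, 4)  # trained adapter weight: present in the checkpoint

model = Tiny()
adapter_only = {k: v for k, v in model.state_dict().items() if k.startswith("lora")}

try:
    # strict=True mirrors DeepSpeed's default and raises the same
    # "Missing key(s) in state_dict" RuntimeError seen above.
    model.load_state_dict(adapter_only, strict=True)
except RuntimeError as err:
    print(err)

# strict=False would tolerate the missing frozen weights instead.
missing, unexpected = model.load_state_dict(adapter_only, strict=False)
print(missing)  # ['base.weight', 'base.bias']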
Steps to reproduce
I have narrowed the failure to save at the end of the run down to setting
wandb_log_model: checkpoint
The run saves at the end if I instead set
wandb_log_model: end
However, neither option changes whether the run can be resumed from a checkpoint; both fail.
Changing the number of epochs also makes no difference to whether the save fails at the end of a run.
Config yaml
base_model: ./mistral-7b-v0.1
model_type: MistralForCausalLM
tokenizer_type: LlamaTokenizer
is_mistral_derived_model: true
load_in_8bit: false
load_in_4bit: true
strict: false
sequence_len: 4096
bf16: true
fp16: false
tf32: false
flash_attention: true
special_tokens:
  bos_token: "<s>"
  eos_token: "</s>"
  unk_token: "<unk>"
# Data
datasets:
***
warmup_steps: 10
dataset_prepared_path: ./last_run_prepared
save_safetensors: true
# Iterations
num_epochs: 1 # Can be set to anything; it only fails at the end of the run.
saves_per_epoch: 1 # Can also be set to anything; only the save at the end fails.
# Evaluation
val_set_size: 0.05
evals_per_epoch: 4
eval_table_size:
eval_table_max_new_tokens: 128
# LoRA
output_dir: ./qlora-out
adapter: qlora
lora_model_dir:
lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
lora_target_modules:
# Sampling
sample_packing: true
pad_to_sequence_len: true
# Batching
gradient_accumulation_steps: 4
micro_batch_size: 4
gradient_checkpointing: true
# wandb
wandb_mode: # "offline" to save run metadata locally and not sync to the server, "disabled" to turn off wandb
wandb_project: mistral
wandb_entity: # A wandb Team name if using a Team
wandb_watch:
wandb_name: 16r-32a-4096s
wandb_run_id: # Set the ID of your wandb run
wandb_log_model: end # "checkpoint" to log model to wandb Artifacts every `save_steps` or "end" to log only at the end of training
# Optimizer
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 0.0002
# Misc
train_on_inputs: false
group_by_length: false
early_stopping_patience:
resume_from_checkpoint: true # Will fail to resume from checkpoint when using this option.
local_rank:
logging_steps: 1
xformers_attention:
debug:
deepspeed: ./zero2.json
weight_decay: 0
fsdp:
fsdp_config:
Possible solution
The error on saving at the end seems to come from wandb when wandb_log_model: checkpoint is set, and it can be avoided by setting wandb_log_model: end, which is what the error log above shows. It still fails when trying to resume from a checkpoint, and I have no clue why.
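Beyond avoiding wandb_log_model: checkpoint, one local workaround would be to guard the attribute access before it is reached. This is a hypothetical monkeypatch sketch against the accelerate version in this report, not the upstream fix, and it assumes a plain module state_dict is acceptable for the final save under ZeRO-2:

from accelerate import Accelerator

_orig_get_state_dict = Accelerator.get_state_dict

def _patched_get_state_dict(self, model, unwrap=True):
    # Fall back to the module's own state_dict when this Accelerator was
    # never prepared with DeepSpeed (the wandb fake_trainer case above).
    if not hasattr(self, "deepspeed_config"):
        return model.state_dict()
    return _orig_get_state_dict(self, model, unwrap=unwrap)

Accelerator.get_state_dict = _patched_get_state_dict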
Which Operating Systems are you using?
- [X] Linux
- [ ] macOS
- [X] Windows
Python Version
3.10
axolotl branch-commit
main/9032e610b1af7565eec1908a298e53ae8e5252e7
Acknowledgements
- [X] My issue title is concise, descriptive, and in title casing.
- [X] I have searched the existing issues to make sure this bug has not been reported yet.
- [X] I am using the latest version of axolotl.
- [X] I have provided enough information for the maintainers to reproduce and diagnose the issue.
Thanks for posting. I wonder if this issue needs to be posted to the upstream library.
It has been reported in the DeepSpeed repo: https://github.com/microsoft/DeepSpeed/issues/4143
By any chance did you run accelerate config or have an existing accelerate configuration yaml that accelerate is picking up?
It happens with or without an existing accelerate config in the /home/user/.cache/huggingface/accelerate folder in my testing.
Related issue: #1134
The upstream DeepSpeed issue seems to indicate this is related to the wandb integration? I don't think you should really be uploading your models to wandb unless they are tiny; that step takes a lot of time, and storing models in wandb gets expensive fast.
see also https://github.com/OpenAccess-AI-Collective/axolotl/issues/1156#issuecomment-1909115979
Yeah, it definitely is a problem with the wandb integration. I'll try without uploading the model to wandb and see if that fixes it. Maybe there could be a comment on the yaml config in the readme saying not to use wandb model saving.