Please check that this issue hasn't been reported before.
- [X] I searched previous Bug Reports and didn't find any similar reports.
Expected Behavior
Training a model with ReLoRA completes without error.
Current behaviour
I trained a model on a dataset using ReLoRA. Training ran to completion (133/133 steps), but the on_train_end callback then failed with this error:
100%|██████████| 133/133 [2:14:49<00:00, 49.16s/it]
Traceback (most recent call last):
File "/home/erik/anaconda3/envs/llama2-py39/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/erik/anaconda3/envs/llama2-py39/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/erik/axolotl/src/axolotl/cli/train.py", line 38, in <module>
fire.Fire(do_cli)
File "/home/erik/anaconda3/envs/llama2-py39/lib/python3.9/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/home/erik/anaconda3/envs/llama2-py39/lib/python3.9/site-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/home/erik/anaconda3/envs/llama2-py39/lib/python3.9/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/home/erik/axolotl/src/axolotl/cli/train.py", line 34, in do_cli
train(cfg=parsed_cfg, cli_args=parsed_cli_args, dataset_meta=dataset_meta)
File "/home/erik/axolotl/src/axolotl/train.py", line 124, in train
trainer.train(resume_from_checkpoint=resume_from_checkpoint)
File "/home/erik/anaconda3/envs/llama2-py39/lib/python3.9/site-packages/transformers/trainer.py", line 1555, in train
return inner_training_loop(
File "/home/erik/anaconda3/envs/llama2-py39/lib/python3.9/site-packages/transformers/trainer.py", line 1998, in _inner_training_loop
self.control = self.callback_handler.on_train_end(args, self.state, self.control)
File "/home/erik/anaconda3/envs/llama2-py39/lib/python3.9/site-packages/transformers/trainer_callback.py", line 366, in on_train_end
return self.call_event("on_train_end", args, state, control)
File "/home/erik/anaconda3/envs/llama2-py39/lib/python3.9/site-packages/transformers/trainer_callback.py", line 407, in call_event
result = getattr(callback, event)(
File "/home/erik/axolotl/src/axolotl/monkeypatch/relora.py", line 178, in on_train_end
merge_and_save(
File "/home/erik/axolotl/src/axolotl/monkeypatch/relora.py", line 337, in merge_and_save
old_dev = target.weight.device
File "/home/erik/anaconda3/envs/llama2-py39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1614, in __getattr__
raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'Linear4bit' object has no attribute 'weight'
100%|██████████| 133/133 [2:14:51<00:00, 60.84s/it]
Traceback (most recent call last):
File "/home/erik/anaconda3/envs/llama2-py39/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/home/erik/anaconda3/envs/llama2-py39/lib/python3.9/site-packages/accelerate/commands/accelerate_cli.py", line 47, in main
args.func(args)
File "/home/erik/anaconda3/envs/llama2-py39/lib/python3.9/site-packages/accelerate/commands/launch.py", line 994, in launch_command
simple_launcher(args)
File "/home/erik/anaconda3/envs/llama2-py39/lib/python3.9/site-packages/accelerate/commands/launch.py", line 636, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/erik/anaconda3/envs/llama2-py39/bin/python', '-m', 'axolotl.cli.train', 'examples/llama-2-nl/relora.yml']' returned non-zero exit status 1.
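One possible reading of the traceback above is that `target` in `merge_and_save` is an adapter wrapper around the quantized layer rather than the layer that owns the weight, so `.weight` would live one level down. The snippet below is a minimal sketch of a defensive lookup under that assumption, not the actual axolotl fix. It assumes the wrapper exposes the wrapped layer through an attribute named `base_layer`, which I have not verified against the installed PEFT version. `find_weight_owner`, `FakeQuantLinear` and `FakeLoraWrapper` are hypothetical names used only for illustration, and the stand-in classes replace real torch modules so the example runs without a GPU.

```python
# Sketch only: plain-Python stand-ins mimic the object shapes involved.
# Nothing here imports torch, peft, or bitsandbytes.


def find_weight_owner(module, max_depth=8):
    """Return the first object in the `base_layer` chain that has a `weight`.

    Assumption: adapter wrappers expose the wrapped layer as `base_layer`.
    """
    current = module
    for _ in range(max_depth):
        # getattr with a default avoids the AttributeError seen in the issue.
        if getattr(current, "weight", None) is not None:
            return current
        inner = getattr(current, "base_layer", None)
        if inner is None:
            break
        current = inner
    raise AttributeError(
        f"{type(module).__name__!r} has no 'weight', directly or via 'base_layer'"
    )


class FakeQuantLinear:
    """Stand-in for a quantized linear layer that owns the weight."""

    def __init__(self):
        self.weight = "packed-4bit-weight"


class FakeLoraWrapper:
    """Stand-in for an adapter layer that only wraps another layer."""

    def __init__(self, base_layer):
        self.base_layer = base_layer


wrapped = FakeLoraWrapper(FakeQuantLinear())
owner = find_weight_owner(wrapped)
print(type(owner).__name__, owner.weight)
```

If the assumption holds, the failing line could read the device from `find_weight_owner(target).weight.device` instead of `target.weight.device`. Whether merging into a 4-bit weight is otherwise handled correctly is a separate question that this sketch does not address.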
Steps to reproduce
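Run a ReLoRA training on WSL2 with the config below, launched through accelerate. The traceback shows the underlying command as python -m axolotl.cli.train examples/llama-2-nl/relora.yml. The error appears only after the last training step.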
Config yaml
base_model: NousResearch/Llama-2-7b-hf
model_type: LlamaForCausalLM
tokenizer_type: LlamaTokenizer
is_llama_derived_model: true
load_in_8bit: false
load_in_4bit: true
strict: false
datasets:
  - path: UnderstandLing/oasst1_nl
    type: oasst
dataset_prepared_path:
val_set_size: 0.05
output_dir: ./relora-out
adapter: qlora
lora_model_dir:
sequence_len: 2048
sample_packing: true
pad_to_sequence_len: true
lora_r: 8
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules:
lora_target_linear: true
lora_fan_in_fan_out:
relora_steps: 150
relora_warmup_steps: 10
relora_cpu_offload: false
wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:
gradient_accumulation_steps: 4
micro_batch_size: 4
num_epochs: 1
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002
train_on_inputs: false
group_by_length: false
bf16: true
fp16: false
tf32: false
gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
warmup_steps: 10
eval_steps: 0.05
save_steps: 50
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  bos_token: ""
  eos_token: ""
  unk_token: ""
Possible solution
No response
Which Operating Systems are you using?
- [ ] Linux
- [ ] macOS
- [X] Windows
Python Version
3.9
axolotl branch-commit
main
Acknowledgements
- [X] My issue title is concise, descriptive, and in title casing.
- [X] I have searched the existing issues to make sure this bug has not been reported yet.
- [X] I am using the latest version of axolotl.
- [X] I have provided enough information for the maintainers to reproduce and diagnose the issue.