
AttributeError: 'Linear4bit' object has no attribute 'weight' with relora

Open ErikTromp opened this issue 7 months ago • 7 comments

Please check that this issue hasn't been reported before.

  • [X] I searched previous Bug Reports and didn't find any similar reports.

Expected Behavior

Training a model with relora works without error

Current behaviour

I trained a model on a dataset using ReLoRA. Training reached 133/133 steps, then the on_train_end callback failed in merge_and_save with this error:

100%|██████████| 133/133 [2:14:49<00:00, 49.16s/it]
Traceback (most recent call last):
  File "/home/erik/anaconda3/envs/llama2-py39/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/erik/anaconda3/envs/llama2-py39/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/erik/axolotl/src/axolotl/cli/train.py", line 38, in <module>
    fire.Fire(do_cli)
  File "/home/erik/anaconda3/envs/llama2-py39/lib/python3.9/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/erik/anaconda3/envs/llama2-py39/lib/python3.9/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/erik/anaconda3/envs/llama2-py39/lib/python3.9/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home/erik/axolotl/src/axolotl/cli/train.py", line 34, in do_cli
    train(cfg=parsed_cfg, cli_args=parsed_cli_args, dataset_meta=dataset_meta)
  File "/home/erik/axolotl/src/axolotl/train.py", line 124, in train
    trainer.train(resume_from_checkpoint=resume_from_checkpoint)
  File "/home/erik/anaconda3/envs/llama2-py39/lib/python3.9/site-packages/transformers/trainer.py", line 1555, in train
    return inner_training_loop(
  File "/home/erik/anaconda3/envs/llama2-py39/lib/python3.9/site-packages/transformers/trainer.py", line 1998, in _inner_training_loop
    self.control = self.callback_handler.on_train_end(args, self.state, self.control)
  File "/home/erik/anaconda3/envs/llama2-py39/lib/python3.9/site-packages/transformers/trainer_callback.py", line 366, in on_train_end
    return self.call_event("on_train_end", args, state, control)
  File "/home/erik/anaconda3/envs/llama2-py39/lib/python3.9/site-packages/transformers/trainer_callback.py", line 407, in call_event
    result = getattr(callback, event)(
  File "/home/erik/axolotl/src/axolotl/monkeypatch/relora.py", line 178, in on_train_end
    merge_and_save(
  File "/home/erik/axolotl/src/axolotl/monkeypatch/relora.py", line 337, in merge_and_save
    old_dev = target.weight.device
  File "/home/erik/anaconda3/envs/llama2-py39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1614, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'Linear4bit' object has no attribute 'weight'
100%|██████████| 133/133 [2:14:51<00:00, 60.84s/it]
Traceback (most recent call last):
  File "/home/erik/anaconda3/envs/llama2-py39/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/home/erik/anaconda3/envs/llama2-py39/lib/python3.9/site-packages/accelerate/commands/accelerate_cli.py", line 47, in main
    args.func(args)
  File "/home/erik/anaconda3/envs/llama2-py39/lib/python3.9/site-packages/accelerate/commands/launch.py", line 994, in launch_command
    simple_launcher(args)
  File "/home/erik/anaconda3/envs/llama2-py39/lib/python3.9/site-packages/accelerate/commands/launch.py", line 636, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/erik/anaconda3/envs/llama2-py39/bin/python', '-m', 'axolotl.cli.train', 'examples/llama-2-nl/relora.yml']' returned non-zero exit status 1.

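The failing line is `old_dev = target.weight.device`: the object called `Linear4bit` has no `weight` attribute of its own. One possible explanation, stated here as an assumption and not a confirmed diagnosis, is that the target is an adapter wrapper that keeps the real quantized layer under another attribute. The sketch below reproduces the error with plain-Python stand-ins and shows a defensive lookup. The classes, the `base_layer` attribute name, and the `find_weight_owner` helper are all hypothetical illustrations, not the actual peft, bitsandbytes, or axolotl code.

```python
# Stand-in classes only: these are NOT the real peft / bitsandbytes types.
class Weight:
    def __init__(self, device):
        self.device = device


class BaseLinear:
    """Plays the role of the quantized layer that actually owns the weight."""

    def __init__(self, device):
        self.weight = Weight(device)


class Linear4bit:
    """Plays the role of an adapter wrapper (assumed layout): it holds the
    real layer under `base_layer` and exposes no `weight` of its own."""

    def __init__(self, base_layer):
        self.base_layer = base_layer


def find_weight_owner(module, max_depth=8):
    """Follow `base_layer` links until a module with a `weight` is found."""
    current = module
    for _ in range(max_depth):
        if hasattr(current, "weight"):
            return current
        if not hasattr(current, "base_layer"):
            break
        current = current.base_layer
    raise AttributeError(
        f"no 'weight' found on {type(module).__name__!r} or its base layers"
    )


target = Linear4bit(BaseLinear("cuda:0"))

# Direct access fails in the same way as the traceback above.
try:
    target.weight.device
except AttributeError as exc:
    error_message = str(exc)

# The defensive lookup resolves the device through the wrapped layer.
old_dev = find_weight_owner(target).weight.device
```

Whether this applies to the real `merge_and_save` depends on how the installed adapter library lays out its wrapped layers, which this report does not show.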
Steps to reproduce

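Run ReLoRA training on WSL2 with the config below. Per the traceback, the run was started through accelerate and executed `python -m axolotl.cli.train examples/llama-2-nl/relora.yml`.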

Config yaml

base_model: NousResearch/Llama-2-7b-hf
model_type: LlamaForCausalLM
tokenizer_type: LlamaTokenizer
is_llama_derived_model: true

load_in_8bit: false
load_in_4bit: true
strict: false

datasets:
  - path: UnderstandLing/oasst1_nl
    type: oasst
dataset_prepared_path:
val_set_size: 0.05
output_dir: ./relora-out

adapter: qlora
lora_model_dir:

sequence_len: 2048
sample_packing: true
pad_to_sequence_len: true

lora_r: 8
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules:
lora_target_linear: true
lora_fan_in_fan_out:

relora_steps: 150
relora_warmup_steps: 10
relora_cpu_offload: false

wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 4
num_epochs: 1
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: true
fp16: false
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 10
eval_steps: 0.05
save_steps: 50
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  bos_token: ""
  eos_token: ""
  unk_token: ""
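A side note on the numbers in this report: the run had 133 steps in total, while `relora_steps` is 150. Assuming a restart fires at every positive multiple of `relora_steps` (an assumption about the callback, not something this report confirms), no mid-training restart would have happened, which would be consistent with the error first appearing in `on_train_end`. A quick check of that arithmetic:

```python
total_steps = 133   # from the progress bar in the traceback
relora_steps = 150  # from the config

# Assumption: a restart fires at every positive multiple of relora_steps.
restart_steps = [
    step for step in range(1, total_steps + 1) if step % relora_steps == 0
]
restarts_during_training = len(restart_steps)

print(f"restarts during training: {restarts_during_training}")
```

Under that assumption, lowering `relora_steps` below the total step count might surface the same error earlier in a run.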

Possible solution

No response

Which Operating Systems are you using?

  • [ ] Linux
  • [ ] macOS
  • [X] Windows

Python Version

3.9

axolotl branch-commit

main

Acknowledgements

  • [X] My issue title is concise, descriptive, and in title casing.
  • [X] I have searched the existing issues to make sure this bug has not been reported yet.
  • [X] I am using the latest version of axolotl.
  • [X] I have provided enough information for the maintainers to reproduce and diagnose the issue.

ErikTromp, Dec 07 '23 17:12