
AttributeError: 'Linear4bit' object has no attribute 'weight' with relora

Open ErikTromp opened this issue 1 year ago • 7 comments

Please check that this issue hasn't been reported before.

  • [X] I searched previous Bug Reports and didn't find any similar reports.

Expected Behavior

Training a model with relora works without error

Current behaviour

I trained a model on a dataset using relora, and after training finished I got this error:

100%|██████████████████████████████| 133/133 [2:14:49<00:00, 49.16s/it]
Traceback (most recent call last):
  File "/home/erik/anaconda3/envs/llama2-py39/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/erik/anaconda3/envs/llama2-py39/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/erik/axolotl/src/axolotl/cli/train.py", line 38, in <module>
    fire.Fire(do_cli)
  File "/home/erik/anaconda3/envs/llama2-py39/lib/python3.9/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/erik/anaconda3/envs/llama2-py39/lib/python3.9/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/erik/anaconda3/envs/llama2-py39/lib/python3.9/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home/erik/axolotl/src/axolotl/cli/train.py", line 34, in do_cli
    train(cfg=parsed_cfg, cli_args=parsed_cli_args, dataset_meta=dataset_meta)
  File "/home/erik/axolotl/src/axolotl/train.py", line 124, in train
    trainer.train(resume_from_checkpoint=resume_from_checkpoint)
  File "/home/erik/anaconda3/envs/llama2-py39/lib/python3.9/site-packages/transformers/trainer.py", line 1555, in train
    return inner_training_loop(
  File "/home/erik/anaconda3/envs/llama2-py39/lib/python3.9/site-packages/transformers/trainer.py", line 1998, in _inner_training_loop
    self.control = self.callback_handler.on_train_end(args, self.state, self.control)
  File "/home/erik/anaconda3/envs/llama2-py39/lib/python3.9/site-packages/transformers/trainer_callback.py", line 366, in on_train_end
    return self.call_event("on_train_end", args, state, control)
  File "/home/erik/anaconda3/envs/llama2-py39/lib/python3.9/site-packages/transformers/trainer_callback.py", line 407, in call_event
    result = getattr(callback, event)(
  File "/home/erik/axolotl/src/axolotl/monkeypatch/relora.py", line 178, in on_train_end
    merge_and_save(
  File "/home/erik/axolotl/src/axolotl/monkeypatch/relora.py", line 337, in merge_and_save
    old_dev = target.weight.device
  File "/home/erik/anaconda3/envs/llama2-py39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1614, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'Linear4bit' object has no attribute 'weight'
100%|██████████████████████████████| 133/133 [2:14:51<00:00, 60.84s/it]
Traceback (most recent call last):
  File "/home/erik/anaconda3/envs/llama2-py39/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/home/erik/anaconda3/envs/llama2-py39/lib/python3.9/site-packages/accelerate/commands/accelerate_cli.py", line 47, in main
    args.func(args)
  File "/home/erik/anaconda3/envs/llama2-py39/lib/python3.9/site-packages/accelerate/commands/launch.py", line 994, in launch_command
    simple_launcher(args)
  File "/home/erik/anaconda3/envs/llama2-py39/lib/python3.9/site-packages/accelerate/commands/launch.py", line 636, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/erik/anaconda3/envs/llama2-py39/bin/python', '-m', 'axolotl.cli.train', 'examples/llama-2-nl/relora.yml']' returned non-zero exit status 1.

Steps to reproduce

Run relora on WSL2

Config yaml

base_model: NousResearch/Llama-2-7b-hf
model_type: LlamaForCausalLM
tokenizer_type: LlamaTokenizer
is_llama_derived_model: true

load_in_8bit: false
load_in_4bit: true
strict: false

datasets:
  - path: UnderstandLing/oasst1_nl
    type: oasst
dataset_prepared_path:
val_set_size: 0.05
output_dir: ./relora-out

adapter: qlora
lora_model_dir:

sequence_len: 2048
sample_packing: true
pad_to_sequence_len: true

lora_r: 8
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules:
lora_target_linear: true
lora_fan_in_fan_out:

relora_steps: 150
relora_warmup_steps: 10
relora_cpu_offload: false

wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 4
num_epochs: 1
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: true
fp16: false
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 10
eval_steps: 0.05
save_steps: 50
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  bos_token: ""
  eos_token: ""
  unk_token: ""

Possible solution

No response

Which Operating Systems are you using?

  • [ ] Linux
  • [ ] macOS
  • [X] Windows

Python Version

3.9

axolotl branch-commit

main

Acknowledgements

  • [X] My issue title is concise, descriptive, and in title casing.
  • [X] I have searched the existing issues to make sure this bug has not been reported yet.
  • [X] I am using the latest version of axolotl.
  • [X] I have provided enough information for the maintainers to reproduce and diagnose the issue.

ErikTromp avatar Dec 07 '23 17:12 ErikTromp

Hey, I'm not so familiar with relora, but let me list out some debug tips.

  1. Have you tried running the default relora example? Does it work?
  2. Have you tried running this in docker?
  3. Could you try without relora? (i.e. remove the relora_steps, relora_warmup_steps, and relora_cpu_offload entries from the config)

NanoCode012 avatar Dec 08 '23 06:12 NanoCode012

ReLoRA fails with the same error on the standard llama-7b example too (the only example with relora). Running normal lora on the same datasets/parameters works fine.

ErikTromp avatar Dec 08 '23 13:12 ErikTromp

It seems src/axolotl/monkeypatch/relora.py is incompatible with the current PEFT 0.6.0. I searched the PEFT release history but couldn't find which PEFT version it does support.

So I created a temporary workaround to fix the problem. Note that it only supports ReLoRA with QLoRA in 4-bit mode; I didn't test 8-bit LoRA mode or others.

patch.txt
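
The attachment isn't inlined here, but the traceback shows the failing call is old_dev = target.weight.device in merge_and_save, where target is PEFT's LoRA Linear4bit wrapper rather than a plain nn.Linear. Below is a minimal sketch of the kind of unwrapping such a workaround might do; it is not the attached patch, the helper name is hypothetical, and it assumes (as in recent PEFT versions) that the wrapper keeps the quantized bitsandbytes module under a base_layer attribute.

# Hypothetical sketch, not the attached patch.txt: resolve the device of the
# underlying module instead of assuming the wrapper exposes `.weight`.
def get_module_device(target):
    # Newer PEFT LoRA layers keep the quantized bitsandbytes module under
    # `base_layer` (assumption); older versions subclass the bnb layer
    # directly, so fall back to the wrapper itself.
    base = getattr(target, "base_layer", target)
    weight = getattr(base, "weight", None)
    if weight is not None:
        return weight.device
    # Last resort: take the device of any parameter the module holds.
    return next(base.parameters()).device

# In merge_and_save, `old_dev = target.weight.device` would then become:
# old_dev = get_module_device(target)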

wangqi avatar Dec 08 '23 19:12 wangqi

Would 0.5.0 work?

NanoCode012 avatar Dec 09 '23 03:12 NanoCode012

I checked the 0.5.0 source code. I'm afraid it's not compatible either.

wangqi avatar Dec 09 '23 03:12 wangqi

Looking at the file history, it was committed 4 months ago, so maybe check which PEFT version corresponds to that time?

https://github.com/OpenAccess-AI-Collective/axolotl/blob/bde3c5a478100fd205822a139ec1c9cade73c9c1/requirements.txt

Unfortunately, that was before we started pinning versions.

NanoCode012 avatar Dec 09 '23 03:12 NanoCode012

From the code history in #322, I can see the ReLoRA implementation was done on July 25, while PEFT's 4-bit bnb support (bnb.py) was implemented on Aug 29. So my guess is that the ReLoRA code didn't support 4-bit QLoRA when it was written.

wangqi avatar Dec 09 '23 03:12 wangqi