LLaMA-Factory
FSDP QDoRA
Reminder
- [x] I have read the README and searched the existing issues.
Reproduction
Is LLaMA-Factory capable of the FSDP QDoRA approach described here: https://www.answer.ai/posts/2024-04-26-fsdp-qdora-llama3.html It seems promising, reportedly even beating full fine-tuning! I would love to keep using LLaMA-Factory and not have to change my scripts.
Expected behavior
Could LLaMA-Factory support FSDP QDoRA?
System Info
No response
Others
No response
Simply add --use_dora True to this script: https://github.com/hiyouga/LLaMA-Factory/blob/main/examples/extras/fsdp_qlora/sft.sh
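For reference, the edited launch would look roughly like this. This is only a sketch: everything except --use_dora True and --quantization_bit 4 mirrors the usual fsdp_qlora example, and the model, dataset, and hyperparameters are illustrative placeholders, not the script's exact contents.

CUDA_VISIBLE_DEVICES=0,1 accelerate launch \
    --config_file examples/accelerate/fsdp_config.yaml \
    src/train.py \
    --stage sft \
    --do_train \
    --model_name_or_path meta-llama/Meta-Llama-3-70B-Instruct \
    --dataset alpaca_en_demo \
    --template llama3 \
    --finetuning_type lora \
    --lora_target q_proj,v_proj \
    --quantization_bit 4 \
    --use_dora True \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 8 \
    --bf16 \
    --output_dir saves/llama3-70b/qdora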
Hi, can QDoRA work with fsdp_offload_params? When I try FSDP + QDoRA + offload with Qwen1.5 72B, I get:
ValueError: Expected a cuda device, but got: cpu
Thanks.
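For reference, the flag I mean is the one in the accelerate FSDP config; an abridged sketch of where it sits (values are illustrative, not my exact file):

compute_environment: LOCAL_MACHINE
distributed_type: FSDP
fsdp_config:
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_offload_params: true           # the setting in question
  fsdp_sharding_strategy: FULL_SHARD  # "1" in older accelerate versions
mixed_precision: bf16
num_processes: 2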
I got the same error when I tried --use_dora True.
https://github.com/huggingface/peft/pull/1724
I added use_dora: true to the YAML and got "Cannot flatten integer dtype tensors" (version 0.8.0). Thanks!
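For clarity, the YAML change was along these lines (a sketch; the surrounding keys follow the examples/ configs and are not my exact file):

model_name_or_path: meta-llama/Meta-Llama-3-70B-Instruct  # illustrative
stage: sft
do_train: true
finetuning_type: lora
quantization_bit: 4
use_dora: true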
@etemiz what is your bitsandbytes version?
bitsandbytes 0.43.1
@etemiz try
pip uninstall peft
pip install git+https://github.com/huggingface/peft.git
https://github.com/huggingface/peft/pull/1806
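Some background on that error: FSDP can only flatten floating-point tensors, so FSDP + 4-bit setups have to store the quantized weights in a float dtype. In transformers this is the bnb_4bit_quant_storage option; a sketch of that general configuration (shown for context only, not claiming it is exactly what the linked PR changes):

import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_storage=torch.bfloat16,  # float storage so FSDP can flatten the weights
)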
Hello, I was just trying this out as well. Using the latest peft as suggested gets rid of the "cannot flatten integer dtype tensors" error. However, a new error now shows up when training starts:
[rank0]: File "LLaMA-Factory/src/train.py", line 28, in <module>
[rank0]: main()
[rank0]: File "LLaMA-Factory/src/train.py", line 19, in main
[rank0]: run_exp()
[rank0]: File "LLaMA-Factory/src/llamafactory/train/tuner.py", line 45, in run_exp
[rank0]: run_pt(model_args, data_args, training_args, finetuning_args, callbacks)
[rank0]: File "LLaMA-Factory/src/llamafactory/train/pt/workflow.py", line 62, in run_pt
[rank0]: train_result = trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "envs/llama-factory/lib/python3.11/site-packages/transformers/trainer.py", line 1885, in train
[rank0]: return inner_training_loop(
[rank0]: ^^^^^^^^^^^^^^^^^^^^
[rank0]: File "envs/llama-factory/lib/python3.11/site-packages/transformers/trainer.py", line 2216, in _inner_training_loop
[rank0]: tr_loss_step = self.training_step(model, inputs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "envs/llama-factory/lib/python3.11/site-packages/transformers/trainer.py", line 3238, in training_step
[rank0]: loss = self.compute_loss(model, inputs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "envs/llama-factory/lib/python3.11/site-packages/transformers/trainer.py", line 3264, in compute_loss
[rank0]: outputs = model(**inputs)
[rank0]: ^^^^^^^^^^^^^^^
[rank0]: File "envs/llama-factory/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "envs/llama-factory/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "envs/llama-factory/lib/python3.11/site-packages/accelerate/utils/operations.py", line 822, in forward
[rank0]: return model_forward(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "envs/llama-factory/lib/python3.11/site-packages/accelerate/utils/operations.py", line 810, in __call__
[rank0]: return convert_to_fp32(self.model_forward(*args, **kwargs))
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "envs/llama-factory/lib/python3.11/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
[rank0]: return func(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "envs/llama-factory/lib/python3.11/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 857, in forward
[rank0]: output = self._fsdp_wrapped_module(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "envs/llama-factory/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "envs/llama-factory/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "envs/llama-factory/lib/python3.11/site-packages/accelerate/utils/operations.py", line 822, in forward
[rank0]: return model_forward(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "envs/llama-factory/lib/python3.11/site-packages/accelerate/utils/operations.py", line 810, in __call__
[rank0]: return convert_to_fp32(self.model_forward(*args, **kwargs))
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "envs/llama-factory/lib/python3.11/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
[rank0]: return func(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "envs/llama-factory/lib/python3.11/site-packages/peft/peft_model.py", line 1501, in forward
[rank0]: return self.base_model(
[rank0]: ^^^^^^^^^^^^^^^^
[rank0]: File "envs/llama-factory/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "envs/llama-factory/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "envs/llama-factory/lib/python3.11/site-packages/peft/tuners/tuners_utils.py", line 179, in forward
[rank0]: return self.model.forward(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "envs/llama-factory/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 1164, in forward
[rank0]: outputs = self.model(
[rank0]: ^^^^^^^^^^^
[rank0]: File "envs/llama-factory/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "envs/llama-factory/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "envs/llama-factory/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 957, in forward
[rank0]: layer_outputs = self._gradient_checkpointing_func(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "LLaMA-Factory/src/llamafactory/model/model_utils/checkpointing.py", line 65, in custom_gradient_checkpointing_func
[rank0]: return gradient_checkpointing_func(func, *args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "envs/llama-factory/lib/python3.11/site-packages/torch/_compile.py", line 24, in inner
[rank0]: return torch._dynamo.disable(fn, recursive)(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "envs/llama-factory/lib/python3.11/site-packages/torch/_dynamo/eval_frame.py", line 451, in _fn
[rank0]: return fn(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^
[rank0]: File "envs/llama-factory/lib/python3.11/site-packages/torch/_dynamo/external_utils.py", line 36, in inner
[rank0]: return fn(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^
[rank0]: File "envs/llama-factory/lib/python3.11/site-packages/torch/utils/checkpoint.py", line 487, in checkpoint
[rank0]: return CheckpointFunction.apply(function, preserve, *args)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "envs/llama-factory/lib/python3.11/site-packages/torch/autograd/function.py", line 598, in apply
[rank0]: return super().apply(*args, **kwargs) # type: ignore[misc]
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "envs/llama-factory/lib/python3.11/site-packages/torch/utils/checkpoint.py", line 262, in forward
[rank0]: outputs = run_function(*args)
[rank0]: ^^^^^^^^^^^^^^^^^^^
[rank0]: File "envs/llama-factory/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "envs/llama-factory/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "envs/llama-factory/lib/python3.11/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 857, in forward
[rank0]: output = self._fsdp_wrapped_module(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "envs/llama-factory/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "envs/llama-factory/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "envs/llama-factory/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 713, in forward
[rank0]: hidden_states, self_attn_weights, present_key_value = self.self_attn(
[rank0]: ^^^^^^^^^^^^^^^
[rank0]: File "envs/llama-factory/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "envs/llama-factory/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "envs/llama-factory/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 416, in forward
[rank0]: query_states = self.q_proj(hidden_states)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "envs/llama-factory/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "envs/llama-factory/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "envs/llama-factory/lib/python3.11/site-packages/peft/tuners/lora/bnb.py", line 492, in forward
[rank0]: output = self.lora_magnitude_vector[active_adapter](
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "envs/llama-factory/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "envs/llama-factory/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "envs/llama-factory/lib/python3.11/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 857, in forward
[rank0]: output = self._fsdp_wrapped_module(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "envs/llama-factory/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "envs/llama-factory/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "envs/llama-factory/lib/python3.11/site-packages/peft/tuners/lora/dora.py", line 72, in forward
[rank0]: x_eye = torch.eye(lora_A.weight.shape[1], device=lora_A.weight.device, dtype=x.dtype)
[rank0]: ~~~~~~~~~~~~~~~~~~~^^^
[rank0]: IndexError: tuple index out of range
Any suggestions? It looks like the DoRA code assumes lora_A.weight is a 2-D matrix, but here shape[1] raises IndexError, so the weight is arriving with fewer dimensions than expected (likely because FSDP has flattened it).
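To illustrate the failure mode in isolation (plain PyTorch, not the peft code itself): shape[1] only exists on a 2-D weight, and an FSDP-flattened parameter is 1-D, so indexing its second dimension raises exactly this error.

import torch

w2d = torch.empty(8, 4)   # a normal 2-D lora_A-style weight
print(w2d.shape[1])       # 4 -- fine

w1d = w2d.flatten()       # what a flattened/sharded parameter looks like
print(w1d.shape)          # torch.Size([32])
print(w1d.shape[1])       # IndexError: tuple index out of range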
I tried these:
pip uninstall peft
pip install git+https://github.com/huggingface/peft.git
and I get the same error:
[rank1]: File "...../LLaMA-Factory/v/lib/python3.11/site-packages/peft/tuners/lora/dora.py", line 74, in forward
[rank1]: x_eye = torch.eye(lora_A.weight.shape[1], device=lora_A.weight.device, dtype=x.dtype)
[rank1]: ~~~~~~~~~~~~~~~~~~~^^^
[rank1]: IndexError: tuple index out of range
peft 0.11.2.dev0 (latest on GitHub)
bitsandbytes 0.43.1
LLaMA-Factory: latest on GitHub
model: Llama3-70B
I had been using fsdp_qlora for a while and it works well; thanks for this amazing software. Now I tried QDoRA and it didn't work.
Same problem with the IndexError in peft/tuners/lora/dora.py. Have you solved it?
Hi there, did you ever solve the IndexError? I have the same problem.