Accelerate 0.30.0 Breaks FSDP QLoRA
System Info
Below is the pip list output of an environment that does not work:
Package Version
------------------------ ---------------
accelerate 0.30.0
aiohttp 3.9.5
aiosignal 1.3.1
annotated-types 0.6.0
async-timeout 4.0.3
attrs 23.2.0
bitsandbytes 0.43.1
certifi 2024.2.2
charset-normalizer 3.3.2
click 8.1.7
datasets 2.19.1
deepspeed 0.14.2+5f631abc
dill 0.3.8
docker-pycreds 0.4.0
docstring_parser 0.16
einops 0.8.0
eval_type_backport 0.2.0
exceptiongroup 1.2.1
filelock 3.14.0
flash-attn 2.5.8
frozenlist 1.4.1
fsspec 2024.3.1
gitdb 4.0.11
GitPython 3.1.43
hf_transfer 0.1.6
hjson 3.1.0
huggingface-hub 0.23.0
idna 3.7
iniconfig 2.0.0
Jinja2 3.1.4
markdown-it-py 3.0.0
MarkupSafe 2.1.5
mdurl 0.1.2
mpmath 1.3.0
multidict 6.0.5
multiprocess 0.70.16
networkx 3.1
ninja 1.11.1.1
numpy 1.24.4
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12 8.9.2.26
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu12 12.1.0.106
nvidia-nccl-cu12 2.20.5
nvidia-nvjitlink-cu12 12.4.127
nvidia-nvtx-cu12 12.1.105
packaging 24.0
pandas 2.0.3
peft 0.10.0
pillow 10.3.0
pip 24.0
platformdirs 4.2.1
pluggy 1.5.0
protobuf 3.20.1
psutil 5.9.8
py-cpuinfo 9.0.0
pyarrow 16.0.0
pyarrow-hotfix 0.6
pydantic 2.7.1
pydantic_core 2.18.2
Pygments 2.18.0
pynvml 11.5.0
pytest 8.2.0
python-dateutil 2.9.0.post0
pytz 2024.1
PyYAML 6.0.1
regex 2024.5.10
requests 2.31.0
rich 13.7.1
safetensors 0.4.3
scipy 1.10.1
sentencepiece 0.2.0
sentry-sdk 2.1.1
setproctitle 1.3.3
setuptools 69.5.1
shtab 1.7.1
six 1.16.0
smmap 5.0.1
sympy 1.12
text-generation 0.7.0
tokenizers 0.19.1
tomli 2.0.1
torch 2.3.0
torchaudio 2.3.0
torchvision 0.18.0
tqdm 4.66.4
transformers 4.40.2
triton 2.3.0
trl 0.8.6
typing_extensions 4.11.0
tyro 0.8.4
tzdata 2024.1
urllib3 2.2.1
wandb 0.17.0
wheel 0.43.0
xxhash 3.4.1
yarl 1.9.4
Changing accelerate to accelerate<=0.29.3 gives an otherwise identical environment:
Package Version
------------------------ ---------------
accelerate 0.29.3
(all other packages exactly as listed above)
Information
- [ ] The official example scripts
- [X] My own modified scripts
Tasks
- [ ] One of the scripts in the examples/ folder of Accelerate or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
- [X] My own task or dataset (give details below)
Reproduction
I am using code based on this repo: https://github.com/mallorbc/Finetune_LLMs
Otherwise, the basic steps are the following:
- Install the pip packages seen above, namely: pip install "accelerate<=0.29.3" and pip install transformers accelerate peft bitsandbytes trl
- Run a QLoRA FSDP training program (a minimal sketch is shown below)
- Notice how errors occur with 0.30.0 but not with 0.29.3
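For reference, here is a minimal sketch of what such a program looks like. This is not my exact trl_finetune.py; the model name, dataset, and hyperparameters are placeholders, and the script is assumed to be launched with accelerate launch using an FSDP config:
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
)
from trl import SFTTrainer

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder model

# 4-bit quantization; storing the quantized weights in a float dtype is what
# lets FSDP flatten them.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_storage=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)

dataset = load_dataset("imdb", split="train[:1%]")  # placeholder dataset

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=512,
    peft_config=peft_config,
    args=TrainingArguments(
        output_dir="out",
        per_device_train_batch_size=1,
        gradient_checkpointing=True,
        bf16=True,
    ),
)
trainer.train()  # on accelerate 0.30.0 this raises the AttributeError below
The script is launched with something like accelerate launch --config_file fsdp_config.yaml trl_finetune.py (the config file name is a placeholder), where the accelerate config enables FSDP with a transformer-based auto wrap policy; it is inside that auto-wrap setup that the error below is raised.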
See an error like the following for 0.30.0:
[rank0]: Traceback (most recent call last):
[rank0]: File "trl_finetune.py", line 387, in <module>
[rank0]: trainer.train(resume_from_checkpoint=args.resume_from_checkpoint)
[rank0]: File "/usr/local/lib/python3.8/dist-packages/trl/trainer/sft_trainer.py", line 361, in train
[rank0]: output = super().train(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.8/dist-packages/transformers/trainer.py", line 1859, in train
[rank0]: return inner_training_loop(
[rank0]: File "/usr/local/lib/python3.8/dist-packages/transformers/trainer.py", line 2001, in _inner_training_loop
[rank0]: self._fsdp_qlora_plugin_updates()
[rank0]: File "/usr/local/lib/python3.8/dist-packages/transformers/trainer.py", line 4425, in _fsdp_qlora_plugin_updates
[rank0]: fsdp_plugin.auto_wrap_policy = fsdp_auto_wrap_policy(self.model)
[rank0]: File "/usr/local/lib/python3.8/dist-packages/peft/utils/other.py", line 396, in fsdp_auto_wrap_policy
[rank0]: transformer_cls = FullyShardedDataParallelPlugin.get_module_class_from_name(model, layer_class)
[rank0]: AttributeError: type object 'FullyShardedDataParallelPlugin' has no attribute 'get_module_class_from_name'
[rank1]: Traceback (most recent call last):
[rank1]: File "trl_finetune.py", line 387, in <module>
[rank1]: trainer.train(resume_from_checkpoint=args.resume_from_checkpoint)
[rank1]: File "/usr/local/lib/python3.8/dist-packages/trl/trainer/sft_trainer.py", line 361, in train
[rank1]: output = super().train(*args, **kwargs)
[rank1]: File "/usr/local/lib/python3.8/dist-packages/transformers/trainer.py", line 1859, in train
[rank1]: return inner_training_loop(
[rank1]: File "/usr/local/lib/python3.8/dist-packages/transformers/trainer.py", line 2001, in _inner_training_loop
[rank1]: self._fsdp_qlora_plugin_updates()
[rank1]: File "/usr/local/lib/python3.8/dist-packages/transformers/trainer.py", line 4425, in _fsdp_qlora_plugin_updates
[rank1]: fsdp_plugin.auto_wrap_policy = fsdp_auto_wrap_policy(self.model)
[rank1]: File "/usr/local/lib/python3.8/dist-packages/peft/utils/other.py", line 396, in fsdp_auto_wrap_policy
[rank1]: transformer_cls = FullyShardedDataParallelPlugin.get_module_class_from_name(model, layer_class)
[rank1]: AttributeError: type object 'FullyShardedDataParallelPlugin' has no attribute 'get_module_class_from_name'
E0510 12:16:25.853937 140644343273280 torch/distributed/elastic/multiprocessing/api.py:826] failed (exitcode: 1) local_rank: 0 (pid: 140) of binary: /usr/bin/python3
Traceback (most recent call last):
File "/usr/local/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/accelerate_cli.py", line 46, in main
args.func(args)
File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/launch.py", line 1069, in launch_command
multi_gpu_launcher(args)
File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/launch.py", line 718, in multi_gpu_launcher
distrib_run.run(args)
File "/usr/local/lib/python3.8/dist-packages/torch/distributed/run.py", line 870, in run
elastic_launch(
File "/usr/local/lib/python3.8/dist-packages/torch/distributed/launcher/api.py", line 132, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/usr/local/lib/python3.8/dist-packages/torch/distributed/launcher/api.py", line 263, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
trl_finetune.py FAILED
------------------------------------------------------------
Failures:
[1]:
time : 2024-05-10_12:16:25
host : f61090d2a6fd
rank : 1 (local_rank: 1)
exitcode : 1 (pid: 141)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2024-05-10_12:16:25
host : f61090d2a6fd
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 140)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
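My guess at the root cause (not verified, just inferred from the traceback): accelerate 0.30.0 no longer exposes get_module_class_from_name as a classmethod on FullyShardedDataParallelPlugin, while the released peft 0.10.0 still calls it that way. A small hypothetical compatibility check along these lines illustrates the difference:
# Try the newer module-level location first, then fall back to the old
# classmethod that peft 0.10.0 expects (the fallback is what fails on 0.30.0).
try:
    from accelerate.utils import get_module_class_from_name
except ImportError:
    from accelerate import FullyShardedDataParallelPlugin
    get_module_class_from_name = (
        FullyShardedDataParallelPlugin.get_module_class_from_name
    )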
Expected behavior
I expect training to run without issues, as it does when I use accelerate 0.29.3.
cc @younesbelkada @pacman100
@mallorbc Could you try installing PEFT from main and check if the error persists?
So use the latest accelerate and install peft from main?
I will do the following:
pip install transformers bitsandbytes trl accelerate
pip install git+https://github.com/huggingface/peft.git
I will let you know.
I did the above setup. The resulting pip list is identical to the first one above except for the following packages:
accelerate 0.30.1
peft 0.11.1.dev0
platformdirs 4.2.2
pyarrow 16.1.0
regex 2024.5.15
sentry-sdk 2.2.0
I can confirm that this led to successful fine-tuning with QLoRA with FSDP. However, QDoRA seems to be broken.
When I try FSDP QDoRA, I get the following issue:
[rank0]: Traceback (most recent call last):
[rank0]: File "trl_finetune.py", line 399, in
I used exactly the versions you mentioned, and with FSDP + QLoRA I got the same "ValueError: Cannot flatten integer dtype tensors".
For QLoRA training with FSDP, please check the updated bitsandbytes docs.
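In particular (a sketch based on my reading of those docs, with placeholder values): the 4-bit weights need to be stored in a floating-point dtype so that FSDP can flatten them; integer storage is what produces the "Cannot flatten integer dtype tensors" error.
import torch
from transformers import BitsAndBytesConfig

# bnb_4bit_quant_storage must be a float dtype (matching the compute dtype)
# for FSDP to be able to flatten the quantized parameters.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_storage=torch.bfloat16,
)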
As for QDoRA: training with FSDP should be fixed in https://github.com/huggingface/peft/pull/1806. If you install the latest PEFT from main, it should work. Please also check the PR description for how this was tested.
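For reference, "QDoRA" here means enabling DoRA on top of the 4-bit quantized base model; a sketch of the relevant config (target modules are placeholders):
from peft import LoraConfig

# DoRA is enabled via use_dora=True on an otherwise normal LoRA config;
# combined with a 4-bit base model this gives the QDoRA setup discussed above.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
    use_dora=True,
)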
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.