Help wanted: AttributeError: type object 'QuantLinear' has no attribute '_old_init' when fine-tuning moss-moon-003-sft-int8
AttributeError: type object 'QuantLinear' has no attribute '_old_init'
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
library_or_version
0.6.5
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0
INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes.
Traceback (most recent call last):
File "/mmm/MOSS/finetune_moss.py", line 311, in <module>
train(args)
File "/mmm/MOSS/finetune_moss.py", line 184, in train
model = MossForCausalLM.from_pretrained(args.model_path, use_cache=False)
File "/mmm/condaEnvs/mossChat/lib/python3.9/site-packages/transformers/modeling_utils.py", line 2498, in from_pretrained
model = cls(config, *model_args, **model_kwargs)
File "/mmm/condaEnvs/mossChat/lib/python3.9/site-packages/transformers/utils/generic.py", line 359, in __exit__
self.stack.__exit__(*args, **kwargs)
File "/mmm/condaEnvs/mossChat/lib/python3.9/contextlib.py", line 513, in __exit__
raise exc_details[1]
File "/mmm/condaEnvs/mossChat/lib/python3.9/contextlib.py", line 498, in __exit__
if cb(*exc_details):
File "/mmm/condaEnvs/mossChat/lib/python3.9/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 444, in __exit__
_disable_class(subclass)
File "/mmm/condaEnvs/mossChat/lib/python3.9/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 440, in _disable_class
cls.__init__ = cls._old_init
AttributeError: type object 'QuantLinear' has no attribute '_old_init'
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 97460) of binary: /mmm/condaEnvs/mossChat/bin/python
Traceback (most recent call last):
File "/mmm/condaEnvs/mossChat/bin/accelerate", line 9, in <module>
sys.exit(main())
File "/mmm/condaEnvs/mossChat/lib/python3.9/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
args.func(args)
File "/mmm/condaEnvs/mossChat/lib/python3.9/site-packages/accelerate/commands/launch.py", line 900, in launch_command
deepspeed_launcher(args)
File "/mmm/condaEnvs/mossChat/lib/python3.9/site-packages/accelerate/commands/launch.py", line 643, in deepspeed_launcher
distrib_run.run(args)
File "/mmm/condaEnvs/mossChat/lib/python3.9/site-packages/torch/distributed/run.py", line 785, in run
elastic_launch(
File "/mmm/condaEnvs/mossChat/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/mmm/condaEnvs/mossChat/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
finetune_moss.py FAILED
------------------------------------------------------------
Failures:
<NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2023-04-29_00:18:31
host : 7ubf2fimgar9m-0
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 97460)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
==========================================================
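The last frame of the first traceback fails on `cls.__init__ = cls._old_init` inside `_disable_class(subclass)`. One way this kind of failure can arise, sketched here in plain Python, is a context manager that saves `__init__` on every subclass that exists on entry and restores it on every subclass that exists on exit: a class created in between was never given `_old_init`. This is only an illustration of the pattern, not DeepSpeed's code; `Base`, `PatchInit`, `Early`, `Late`, and `reproduce` are hypothetical names, and whether `QuantLinear` is really defined inside the ZeRO-3 init context here is an assumption.

```python
class Base:
    """Stand-in for a framework base class (hypothetical)."""

    def __init__(self):
        self.ready = True


def all_subclasses(cls):
    """Recursively collect every subclass that exists right now."""
    found = []
    for sub in cls.__subclasses__():
        found.append(sub)
        found.extend(all_subclasses(sub))
    return found


class PatchInit:
    """Hypothetical context manager: save __init__ on enter, restore on exit."""

    def __enter__(self):
        # Only classes that already exist get an _old_init attribute.
        for sub in all_subclasses(Base):
            sub._old_init = sub.__init__
        return self

    def __exit__(self, exc_type, exc, tb):
        # Classes created inside the block are visited here too,
        # but they never received _old_init on enter.
        for sub in all_subclasses(Base):
            sub.__init__ = sub._old_init
        return False


def reproduce():
    """Return the error message raised when a class is defined late."""

    class Early(Base):  # exists before the context is entered
        pass

    try:
        with PatchInit():

            class Late(Base):  # defined inside the context
                pass

    except AttributeError as err:
        return str(err)
    return None


if __name__ == "__main__":
    print(reproduce())
```

If that is what happens here, the things to compare would be `zero3_init_flag: true` in the config and the point at which the int8 model code defines `QuantLinear`.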
Command used to run:
cd /mmm/MOSS/
conda activate mossChat
out_dir=/mmm/trainOut/MOSS
model_name=moss-moon-003-sft-int8
/mmm/condaEnvs/mossChat/bin/accelerate launch \
--config_file ./configs/sft.yaml \
--deepspeed_multinode_launcher standard finetune_moss.py \
--model_path /mmm/model/fnlp/$model_name \
--data_dir ./sft_data \
--output_dir $out_dir/ckpts/$model_name \
--log_dir $out_dir/train_logs/$model_name \
--n_epochs 2 \
--train_bsz_per_gpu 4 \
--eval_bsz_per_gpu 4 \
--learning_rate 0.000015 \
--eval_step 200 \
--save_step 2000
Config passed via --config_file (./configs/sft.yaml):
command_file: null
commands: null
compute_environment: LOCAL_MACHINE
deepspeed_config:
  gradient_accumulation_steps: 1
  gradient_clipping: 1.0
  offload_optimizer_device: none
  offload_param_device: none
  zero3_init_flag: true
  zero3_save_16bit_model: true
  zero_stage: 3
distributed_type: DEEPSPEED
downcast_bf16: 'no'
dynamo_backend: 'NO'
fsdp_config: {}
gpu_ids: null
machine_rank: 0
main_process_ip: null
main_process_port: null
main_training_function: main
megatron_lm_config: {}
mixed_precision: fp16
num_machines: 1
num_processes: 1
rdzv_backend: static
same_network: true
tpu_name: null
tpu_zone: null
use_cpu: false
Document: To avoid this warning pass in values... Link: http://note.youdao.com/noteshare?id=29a328126d373317104d88178792a249&sub=4644DD40C0534D42B800AC1291018F74
Versions: accelerate 0.18.0, deepspeed 0.9.1, huggingface-hub 0.14.1, transformers 4.25.1, torch 2.0.0
(/j05025/condaEnvs/moss2) root@444d5q78sbnco-0:/j05025/MOSS# python --version
Python 3.9.2
(/mmm/condaEnvs/moss2) root@444d5q78sbnco-0:/mmm/MOSS# pip list
Package Version
absl-py 1.4.0
accelerate 0.18.0
aiohttp 3.8.4
aiosignal 1.3.1
async-timeout 4.0.2
attrs 23.1.0
cachetools 5.3.0
certifi 2022.12.7
charset-normalizer 3.1.0
cmake 3.26.3
contourpy 1.0.7
cycler 0.11.0
datasets 2.11.0
deepspeed 0.9.1
dill 0.3.6
filelock 3.12.0
fonttools 4.39.3
frozenlist 1.3.3
fsspec 2023.4.0
google-auth 2.17.3
google-auth-oauthlib 1.0.0
grpcio 1.54.0
hjson 3.1.0
huggingface-hub 0.14.1
idna 3.4
importlib-metadata 6.6.0
importlib-resources 5.12.0
Jinja2 3.1.2
kiwisolver 1.4.4
lit 16.0.2
Markdown 3.4.3
MarkupSafe 2.1.2
matplotlib 3.7.1
mpmath 1.3.0
multidict 6.0.4
multiprocess 0.70.14
networkx 3.1
ninja 1.11.1
numpy 1.24.3
nvidia-cublas-cu11 11.10.3.66
nvidia-cuda-cupti-cu11 11.7.101
nvidia-cuda-nvrtc-cu11 11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11 8.5.0.96
nvidia-cufft-cu11 10.9.0.58
nvidia-curand-cu11 10.2.10.91
nvidia-cusolver-cu11 11.4.0.1
nvidia-cusparse-cu11 11.7.4.91
nvidia-nccl-cu11 2.14.3
nvidia-nvtx-cu11 11.7.91
oauthlib 3.2.2
packaging 23.1
pandas 2.0.1
Pillow 9.5.0
pip 23.0.1
protobuf 4.22.3
psutil 5.9.5
py-cpuinfo 9.0.0
pyarrow 11.0.0
pyasn1 0.5.0
pyasn1-modules 0.3.0
pydantic 1.10.7
pyparsing 3.0.9
python-dateutil 2.8.2
pytz 2023.3
PyYAML 6.0
regex 2023.3.23
requests 2.29.0
requests-oauthlib 1.3.1
responses 0.18.0
rsa 4.9
sentencepiece 0.1.98
setuptools 66.0.0
six 1.16.0
sympy 1.11.1
tensorboard 2.12.2
tensorboard-data-server 0.7.0
tensorboard-plugin-wit 1.8.1
tokenizers 0.13.3
torch 2.0.0
tqdm 4.65.0
transformers 4.25.1
triton 2.0.0
typing_extensions 4.5.0
tzdata 2023.3
urllib3 1.26.15
Werkzeug 2.3.1
wheel 0.38.4
xxhash 3.2.0
yarl 1.9.2
zipp 3.15.0