Ascend devices at a Huawei computing center cannot run this project

Open shaoyuan-bai opened this issue 9 months ago • 1 comments

Reminder

  • [X] I have read the README and searched the existing issues.

Reproduction

(PyTorch-2.1.0) [ma-user LLaMA-Factory]$python src/train_web.py
/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/torch_npu/dynamo/__init__.py:18: UserWarning: Register eager implementation for the 'npu' backend of dynamo, as torch_npu was not compiled with torchair.
  warnings.warn(
Warning : ASCEND_HOME_PATH environment variable is not set.
/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/pydantic/_internal/_config.py:334: UserWarning: Valid config keys have changed in V2:
  • 'allow_population_by_field_name' has been renamed to 'populate_by_name'
  • 'validate_all' has been renamed to 'validate_default'
  warnings.warn(message, UserWarning)
/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/pydantic/_internal/fields.py:160: UserWarning: Field "model_persistence_threshold" has conflict with protected namespace "model_".

You may be able to resolve this warning by setting model_config['protected_namespaces'] = ().
  warnings.warn(
Traceback (most recent call last):
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/trl/import_utils.py", line 176, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/trl/trainer/dpo_trainer.py", line 62, in <module>
    import deepspeed
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/deepspeed/__init__.py", line 16, in <module>
    from . import module_inject
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/deepspeed/module_inject/__init__.py", line 6, in <module>
    from .replace_module import replace_transformer_layer, revert_transformer_layer, ReplaceWithTensorSlicing, GroupQuantizer, generic_injection
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/deepspeed/module_inject/replace_module.py", line 731, in <module>
    from ..pipe import PipelineModule
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/deepspeed/pipe/__init__.py", line 6, in <module>
    from ..runtime.pipe import PipelineModule, LayerSpec, TiedLayerSpec
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/deepspeed/runtime/pipe/__init__.py", line 6, in <module>
    from .module import PipelineModule, LayerSpec, TiedLayerSpec
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/deepspeed/runtime/pipe/module.py", line 19, in <module>
    from ..activation_checkpointing import checkpointing
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/deepspeed/runtime/activation_checkpointing/checkpointing.py", line 25, in <module>
    from deepspeed.runtime.config import DeepSpeedConfig
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/deepspeed/runtime/config.py", line 28, in <module>
    from .zero.config import get_zero_config, ZeroStageEnum
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/deepspeed/runtime/zero/__init__.py", line 6, in <module>
    from .partition_parameters import ZeroParamType
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 603, in <module>
    class Init(InsertPostInitMethodToModuleSubClasses):
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 605, in Init
    param_persistence_threshold = get_config_default(DeepSpeedZeroConfig, "param_persistence_threshold")
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/deepspeed/runtime/config_utils.py", line 115, in get_config_default
    assert not config.__fields__.get(
AttributeError: 'FieldInfo' object has no attribute 'required'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ma-user/work/LLaMA-Factory/src/train_web.py", line 1, in <module>
    from llmtuner import create_ui
  File "/home/ma-user/work/LLaMA-Factory/src/llmtuner/__init__.py", line 6, in <module>
    from .train import export_model, run_exp
  File "/home/ma-user/work/LLaMA-Factory/src/llmtuner/train/__init__.py", line 1, in <module>
    from .tuner import export_model, run_exp
  File "/home/ma-user/work/LLaMA-Factory/src/llmtuner/train/tuner.py", line 11, in <module>
    from .dpo import run_dpo
  File "/home/ma-user/work/LLaMA-Factory/src/llmtuner/train/dpo/__init__.py", line 1, in <module>
    from .workflow import run_dpo
  File "/home/ma-user/work/LLaMA-Factory/src/llmtuner/train/dpo/workflow.py", line 11, in <module>
    from .trainer import CustomDPOTrainer
  File "/home/ma-user/work/LLaMA-Factory/src/llmtuner/train/dpo/trainer.py", line 8, in <module>
    from trl import DPOTrainer
  File "<frozen importlib._bootstrap>", line 1055, in _handle_fromlist
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/trl/import_utils.py", line 167, in __getattr__
    value = getattr(module, name)
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/trl/import_utils.py", line 166, in __getattr__
    module = self._get_module(self._class_to_module[name])
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/trl/import_utils.py", line 178, in _get_module
    raise RuntimeError(
RuntimeError: Failed to import trl.trainer.dpo_trainer because of the following error (look up to see its traceback):
'FieldInfo' object has no attribute 'required'

Expected behavior

Initially I installed what was then the latest version. However, after running the launch command, it reported:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. deepspeed 0.9.2 requires pydantic<2.0.0, but you have pydantic 2.7.1 which is incompatible.

I then ran pip install --no-deps -e . and, after running the launch command again, got:

RuntimeError: Failed to import trl.trainer.dpo_trainer because of the following error (look up to see its traceback): 'FieldInfo' object has no attribute 'required'

After that I tried adjusting the versions of the various packages, but it still did not work.
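
The conflict pip reports above can be checked from inside the environment. The following is only a minimal sketch: the helper names `installed_version`, `major`, and `has_pydantic_conflict` are hypothetical, and the check covers just the one pairing quoted in the pip message (deepspeed 0.9.2 with pydantic 2.x), not deepspeed's full requirement set.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def major(version: str) -> int:
    """Extract the leading major number from a version string such as '2.7.1'."""
    return int(version.split(".")[0])


def has_pydantic_conflict(deepspeed: Optional[str], pydantic: Optional[str]) -> bool:
    """True only for the pairing pip complained about: deepspeed 0.9.2 with pydantic >= 2.

    Assumption: other deepspeed releases may have different pydantic requirements,
    so they are deliberately not flagged here.
    """
    if deepspeed is None or pydantic is None:
        return False
    return deepspeed == "0.9.2" and major(pydantic) >= 2


if __name__ == "__main__":
    ds = installed_version("deepspeed")
    pd = installed_version("pydantic")
    print(f"deepspeed={ds} pydantic={pd} conflict={has_pydantic_conflict(ds, pd)}")
```

With the versions from this report, `has_pydantic_conflict("0.9.2", "2.7.1")` flags the environment, which matches the pip message rather than diagnosing anything new.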

System Info

(PyTorch-2.1.0) [ma-user LLaMA-Factory]$transformers-cli env
/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/torch_npu/dynamo/__init__.py:18: UserWarning: Register eager implementation for the 'npu' backend of dynamo, as torch_npu was not compiled with torchair.
  warnings.warn(
Warning : ASCEND_HOME_PATH environment variable is not set.

Copy-and-paste the text below in your GitHub issue and FILL OUT the two last points.

  • transformers version: 4.40.2
  • Platform: Linux-4.19.36-vhulk1907.1.0.h619.eulerosv2r8.aarch64-aarch64-with-glibc2.28
  • Python version: 3.9.18
  • Huggingface_hub version: 0.23.0
  • Safetensors version: 0.4.3
  • Accelerate version: 0.30.1
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.1.0 (False)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

Others

No response

shaoyuan-bai avatar May 11 '24 01:05 shaoyuan-bai

Ascend users can join this group for further discussion.

[Image: 20240511-094900]

codemayq avatar May 11 '24 01:05 codemayq

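After upgrading to the new version I ran into the same problem. I am currently using A100 chips, 8 cards on a single machine. Has this issue been resolved?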

yx9966 avatar May 17 '24 09:05 yx9966

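I did not solve it. I had wanted to use the web UI at the computing center, but Huawei's technical staff told me that only bare-metal servers can open the corresponding port. Once the package conflict was resolved, I did not look into it any further.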

shaoyuan-bai avatar May 17 '24 09:05 shaoyuan-bai

Which package was in conflict for you? And which version of pydantic did you install?

yx9966 avatar May 17 '24 09:05 yx9966

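I have left work, and opening my laptop in the car is a hassle. As far as I remember, everything worked normally once I uninstalled deepspeed.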
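
One plausible reading of why uninstalling helps: the first traceback shows trl/trainer/dpo_trainer.py executing import deepspeed, and libraries commonly guard such optional imports behind an availability check, so a missing package is simply skipped. The sketch below only illustrates that general pattern. `is_available` and `describe_optional_import` are hypothetical names, and whether trl guards the import in exactly this way is an assumption, not something this thread confirms.

```python
import importlib.util


def is_available(package: str) -> bool:
    """Return True if the top-level `package` can be found, without importing it."""
    return importlib.util.find_spec(package) is not None


def describe_optional_import(package: str) -> str:
    """Report what a guarded optional import would do for `package`."""
    if is_available(package):
        return f"{package} is installed and would be imported"
    return f"{package} is not installed and would be skipped"


if __name__ == "__main__":
    # After `pip uninstall deepspeed`, this should report that it would be skipped.
    print(describe_optional_import("deepspeed"))
```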

shaoyuan-bai avatar May 17 '24 09:05 shaoyuan-bai

Got it, thanks. I will try that too.

yx9966 avatar May 17 '24 09:05 yx9966

Could you send me the group QR code?

bltcn avatar May 27 '24 15:05 bltcn

Is it still possible to join the group?

yukiwayx avatar Jun 06 '24 07:06 yukiwayx

Is it still possible to join the group? @codemayq

Damonpkl avatar Jun 26 '24 03:06 Damonpkl