Reminder
- [X] I have read the README and searched the existing issues.
System Info
- llamafactory version: 0.8.3.dev0
- Platform: Linux-4.19.36-vhulk1907.1.0.h619.eulerosv2r8.aarch64-aarch64-with-glibc2.28
- Python version: 3.10.14
- PyTorch version: 2.3.1 (NPU)
- Transformers version: 4.41.2
- Datasets version: 2.20.0
- Accelerate version: 0.32.1
- PEFT version: 0.11.1
- TRL version: 0.9.6
- NPU type: Ascend910PremiumA
- CANN version: 8.0.RC2.alpha002
Reproduction
ASCEND_RT_VISIBLE_DEVICES=0,1,2,3 llamafactory-cli chat glm4_9b_chat.yaml
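The contents of glm4_9b_chat.yaml are not included in the report; below is a minimal sketch of what such a chat config typically looks like, following LLaMA-Factory's example inference configs (the model path and template value are assumptions, not taken from the report):

```yaml
model_name_or_path: THUDM/glm-4-9b-chat  # assumed; actual path not shown in the report
template: glm4                           # chat template for GLM-4 models
infer_backend: huggingface               # HF engine, consistent with the traceback below
```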
Expected behavior
The chat session should start and load the model normally. Instead, model loading fails on the NPU with the traceback below:
Traceback (most recent call last):
  File "/root/miniconda3/envs/llm/bin/llamafactory-cli", line 8, in <module>
    sys.exit(main())
  File "/root/project/LLaMA-Factory/src/llamafactory/cli.py", line 81, in main
    run_chat()
  File "/root/project/LLaMA-Factory/src/llamafactory/chat/chat_model.py", line 125, in run_chat
    chat_model = ChatModel()
  File "/root/project/LLaMA-Factory/src/llamafactory/chat/chat_model.py", line 44, in __init__
    self.engine: "BaseEngine" = HuggingfaceEngine(model_args, data_args, finetuning_args, generating_args)
  File "/root/project/LLaMA-Factory/src/llamafactory/chat/hf_engine.py", line 58, in __init__
    self.model = load_model(
  File "/root/project/LLaMA-Factory/src/llamafactory/model/loader.py", line 153, in load_model
    model = AutoModelForCausalLM.from_pretrained(**init_kwargs)
  File "/root/miniconda3/envs/llm/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 558, in from_pretrained
    return model_class.from_pretrained(
  File "/root/miniconda3/envs/llm/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3754, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/root/miniconda3/envs/llm/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4214, in _load_pretrained_model
    new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
  File "/root/miniconda3/envs/llm/lib/python3.10/site-packages/transformers/modeling_utils.py", line 887, in _load_state_dict_into_meta_model
    set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
  File "/root/miniconda3/envs/llm/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 460, in set_module_tensor_to_device
    clear_device_cache()
  File "/root/miniconda3/envs/llm/lib/python3.10/site-packages/accelerate/utils/memory.py", line 42, in clear_device_cache
    torch.npu.empty_cache()
  File "/root/miniconda3/envs/llm/lib/python3.10/site-packages/torch_npu/npu/memory.py", line 144, in empty_cache
    torch_npu._C._npu_emptyCache()
RuntimeError: unmapHandles:build/CMakeFiles/torch_npu.dir/compiler_depend.ts:400 NPU function error: aclrtSynchronizeStream(stream), error code is 107003
[ERROR] 2024-07-15-16:44:48 (PID:159190, Device:0, RankID:-1) ERR00100 PTA call acl api failed
[Error]: The stream is not in the current context.
Check whether the context where the stream is located is the same as the current context.
EE9999: Inner Error!
EE9999: 2024-07-15-16:44:48.807.354 Stream synchronize failed, stream is not in current ctx, stream_id=2.[FUNC:StreamSynchronize][FILE:api_impl.cc][LINE:1005]
TraceBack (most recent call last):
rtStreamSynchronize execute failed, reason=[stream not in current context][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
synchronize stream failed, runtime result = 107003[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
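For context, the traceback shows the failure originating in accelerate's clear_device_cache(), which calls torch.npu.empty_cache() while from_pretrained() is dispatching weights across the four visible NPUs. A minimal sketch that exercises the same call path (hypothetical, not verified to reproduce the error on this setup):

```python
import torch
import torch_npu  # Ascend adapter; registers the torch.npu.* namespace
from accelerate.utils.memory import clear_device_cache

# Hypothetical minimal repro: allocate on one NPU, then trigger the same
# cache-clearing path that set_module_tensor_to_device() hits during
# weight dispatch in from_pretrained().
torch.npu.set_device(0)
x = torch.randn(1024, 1024, device="npu:0")

# clear_device_cache() calls torch.npu.empty_cache() internally, which is
# where the report above fails with ACL error 107003
# ("stream is not in current context").
clear_device_cache()
```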
Others
No response