
NPU 910B running FastChat + Baichuan-13B: DefaultCPUAllocator: can't allocate memory

Open GuIIWen opened this issue 1 year ago • 0 comments

npu: 910B * 8
model: baichuan-13B
torch: 2.1.0
torch_npu: 2.1.0
fastchat: 0.2.36
transformers: 4.43.3

I use the command `python3 -m fastchat.serve.cli --model-path baichuan-13b/ --device npu` to run the FastChat server. It starts successfully, printing:

```
You are using an old version of the checkpointing format that is deprecated (We will also silently ignore gradient_checkpointing_kwargs in case you passed it). Please update to the new format on your modeling file. To use the new format, you need to completely remove the definition of the method _set_gradient_checkpointing in your model.
Loading checkpoint shards:   0%|                           | 0/3 [00:00<?, ?it/s]
/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/torch/_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
Loading checkpoint shards: 100%|███████████████████████████| 3/3 [00:06<00:00,  2.08s/it]
Human: hello
```

But when I enter a prompt, it raises `RuntimeError: [enforce fail at alloc_cpu.cpp:83] err == 0. DefaultCPUAllocator: can't allocate memory: you tried to allocate 4398046511104 bytes. Error code 12 (Cannot allocate memory)`.

Error log:

```
Human: hello
Assistant: Traceback (most recent call last):
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/fastchat/serve/cli.py", line 304, in <module>
    main(args)
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/fastchat/serve/cli.py", line 227, in main
    chat_loop(
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/fastchat/serve/inference.py", line 532, in chat_loop
    outputs = chatio.stream_output(output_stream)
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/fastchat/serve/cli.py", line 63, in stream_output
    for outputs in output_stream:
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 35, in generator_context
    response = gen.send(None)
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/fastchat/serve/inference.py", line 132, in generate_stream
    out = model(input_ids=start_ids, use_cache=True)
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/modeling_baichuan.py", line 449, in forward
    outputs = self.model(
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/modeling_baichuan.py", line 318, in forward
    alibi_mask = self.get_alibi_mask(inputs_embeds, seq_length_with_past)
  File "/root/.cache/huggingface/modules/transformers_modules/modeling_baichuan.py", line 274, in get_alibi_mask
    self.register_buffer("future_mask", _gen_alibi_mask(self.n_head, self.max_cache_pos).to(tensor), persistent=False)
  File "/root/.cache/huggingface/modules/transformers_modules/modeling_baichuan.py", line 46, in _gen_alibi_mask
    _fill_with_neg_inf(torch.zeros([max_pos, max_pos])), 1
RuntimeError: [enforce fail at alloc_cpu.cpp:83] err == 0. DefaultCPUAllocator: can't allocate memory: you tried to allocate 4398046511104 bytes. Error code 12 (Cannot allocate memory)
[ERROR] 2024-08-01-15:53:52 (PID:23205, Device:0, RankID:-1) ERR99999 UNKNOWN application exception
```
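For context, the traceback shows the alibi mask being built as a dense `torch.zeros([max_pos, max_pos])` tensor on the CPU, so memory grows quadratically with `max_cache_pos`. A minimal sketch of the arithmetic (the `max_pos` value of 1,048,576 is an assumption inferred by working backwards from the 4398046511104-byte request; it is not printed in the log):

```python
# Hypothetical reconstruction: why the dense alibi mask request is ~4 TiB.
# Assumes self.max_cache_pos ended up as 1_048_576 (2**20), inferred from the
# error message; torch.zeros defaults to float32, i.e. 4 bytes per element.
max_pos = 1_048_576
bytes_per_element = 4  # float32

bytes_needed = max_pos * max_pos * bytes_per_element
print(bytes_needed)                    # 4398046511104, matching the error
print(bytes_needed / (1024 ** 4))      # 4.0 TiB
```

This suggests the model config's maximum position value is being read as a far larger number than the machine's RAM can hold as a square mask, rather than the NPU itself running out of memory.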

GuIIWen · Aug 01 '24 08:08