
run app.py error

Open XvHaidong opened this issue 2 years ago • 8 comments

Hello, when I run demo/app.py with the 7B model, I get the error `"addmm_impl_cpu_" not implemented for 'Half'`. Could you please tell me how to fix it?

```
This share link expires in 72 hours. For free permanent hosting and GPU upgrades (NEW!), check out Spaces: https://huggingface.co/spaces
Traceback (most recent call last):
  File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/gradio/routes.py", line 393, in run_predict
    output = await app.get_blocks().process_api(
  File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/gradio/blocks.py", line 1069, in process_api
    result = await self.call_function(
  File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/gradio/blocks.py", line 892, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/gradio/utils.py", line 549, in async_iteration
    return next(iterator)
  File "app.py", line 43, in predict
    for x in greedy_search(input_ids, model, tokenizer, stop_words=["[|Human|]", "[|AI|]"], max_length=max_length_tokens, temperature=temperature, top_p=top_p):
  File "/media/hlt/disk/chenyang_space/chenyang_space/xhd_space/baize-main/demo/app_modules/utils.py", line 253, in greedy_search
    outputs = model(input_ids)
  File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/peft/peft_model.py", line 575, in forward
    return self.base_model(
  File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 687, in forward
    outputs = self.model(
  File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 577, in forward
    layer_outputs = decoder_layer(
  File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 292, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 196, in forward
    query_states = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
  File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/chenyang/anaconda3/envs/xhd/lib/python3.8/site-packages/peft/tuners/lora.py", line 406, in forward
    result = F.linear(x, transpose(self.weight, self.fan_in_fan_out), bias=self.bias)
RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'
```

XvHaidong avatar Apr 05 '23 02:04 XvHaidong
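For context: this error means the model's float16 ("Half") weights are being executed on CPU, where PyTorch has no half-precision `addmm` (matrix-multiply) kernel, so the forward pass fails at the first linear layer. The usual workaround is to keep float16 only on an accelerator and fall back to float32 on CPU. A minimal sketch of that decision (`choose_dtype` is a hypothetical helper for illustration, not part of the Baize code):

```python
def choose_dtype(device: str) -> str:
    """Pick a tensor dtype the target device can actually execute.

    PyTorch's CPU backend has no float16 matmul kernel, which is what
    raises '"addmm_impl_cpu_" not implemented for Half'. Half precision
    is only safe on an accelerator (GPU; MPS support is an assumption
    that depends on the PyTorch version).
    """
    if device.startswith(("cuda", "mps")):
        return "float16"  # half precision works on accelerators
    return "float32"      # CPU needs full precision
```

In practice this maps to loading the model with `torch_dtype=torch.float32` (or calling `model.float()`) when no GPU is available, at the cost of roughly double the memory.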

Got this error too on a MacBook M1. Please help, thanks~

hecor avatar Apr 05 '23 03:04 hecor

Fix done, please check again.

guoday avatar Apr 05 '23 04:04 guoday

Great,thanks

hecor avatar Apr 07 '23 15:04 hecor

But generating a reply is very slow on my MacBook M1, nearly one word per minute. Are there any parameters that can change this?

hecor avatar Apr 07 '23 15:04 hecor

You need to use a GPU. It's very slow if you use the CPU.

guoday avatar Apr 10 '23 01:04 guoday
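A 7B model running in float32 on a laptop CPU really can be this slow, so the first thing to verify is which device the model actually landed on. A small sketch that prefers CUDA, then Apple's MPS backend on M1, then CPU (hypothetical helper; the MPS check assumes PyTorch ≥ 1.12 on Apple Silicon):

```python
def pick_device() -> str:
    """Return the best available torch device string.

    Falls back gracefully when torch is not installed. The MPS branch
    is an assumption: it requires a PyTorch build with Apple Silicon
    support (>= 1.12).
    """
    try:
        import torch
    except ImportError:
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"
```

If this returns `"cpu"`, one-token-per-minute generation for a 7B model is expected rather than a bug.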

got it, thanks~

hecor avatar Apr 13 '23 01:04 hecor

Hi, I run demo/app.py on a remote server with the 7B model, and the terminal shows: `Reloading javascript... Running on local URL: http://127.0.0.1:7860`

but the URL doesn't work in Chrome on my local machine.

zay95 avatar Apr 18 '23 02:04 zay95

Set share=True in app.py and use the public URL.

guoday avatar Apr 18 '23 07:04 guoday
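Two common ways to reach a Gradio app running on a remote server: `share=True` (Gradio tunnels a temporary public `*.gradio.live` URL) or binding the server to `0.0.0.0` and browsing to the server's IP directly. A small sketch of the launch arguments involved (`launch_kwargs` is a hypothetical helper; `share`, `server_name`, and `server_port` are real `demo.launch()` parameters, and 7860 is Gradio's default port):

```python
def launch_kwargs(public_link: bool = True) -> dict:
    """Keyword arguments for gradio's demo.launch() on a remote server.

    share=True asks Gradio to tunnel a temporary public URL;
    server_name="0.0.0.0" additionally exposes the app on the server's
    own IP (reachable as http://<server-ip>:7860 if the port is open).
    """
    kwargs = {"server_name": "0.0.0.0", "server_port": 7860}
    if public_link:
        kwargs["share"] = True
    return kwargs
```

Alternatively, SSH port forwarding (`ssh -L 7860:127.0.0.1:7860 user@server`) lets the local browser use http://127.0.0.1:7860 without exposing anything publicly.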