
Code Completion and Code Infilling - Llama 7B Inference

akaneshiro7 opened this issue 1 year ago • 7 comments

Hello,

I am trying to run inference for Code Llama using the HuggingFace Transformers/Accelerate setup for Llama 2 7B. I am able to run inference with the examples/inference.py script. However, when I run the example command for code completion and code infilling, I get the following error:

TypeError: llama_forward() got an unexpected keyword argument 'padding_mask'

Warning: The BetterTransformer implementation does not support padding during training, as the fused kernels do not support attention masks. Beware that passing padded batched data during training may result in unexpected outputs. Please refer to https://huggingface.co/docs/optimum/bettertransformer/overview for more details.

Command: python examples/code_llama/code_infilling_example.py --model_name ../hf_transformers/7B/ --prompt_file examples/code_llama/code_infilling_prompt.txt --temperature 0.2 --top_p 0.9

Versions: torch==2.0.1 transformers==4.34.1 tokenizers==0.14.1 optimum==1.13.2
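
For reference, the plain-Transformers infilling path I am ultimately trying to exercise (per the Code Llama documentation) is roughly the sketch below; the hub checkpoint name stands in for my local ../hf_transformers/7B/ path:

# Minimal Code Llama infilling sketch, following the Transformers Code Llama docs.
# "codellama/CodeLlama-7b-hf" is a stand-in for a local converted checkpoint.
from transformers import CodeLlamaTokenizer, LlamaForCausalLM

tokenizer = CodeLlamaTokenizer.from_pretrained("codellama/CodeLlama-7b-hf")
model = LlamaForCausalLM.from_pretrained("codellama/CodeLlama-7b-hf")

# <FILL_ME> marks the span the model should fill in.
prompt = 'def remove_non_ascii(s: str) -> str:\n    """ <FILL_ME>\n    return result\n'
input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"]

generated = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens, then splice them into the prompt.
infill = tokenizer.batch_decode(generated[:, input_ids.shape[1]:], skip_special_tokens=True)[0]
print(prompt.replace("<FILL_ME>", infill))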

akaneshiro7 avatar Oct 19 '23 17:10 akaneshiro7

I ran into this issue too. It's supposed to be fixed in the latest update of Transformers: https://github.com/huggingface/optimum/issues/1446
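
Until the fix is in a release, installing transformers from source should pick it up:

pip install git+https://github.com/huggingface/transformers.git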

jxmorris12 avatar Oct 21 '23 22:10 jxmorris12

Thanks @jxmorris12. @akaneshiro7, it seems the fix in optimum should help. Please let us know if that works for you.

HamidShojanazeri avatar Oct 22 '23 19:10 HamidShojanazeri

Can confirm it works after installing from source.

eduardosanchezg avatar Oct 23 '23 17:10 eduardosanchezg

@HamidShojanazeri

Yes, installing from source seems to have fixed the issue with running code completion and code infilling with the HuggingFace Accelerate model. Am I able to run it with a PEFT fine-tuned model? When I try to run the same command but pass in the peft_model, I get the following error:

File "/home/ubuntu/anaconda3/envs/vllm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1685, in delattr super().delattr(name) AttributeError: _hf_hook

akaneshiro7 avatar Oct 23 '23 18:10 akaneshiro7

I am hitting the "AttributeError: _hf_hook" issue too.

huangyangyu avatar Oct 24 '23 06:10 huangyangyu

Quoting @huangyangyu: "I am hitting the 'AttributeError: _hf_hook' issue too."

It went away when I set use_fast_kernels=False.
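
If I understand the recipes code correctly, use_fast_kernels wraps the model with optimum's BetterTransformer (which is also where the padding warning above comes from), so disabling the flag skips roughly this step:

# What use_fast_kernels=True roughly corresponds to, via optimum's BetterTransformer API.
# Skipping this transform is what made the _hf_hook error go away for me.
from optimum.bettertransformer import BetterTransformer

model = BetterTransformer.transform(model)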

callumHub avatar Nov 29 '23 00:11 callumHub

I am hitting the "AttributeError: _hf_hook" issue too. Could you please fix this? We really want to speed up the inference stage (with a LoRA model). Thanks!

XvHaidong avatar Dec 08 '23 14:12 XvHaidong