
Code Completion and Code Infilling - Llama 7B Inference

akaneshiro7 opened this issue 1 year ago • 7 comments

Hello,

I am trying to run inference for Code Llama using the HuggingFace Transformers/Accelerate setup for Llama 2 7B. I am able to run inference with the examples/inference.py script. However, when I run the example command for code completion and code infilling, I get the following error:

TypeError: llama_forward() got an unexpected keyword argument 'padding_mask'

Warning: The BetterTransformer implementation does not support padding during training, as the fused kernels do not support attention masks. Beware that passing padded batched data during training may result in unexpected outputs. Please refer to https://huggingface.co/docs/optimum/bettertransformer/overview for more details.

Command: python examples/code_llama/code_infilling_example.py --model_name ../hf_transformers/7B/ --prompt_file examples/code_llama/code_infilling_prompt.txt --temperature 0.2 --top_p 0.9

Versions: torch==2.0.1 transformers==4.34.1 tokenizers==0.14.1 optimum==1.13.2
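
For reference, the plain-Transformers infilling path I am ultimately trying to exercise (per the Code Llama documentation) is roughly the sketch below; the hub checkpoint name stands in for my local ../hf_transformers/7B/ path:

# Minimal Code Llama infilling sketch, following the Transformers Code Llama docs.
# "codellama/CodeLlama-7b-hf" is a stand-in for a local converted checkpoint.
from transformers import CodeLlamaTokenizer, LlamaForCausalLM

tokenizer = CodeLlamaTokenizer.from_pretrained("codellama/CodeLlama-7b-hf")
model = LlamaForCausalLM.from_pretrained("codellama/CodeLlama-7b-hf")

# <FILL_ME> marks the span the model should fill in.
prompt = 'def remove_non_ascii(s: str) -> str:\n    """ <FILL_ME>\n    return result\n'
input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"]

generated = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens, then splice them into the prompt.
infill = tokenizer.batch_decode(generated[:, input_ids.shape[1]:], skip_special_tokens=True)[0]
print(prompt.replace("<FILL_ME>", infill))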

akaneshiro7 avatar Oct 19 '23 17:10 akaneshiro7

I ran into this issue too. It's supposed to be fixed in the latest update of Transformers: https://github.com/huggingface/optimum/issues/1446
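
Until the fix is in a release, installing transformers from source should pick it up:

pip install git+https://github.com/huggingface/transformers.git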

jxmorris12 avatar Oct 21 '23 22:10 jxmorris12

Thanks @jxmorris12. @akaneshiro7, it seems the fix in optimum should help. Please let us know if that works for you.

HamidShojanazeri avatar Oct 22 '23 19:10 HamidShojanazeri

Can confirm it works after installing from source.

eduardosanchezg avatar Oct 23 '23 17:10 eduardosanchezg

@HamidShojanazeri

Yes, installing from source seems to have fixed the issue with running code completion and code infilling with the HuggingFace Accelerate model. Am I able to run it with a PEFT fine-tuned model? When I try to run the same command but pass in the peft_model, I get the following error:

File "/home/ubuntu/anaconda3/envs/vllm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1685, in delattr super().delattr(name) AttributeError: _hf_hook

akaneshiro7 avatar Oct 23 '23 18:10 akaneshiro7

I am hitting the "AttributeError: _hf_hook" issue too.

huangyangyu avatar Oct 24 '23 06:10 huangyangyu

Quoting @huangyangyu: "I am hitting the 'AttributeError: _hf_hook' issue too."

It went away when I set use_fast_kernels=False.
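
If I understand the recipes code correctly, use_fast_kernels wraps the model with optimum's BetterTransformer (which is also where the padding warning above comes from), so disabling the flag skips roughly this step:

# What use_fast_kernels=True roughly corresponds to, via optimum's BetterTransformer API.
# Skipping this transform is what made the _hf_hook error go away for me.
from optimum.bettertransformer import BetterTransformer

model = BetterTransformer.transform(model)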

callumHub avatar Nov 29 '23 00:11 callumHub

I am hitting the "AttributeError: _hf_hook" issue too. Could you please fix this? We really want to speed up the inference stage (with a LoRA model). Thanks!

XvHaidong avatar Dec 08 '23 14:12 XvHaidong