
Instruct model's performance becomes poor when switching to a different format

Open for-just-we opened this issue 1 year ago • 6 comments

I use the 7B-Instruct and 13B-Instruct models for a program analysis task. When I use the original-format 7b-Instruct and 13b-Instruct models, following example_chat_completion.py, everything works fine.

But when I use the Hugging Face version of the model with the generate API, I find the model's responses are much worse than with the original-format one. The same thing happens when deploying online with a tool like lmdeploy.

It seems only the locally run, original-format model gives me a satisfactory answer. How can I fix this?
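
For reference, here is a minimal sketch of the Hugging Face generate-API path I mean (the model id and prompt below are placeholders, not my exact setup):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder id for the converted Instruct checkpoint.
model_id = "codellama/CodeLlama-7b-Instruct-hf"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Placeholder prompt; the real one is the analysis prompt shown further down.
prompt = "[INST] your analysis prompt here [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)

# Strip the prompt tokens and print only the newly generated text.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))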

for-just-we avatar Oct 19 '23 13:10 for-just-we

In the Hugging Face generation pipeline, are you using the Instruct prompt format?

<s>[INST] user message 1 [/INST] response 1 </s><s>[INST] user message 2 [/INST] response 2 </s>

In example_chat_completion.py this formatting is handled internally, while with Hugging Face we have to do it manually.
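
For illustration, here is a minimal sketch of building that prompt string by hand from a list of chat turns (the helper name format_llama2_chat is just an example, not something from this repo):

def format_llama2_chat(turns):
    """turns: list of (user_message, assistant_response) pairs; the response
    of the final turn may be None when we want the model to produce it."""
    prompt = ""
    for user_msg, response in turns:
        prompt += f"<s>[INST] {user_msg} [/INST]"
        if response is not None:
            prompt += f" {response} </s>"
    return prompt

# Single-turn prompt awaiting the model's answer:
print(format_llama2_chat([("user message 1", None)]))
# -> <s>[INST] user message 1 [/INST]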

humza-sami avatar Nov 30 '23 20:11 humza-sami

In the Hugging Face generation pipeline, are you using the Instruct prompt format?

<s>[INST] user message 1 [/INST] response 1 </s><s>[INST] user message 2 [/INST] response 2 </s>

In example_chat_completion.py this formatting is handled internally, while with Hugging Face we have to do it manually.

Yes, I actually also tried different prompting methods. For example, I tried following example-text-completion.py; while the response may not be as good as following chat_completion, the answer is fairly reliable. But with the Hugging Face model, the answer is always worse.

for-just-we avatar Dec 11 '23 03:12 for-just-we

Can you share your code for inference?

humza-sami avatar Dec 11 '23 06:12 humza-sami

Can you share your code for inference?

The prompt is like:

Pay special attention to semantic similarity between the xx,xx,xx 

...

...

Your analysis should determine if there is a substantial possibility of the indirect call effectively invoking the function. 
Provide your answer with only 'yes' (for likely), or 'no' (for unlikely).

The inference code is the same as example_chat_completion.py for the llama format. For the Hugging Face format, I transform the prompt into a string and use the same code as in the Hugging Face example.
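
As a sanity check, here is a sketch of building that string with the tokenizer's chat template instead of by hand (assuming the HF Instruct checkpoint ships a Llama-2-style chat_template):

from transformers import AutoTokenizer

model_id = "codellama/CodeLlama-7b-Instruct-hf"  # placeholder for the checkpoint I use
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [
    {"role": "user", "content": "Pay special attention to semantic similarity between the xx,xx,xx ...\n"
                                "Provide your answer with only 'yes' (for likely), or 'no' (for unlikely)."},
]

# Should yield the same "<s>[INST] ... [/INST]" string that example_chat_completion.py
# builds internally, so both code paths see an identical prompt.
prompt = tokenizer.apply_chat_template(messages, tokenize=False)
print(prompt)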

for-just-we avatar Dec 23 '23 07:12 for-just-we

@for-just-we can you post some code for inference with a relevant example prompt for both this repo and the HF model so we can see where things might go wrong?

jgehring avatar Dec 23 '23 10:12 jgehring

@for-just-we It would be helpful if you posted your inference code here. In any case, could you try this code snippet and check whether it produces better results?

from transformers import AutoTokenizer
import transformers
import torch

# Hugging Face model id, e.g. "codellama/CodeLlama-7b-Instruct-hf"
model = "YOUR MODEL NAME"

tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,  # half precision to fit in GPU memory
    device_map="auto",          # spread the model across available devices
)

# CodeLlama-Instruct chat format: the user turn goes between [INST] and [/INST].
prompt = """<s>[INST] Pay special attention to semantic similarity between the xx,xx,xx\n\nYour analysis should determine if there is a substantial possibility of the indirect call effectively invoking the function. 
Provide your answer with only 'yes' (for likely), or 'no' (for unlikely). [/INST]"""

sequences = pipeline(
    prompt,
    do_sample=True,
    top_k=10,
    temperature=0.1,  # low temperature: near-deterministic yes/no answer
    top_p=0.95,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
    max_length=200,   # total length (prompt + generation) in tokens
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")

humza-sami avatar Dec 24 '23 08:12 humza-sami