codellama
Instruct model's performance becomes poor when switching to a different format
I use the 7B- and 13B-Instruct models for a program analysis task. When I use the original-format 7B-Instruct and 13B-Instruct models, following example_chat_completion.py, everything is fine.
But when I use the Hugging Face version of the model with the generate API, the model's responses are much worse than with the original format. The same thing happens when I deploy it online with a tool like lmdeploy.
It seems only the locally run original-format model gives me a satisfactory answer. How can I fix this?
In the Hugging Face generation pipeline, are you using the Instruct prompt format?
<s>[INST] user message 1 [/INST] response 1 </s><s>[INST] user message 2 [/INST] response 2 </s>
In example_chat_completion.py this is handled internally, while with Hugging Face we have to do it manually.
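For reference, here is a minimal sketch of building that format yourself for a single user turn. The model id is only an example, and the apply_chat_template call assumes a transformers version recent enough to ship chat templates; the manual string is the fallback.

from transformers import AutoTokenizer

# Example model id; substitute whichever Instruct checkpoint you are using.
model_id = "codellama/CodeLlama-7b-Instruct-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)

user_message = "your program analysis prompt here"

# Manual construction of the template above for a single turn.
# Note: if the tokenizer adds BOS itself during encoding, drop the
# leading <s> to avoid a duplicated BOS token.
prompt = f"<s>[INST] {user_message.strip()} [/INST]"

# Alternatively, recent transformers versions can render the same template
# from a role/content dialog:
messages = [{"role": "user", "content": user_message}]
templated = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
print(templated)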
Yes, and I have also tried different prompting methods. For example, I tried following example_text_completion.py; while the responses are not as good as with chat completion, the answers are still somewhat reliable. But for the Hugging Face model, the answers are always worse.
Can you share code for inference?
The prompt is like:
Pay special attention to semantic similarity between the xx,xx,xx
...
...
Your analysis should determine if there is a substantial possibility of the indirect call effectively invoking the function.
Provide your answer with only 'yes' (for likely), or 'no' (for unlikely).
The code for inference is the same as example_chat_completion.py for the Llama format. For the Hugging Face format, I transform the prompt into a string and use the same code as in the Hugging Face example.
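Roughly, the original-format side is just the stock example from this repo; the sketch below is approximately what I run, with placeholder paths and sampling settings rather than my exact values:

from llama import Llama

generator = Llama.build(
    ckpt_dir="CodeLlama-7b-Instruct/",  # placeholder path
    tokenizer_path="CodeLlama-7b-Instruct/tokenizer.model",
    max_seq_len=1024,
    max_batch_size=1,
)

dialogs = [
    [{"role": "user", "content": "Pay special attention to semantic similarity between the xx,xx,xx ... Provide your answer with only 'yes' (for likely), or 'no' (for unlikely)."}],
]

results = generator.chat_completion(
    dialogs,
    max_gen_len=None,
    temperature=0.2,  # placeholder sampling settings
    top_p=0.95,
)
for result in results:
    print(result["generation"]["content"])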
@for-just-we can you post some code for inference with a relevant example prompt for both this repo and the HF model so we can see where things might go wrong?
@for-just-we It would be helpful if you posted your inference code here. In the meantime, could you try this code snippet and check whether it produces better results?
from transformers import AutoTokenizer
import transformers
import torch

model = "YOUR MODEL NAME"
tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

prompt = """<s>[INST] Pay special attention to semantic similarity between the xx,xx,xx\n\nYour analysis should determine if there is a substantial possibility of the indirect call effectively invoking the function.
Provide your answer with only 'yes' (for likely), or 'no' (for unlikely). [/INST]"""

sequences = pipeline(
    prompt,
    do_sample=True,
    top_k=10,
    temperature=0.1,
    top_p=0.95,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
    max_length=200,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")
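If you want to compare against the generate API directly (which you mentioned you were using), a rough equivalent of the pipeline call above is sketched below; the model name and sampling settings are the same placeholders, and max_new_tokens replaces max_length so the prompt length does not eat into the generation budget.

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "YOUR MODEL NAME"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)

# The tokenizer adds <s> itself, so the literal BOS is omitted here.
prompt = "[INST] your instruction here [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        do_sample=True,
        top_k=10,
        temperature=0.1,
        top_p=0.95,
        max_new_tokens=200,
        eos_token_id=tokenizer.eos_token_id,
    )

# Decode only the newly generated tokens, not the echoed prompt.
new_tokens = output_ids[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))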