
Fix GEMMA_2, LLAMA_3, and PHI_3 MessagesFormatter

Open woheller69 opened this issue 1 year ago • 10 comments

Gemma needs another \n in the prompt template to avoid slow processing of follow up prompts, see #54
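
For illustration, roughly the kind of change I mean (a sketch only, using the standard Gemma turn tokens; the exact PromptMarkers entries in the repo may differ, and the system/tool roles are omitted here):

gemma_2_prompt_markers = {
    # sketch only: trailing \n after <end_of_turn> so the stored history matches
    # the text the model actually saw and produced
    Roles.user: PromptMarkers("""<start_of_turn>user\n""", """<end_of_turn>\n"""),
    Roles.assistant: PromptMarkers("""<start_of_turn>model\n""", """<end_of_turn>\n"""),
}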

[Screenshot from 2024-07-07 19-16-28]

As can be seen in the attached screenshot, prompt processing is much faster.
Before: the new user prompt plus the model's previous answer are processed (42 tokens).
After: only the new user prompt is processed (18 tokens).

The longer the model's previous answer, the more relevant...

Probably there are similar issues in all other prompt templates.

woheller69 avatar Jul 07 '24 17:07 woheller69

Great patch. I would check all the templates to see if any of them are missing a newline or have a typo.

Thank you

pabl-o-ce avatar Jul 07 '24 21:07 pabl-o-ce

It seems some newlines are still missing in there. Do you want to add the Llama 3 fix in this PR? The correct version would look like this:

llama_3_prompt_markers = {
    Roles.system: PromptMarkers("""<|start_header_id|>system<|end_header_id|>\n\n""", """<|eot_id|>"""),
    Roles.user: PromptMarkers("""<|start_header_id|>user<|end_header_id|>\n\n""", """<|eot_id|>"""),
    Roles.assistant: PromptMarkers("""<|start_header_id|>assistant<|end_header_id|>\n\n""", """<|eot_id|>"""),
    Roles.tool: PromptMarkers("""<|start_header_id|>function_calling_results<|end_header_id|>\n""", """<|eot_id|>"""),
}

I don't know if the tool role needs another \n as well; I'm not sure.

~~On Gemma I'm not sure a newline is required, I have to test it~~

And on Phi you are correct :)

pabl-o-ce avatar Jul 07 '24 21:07 pabl-o-ce

I saw the issue you filed for Gemma, thanks.

If you want to add the missing Llama 3 fix as well, that would be awesome.

Thank you for the contribution

pabl-o-ce avatar Jul 07 '24 22:07 pabl-o-ce

I don't use tools, so I don't know if another \n is missing there. But Llama3 is also fixed in this PR.

But I think a better long-term fix in general would be my proposal in #54, i.e. to keep the leading and trailing newlines (and spaces?) produced by the model and store them in the history. I don't know the code (and I am a Python noob), but it feels like they are being stripped.

woheller69 avatar Jul 08 '24 04:07 woheller69

> It seems some newlines are still missing in there. Do you want to add the Llama 3 fix in this PR? The correct version would look like this:
>
> llama_3_prompt_markers = {
>     Roles.system: PromptMarkers("""<|start_header_id|>system<|end_header_id|>\n\n""", """<|eot_id|>"""),
>     Roles.user: PromptMarkers("""<|start_header_id|>user<|end_header_id|>\n\n""", """<|eot_id|>"""),
>     Roles.assistant: PromptMarkers("""<|start_header_id|>assistant<|end_header_id|>\n\n""", """<|eot_id|>"""),
>     Roles.tool: PromptMarkers("""<|start_header_id|>function_calling_results<|end_header_id|>\n""", """<|eot_id|>"""),
> }

We only need one more \n in Roles.assistant, not in Roles.user (there it should not matter). We just need to make sure that we append the latest model answer exactly as the model produced it, so that its tokens are found already processed in memory.
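
A quick toy illustration (made-up strings, not code from the repo): llama.cpp only skips re-processing for the part of the new prompt that is identical to what it already evaluated.

processed_turn = "<|start_header_id|>assistant<|end_header_id|>\n\nHello! How can I help?<|eot_id|>"

# If the whitespace in the rebuilt history differs from what the model actually
# saw and produced (e.g. a stripped newline), the prefix no longer matches and
# the whole previous answer has to be processed again.
rebuilt_turn = "<|start_header_id|>assistant<|end_header_id|>\nHello! How can I help?<|eot_id|>"

assert processed_turn != rebuilt_turn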

woheller69 avatar Jul 08 '24 04:07 woheller69

Is there any reason not to merge this?

woheller69 avatar Jul 25 '24 14:07 woheller69

@woheller69 Sorry, was busy with other stuff, will merge this after checking it in the next days.

Maximilian-Winter avatar Aug 01 '24 00:08 Maximilian-Winter

Mistral Small 22B needs an additional blank in the assistant marker for faster multi-turn conversation:

Roles.assistant: PromptMarkers(""" """, """""")

Unfortunately, other Mistral-based models like Mistral-Nemo do not need it...

woheller69 avatar Sep 18 '24 17:09 woheller69

What is the reason for stripping the prompt? For Mistral Small it can also be fixed like this, if the assistant reply is not stripped. I think stripping would otherwise alter the history the LLM knows:

def _format_message_content(self, content: str, role: Roles) -> str:
    if self.strip_prompt and role != Roles.assistant:
        return content.strip()
    else:
        return content
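
And as a standalone toy version of that rule, just to show the intended effect (not the actual class; the role is given as a plain string here):

def format_message_content(content: str, role: str, strip_prompt: bool = True) -> str:
    # strip user/system/tool content as before, but keep the assistant reply verbatim
    if strip_prompt and role != "assistant":
        return content.strip()
    return content

print(repr(format_message_content("  What is 2+2?  ", "user")))    # 'What is 2+2?'
print(repr(format_message_content("\n2 + 2 = 4\n", "assistant")))  # newlines preserved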

Can that be the root cause?

woheller69 avatar Oct 03 '24 16:10 woheller69

def _format_response(
        self,
        formatted_messages: str,
        last_role: Roles,
        response_role: Literal[Roles.user, Roles.assistant] | None = None,
) -> Tuple[str, Roles]:
    if response_role is None:
        response_role = Roles.assistant if last_role != Roles.assistant else Roles.user

    prompt_start = self.prompt_markers[response_role].start.strip() if self.strip_prompt else self.prompt_markers[
        response_role].start
    return formatted_messages + prompt_start, response_role

Also _format_response contains strip() and seems to be part of the problem.
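
For example, with the Llama 3 assistant marker quoted above, that strip() removes exactly the newlines the chat template expects (standalone snippet, not repo code):

assistant_start = "<|start_header_id|>assistant<|end_header_id|>\n\n"

print(repr(assistant_start.strip()))  # '<|start_header_id|>assistant<|end_header_id|>' -- the \n\n is gone
print(repr(assistant_start))          # what the template, and the cached prefix, actually contain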

If I set strip_prompt = False, Llama 3, Mixtral, and Gemma work fine without the modifications I proposed above. So essentially: what is the advantage of strip_prompt = True, which is set by default?

woheller69 avatar Oct 03 '24 17:10 woheller69

Is this project still maintained?

woheller69 avatar Dec 01 '24 15:12 woheller69

Hello @woheller69, I'm here to help / maintain the repo.

pabl-o-ce avatar Dec 10 '24 17:12 pabl-o-ce

Do you know the answer to "what is the advantage of strip_prompt = True, which is set by default?" This seems to be the root cause.

woheller69 avatar Dec 10 '24 17:12 woheller69

No idea... but I would ask @Maximilian-Winter whenever he has time. Also @woheller69, if you have Discord we can talk there. I'm looking forward to having people help us in the repo.

pabl-o-ce avatar Dec 10 '24 19:12 pabl-o-ce