
Please add Llama3 Prompt support thanks

Open psychopatz opened this issue 1 year ago • 5 comments

I want to use the newest Llama 3 model for the RAG, but since the Llama 3 prompt format is different from Mistral and the other prompt styles, it doesn't stop producing output when using the local method. I'm aware that Ollama has it fixed, but it's slower than the local method for my liking.

I made a dirty-fix template in prompt_helper.py by following the Meta docs on prompting: https://llama.meta.com/docs/model-cards-and-prompt-formats/meta-llama-3/

# prompt_helper.py
class Llama3PromptStyle(AbstractPromptStyle):
    def _messages_to_prompt(self, messages: Sequence[ChatMessage]) -> str:
        prompt = "<|begin_of_text|>"
        for message in messages:
            role = message.role
            content = message.content or ""
            prompt += f"<|start_header_id|>{role.lower()}<|end_header_id|>"
            prompt += f"{content.strip()}<|eot_id|>"
        prompt += "<|end_of_text|>"
        return prompt

    def _completion_to_prompt(self, completion: str) -> str:
        return self._messages_to_prompt(
            [ChatMessage(content=completion, role=MessageRole.USER)]
        )

def get_prompt_style(
    prompt_style: Literal["default", "llama2", "llama3", "tag", "mistral", "chatml"] | None
) -> AbstractPromptStyle:
    """Get the prompt style to use from the given string.

    :param prompt_style: The prompt style to use.
    :return: The prompt style to use.
    """
    if prompt_style is None or prompt_style == "default":
        return DefaultPromptStyle()
    elif prompt_style == "llama2":
        return Llama2PromptStyle()
    elif prompt_style == "llama3":  # Here's the change; in theory it should trigger
        return Llama3PromptStyle()
    elif prompt_style == "tag":
        return TagPromptStyle()
    elif prompt_style == "mistral":
        return MistralPromptStyle()
    elif prompt_style == "chatml":
        return ChatMLPromptStyle()

    raise ValueError(f"Unknown prompt_style='{prompt_style}'")

Console Error

llamacpp.prompt_style Input should be 'default', 'llama2', 'tag', 'mistral' or 'chatml' [type=literal_error, input_value='llama3', input_type=str]

Regardless, I don't understand why it still produces an error. Sorry, I'm new to this stuff. I hope privategpt adopts a template system similar to other LLM wrappers for added flexibility.

Thanks in advance privategpt team

psychopatz avatar Apr 26 '24 12:04 psychopatz

@psychopatz : you might want to patch private_gpt/settings/settings.py:117 to add your prompt style "llama3". Then update the prompt_style section of your settings.yaml accordingly.
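For reference, a minimal sketch of what that patch could look like, assuming the llamacpp settings model in settings.py validates prompt_style with a pydantic Literal (the allowed values come from the validation error above; exact class and field names may differ between versions):

# private_gpt/settings/settings.py (sketch -- names assumed, not copied from the source)
from typing import Literal

from pydantic import BaseModel, Field


class LlamaCPPSettings(BaseModel):
    llm_hf_repo_id: str
    llm_hf_model_file: str
    # Add "llama3" to the allowed values so validation accepts it.
    prompt_style: Literal["default", "llama2", "llama3", "tag", "mistral", "chatml"] = Field(
        default="llama2",
        description="The prompt style to use for the chat engine.",
    )

With that in place, settings.yaml only needs prompt_style: "llama3" under the llamacpp section, as shown later in this thread.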

neofob avatar Apr 29 '24 15:04 neofob

@psychopatz : you might want to patch private_gpt/settings/settings.py:117 to add your prompt style "llama3". Then update the prompt_style section of your settings.yaml accordingly.

Thanks, I implemented the patch already. The problem with my slow ingestion was Ollama's big default embedding model and my slow laptop, lol, so I just use a smaller one. Thanks for the help regardless; I'll keep using Ollama for now.

psychopatz avatar May 06 '24 19:05 psychopatz

It would be great if someone could post exactly how the settings.yaml prompt_style section needs to look for llama3!

rkilchmn avatar May 17 '24 00:05 rkilchmn

It would be great if someone could post exactly how the settings.yaml prompt_style section needs to look for llama3!

How to set up PrivateGPT to use a Meta Llama 3 Instruct model?

Here's an example of prompt styles for instruction-tuned Large Language Models (LLMs) used for Question Answering (QA), following issue #1889; you may need to change the prompt style depending on the language and the LLM model.

Download a quantized Meta Llama 3 Instruct model file into the models folder.

You can download a Meta Llama 3 model from the Hugging Face 🤗 Hub, for example using the command below. This repository contains the Llama-3-Chinese-8B-Instruct GGUF model files:

$ huggingface-cli download hfl/llama-3-chinese-8b-instruct-gguf ggml-model-f16.gguf --local-dir . --local-dir-use-symlinks False

Edit the prompt style in the Python source files to generate the Llama 3 prompt format.

prompt_helper.py

class Llama3PromptStyle(AbstractPromptStyle):

    def _messages_to_prompt(self, messages: Sequence[ChatMessage]) -> str:
        prompt = "<|begin_of_text|>"
        for message in messages:
            role = message.role
            content = message.content or ""
            # Note: only user messages are wrapped here; a variant that also
            # handles system and assistant messages is shown further below.
            if role.lower() == "user":
                prompt += f"<|start_header_id|>{role.lower()}<|end_header_id|>"
                prompt += f"{content.strip()}<|eot_id|>"
        return prompt

    def _completion_to_prompt(self, completion: str) -> str:
        system_prompt_str = ""

        return (
            f"<|begin_of_text|> <|start_header_id|> {system_prompt_str.strip()} <|end_header_id|> "
            f"{completion.strip()} <|end_of_text|>"
        )

def get_prompt_style(
    prompt_style: Literal["default", "llama3"] | None
) -> AbstractPromptStyle:
    if prompt_style == "llama3":
        return Llama3PromptStyle()
    raise ValueError(f"Unknown prompt_style='{prompt_style}'")
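A quick way to check what this style produces (a sketch that assumes the class above plus the ChatMessage and MessageRole types from llama_index that PrivateGPT already uses; the import path may differ between llama_index versions):

from llama_index.core.llms import ChatMessage, MessageRole

style = get_prompt_style("llama3")
prompt = style._messages_to_prompt(
    [ChatMessage(content="What is in my documents?", role=MessageRole.USER)]
)
print(prompt)
# <|begin_of_text|><|start_header_id|>user<|end_header_id|>What is in my documents?<|eot_id|>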

Add a list of stop tokens so that generation stops at the Llama 3 end-of-turn markers.

llm_component.py

class LLMComponent:
        #...
        match settings.llm.mode:
            case "llamacpp":
                # ...
                prompt_style = get_prompt_style(settings.llamacpp.prompt_style)
                settings_kwargs = {
                    "n_gpu_layers": 2,
                    "n_threads": 2,
                    "n_ctx": 4096,
                    "n_batch": 480,
                    "stop": ["<|eot_id|>", "<|end_of_text|>", "<|end_header_id|>"],
                }
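To confirm that these stop tokens actually terminate generation, a small standalone llama-cpp-python check can help (a sketch outside PrivateGPT; the model path is a placeholder for the GGUF file downloaded above):

from llama_cpp import Llama

# Placeholder path for the quantized GGUF model downloaded earlier.
llm = Llama(model_path="models/ggml-model-f16.gguf", n_ctx=4096)

out = llm(
    "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
    "Hello!<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
    max_tokens=128,
    stop=["<|eot_id|>", "<|end_of_text|>"],  # same markers as the list above
)
print(out["choices"][0]["text"])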

You can read the Meta documentation about the Llama 3 Instruct prompt configuration (linked at the top of this issue).

The QA completion requires that the messages are formatted into the prompt layout the model expects.

Notes

A prompt can contain a system message, or multiple user and assistant messages, but it always ends with the last user message followed by the assistant header.

The <|begin_of_text|> token is equivalent to the BOS token.

The <|end_of_text|> token is equivalent to the EOS token.

Llama 3 expects the assistant header at the end of the prompt:

<|start_header_id|>assistant<|end_header_id|>

The message ends with the assistant header, to prompt the model to start generation.
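Following these notes, a hedged sketch of a _messages_to_prompt that matches Meta's layout (every role wrapped in headers, a blank line after each header, and the assistant header appended at the end so the model starts generating) could look like this; it is a variation on the templates in this thread, not the official PrivateGPT implementation:

    def _messages_to_prompt(self, messages: Sequence[ChatMessage]) -> str:
        prompt = "<|begin_of_text|>"
        for message in messages:
            role = message.role
            content = message.content or ""
            # Each turn: header, blank line, content, end-of-turn marker.
            prompt += f"<|start_header_id|>{role.lower()}<|end_header_id|>\n\n"
            prompt += f"{content.strip()}<|eot_id|>"
        # Finish with the assistant header so the model knows it should answer next.
        prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
        return prompt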

You can configure the llms-llama-cpp option in the PrivateGPT settings files.

settings-local.yaml

llm:
  mode: llamacpp
  max_new_tokens: 512
  context_window: 3900
  tokenizer: hfl/llama-3-chinese-8b-instruct
  
llamacpp:
  prompt_style: "llama3"
  llm_hf_repo_id: hfl/llama-3-chinese-8b-instruct-gguf
  llm_hf_model_file: ggml-model-f16.gguf

embedding:
  mode: huggingface

huggingface:
  embedding_hf_model_name: BAAI/bge-small-zh-v1.5

Create a virtual environment with a specific Python interpreter.

$ virtualenv venv --python=python3.11

Use Poetry to run the setup script, which downloads the tokenizer files and the embedding model to the models folder.

$ poetry run python scripts/setup

Install the required dependencies with the poetry command to run PrivateGPT.

$ poetry install --extras "ui llms-llama-cpp llms-ollama embeddings-ollama embeddings-huggingface vector-stores-qdrant"

Install llama-cpp-python, the Python bindings for llama.cpp (here built with cuBLAS GPU support):

$ CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 poetry run pip install --force-reinstall --no-cache-dir "llama-cpp-python"

Type the following command to run PrivateGPT locally and ask questions about your documents.

$ PGPT_PROFILES=local make run

Alternatively, you may reuse the llama2 prompt style and replace its tokens with the Meta Llama 3 tokens, for example by editing the prompt style in the prompt_helper.py file.

class Llama3PromptStyle(AbstractPromptStyle):
    def _messages_to_prompt(self, messages: Sequence[ChatMessage]) -> str:
        string_messages: list[str] = []
        if messages[0].role == MessageRole.SYSTEM:
            system_message_str = messages[0].content or ""
            messages = messages[1:]
        else:
            system_message_str = ""

        system_message_str = f"<|start_header_id|> {system_message_str.strip()} <|end_header_id|>"

        for i in range(0, len(messages), 2):
            user_message = messages[i]
            assert user_message.role == MessageRole.USER

            if i == 0:
                str_message = f"<|begin_of_text|> {system_message_str} "
            else:
                string_messages[-1] += "<|end_of_text|> "
                str_message = "<|begin_of_text|> "

            str_message += f"{user_message.content} <|eot_id|> "

            if len(messages) > (i + 1):
                assistant_message = messages[i + 1]
                assert assistant_message.role == MessageRole.ASSISTANT
                str_message += f" {assistant_message.content}"

            string_messages.append(str_message)

        return "".join(string_messages)

    def _completion_to_prompt(self, completion: str) -> str:
        system_prompt_str = ""

        return (
            f"<|begin_of_text|> <|start_header_id|> {system_prompt_str.strip()} <|end_header_id|> "
            f"{completion.strip()} <|eot_id|> "
        )
   
def get_prompt_style(
    prompt_style: Literal["default", "llama3"] | None
) -> AbstractPromptStyle:
    if prompt_style is None or prompt_style == "default":
        return DefaultPromptStyle()
    elif prompt_style == "llama3":
        return Llama3PromptStyle()
    raise ValueError(f"Unknown prompt_style='{prompt_style}'")
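As a sanity check for this variant (a sketch that assumes the definitions above and llama_index's ChatMessage and MessageRole types), you can print the prompt built from a short system plus user exchange and compare the token placement with the notes earlier in this comment:

from llama_index.core.llms import ChatMessage, MessageRole

style = get_prompt_style("llama3")
print(
    style._messages_to_prompt(
        [
            ChatMessage(content="You are a helpful assistant.", role=MessageRole.SYSTEM),
            ChatMessage(content="Summarize my documents.", role=MessageRole.USER),
        ]
    )
)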

CaptainNeo2023 avatar May 20 '24 01:05 CaptainNeo2023

It may be necessary to change the tokenizer, as described in the issue "Proper tokenizer.model is absent".

AlexPerkin avatar Jun 13 '24 12:06 AlexPerkin