
Llama 3 starts talking to itself

Arian-Akbari opened this issue 10 months ago · 4 comments

I'm running the model locally and trying to create a chatbot with it, but when I send the user message to the model, it starts talking to itself and won't stop. How can I fix this issue?

Arian-Akbari avatar Apr 19 '24 15:04 Arian-Akbari

Hi @Arian-Akbari! Probably related to the fact that llama3 uses two stop ids in conversational mode: https://github.com/meta-llama/llama3/blob/0cee08ec68f4cfc0c89fe4a9366d82679aaa2a66/llama/tokenizer.py#L91-L94

Your inference software should stop generation once any of them are encountered :)
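For example, when talking to a local Ollama server over its REST API, both stop sequences can be passed through the `stop` option. This is only a sketch of that idea; the endpoint and option names follow Ollama's documented API, but check them against your version:

```js
// Sketch: ask a local Ollama server to stop generation at either Llama 3 stop token.
async function chatOnce(userMessage) {
  const response = await fetch("http://localhost:11434/api/chat", {
    method: "POST",
    body: JSON.stringify({
      model: "llama3",
      messages: [{ role: "user", content: userMessage }],
      stream: false,
      options: {
        // Llama 3 uses two stop tokens in conversational mode.
        stop: ["<|eot_id|>", "<|end_of_text|>"],
      },
    }),
  });
  const data = await response.json();
  return data.message.content;
}
```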

pcuenca avatar Apr 19 '24 15:04 pcuenca

@Arian-Akbari Thanks for the note and for building with Llama 3 so fast!

Please double-check that you are accounting for the stop tokens mentioned by @pcuenca above. If you are not using these special tokens, the model may ramble. If you are still running into issues after testing this, please let us know and include some examples so we can reproduce them.

ejsd1989 avatar Apr 19 '24 16:04 ejsd1989

Wow, thanks for the help, guys. I'm using @modelfusion/vercel-ai and vercel.ai to build a chatbot with Llama 3. Here is my code:

import { ModelFusionTextStream, asChatMessages } from "@modelfusion/vercel-ai";
import { StreamingTextResponse } from "ai";
import { ollama, streamText } from "modelfusion";

export async function POST(request) {
  const { messages } = await request.json();
  console.log(messages);
  const textStream = await streamText({
    model: ollama.ChatTextGenerator({ model: "llama3" }).withChatPrompt(),
    prompt: {
      system:
        "You are an AI chat bot. " +
        "Follow the user's instructions carefully.",

      messages: asChatMessages(messages),
    },
  });
  return new StreamingTextResponse(
    ModelFusionTextStream(
      textStream,
      // optional callbacks:
      {
        onToken(token) {
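          // Note: returning early here only skips the console.log below;
          // it does not stop or filter the stream itself.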
          if (token === "<|eot_id|>" || token === "<|end_of_text|>") {
            return;
          }
          console.log(token);
        },
      }
    )
  );
}

Now I'm trying to somehow stop the streaming response when the token is one of the ones you mentioned, @pcuenca, but I can't find a way to do that. Can you guys help me with this?
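One rough way to do that is sketched below: wrap the delta stream before handing it to ModelFusionTextStream. This assumes streamText yields plain string deltas and that ModelFusionTextStream accepts any async iterable, which may not match the actual modelfusion APIs:

```js
// Sketch only: end the stream as soon as a Llama 3 stop token appears,
// instead of forwarding it to the client.
const STOP_TOKENS = ["<|eot_id|>", "<|end_of_text|>"];

async function* stopAtStopTokens(deltas) {
  for await (const delta of deltas) {
    // Assumes a stop token arrives within a single delta; a production
    // version would need to buffer across chunk boundaries.
    if (STOP_TOKENS.some((token) => delta.includes(token))) {
      return;
    }
    yield delta;
  }
}

// Then, inside the POST handler, instead of passing textStream directly:
return new StreamingTextResponse(
  ModelFusionTextStream(stopAtStopTokens(textStream))
);
```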

Arian-Akbari avatar Apr 20 '24 00:04 Arian-Akbari

@Arian-Akbari could you post some example output? I'm not too familiar with modelfusion, but I'm taking a look at the repo.

One thing that may also help is taking a look at our examples on llama-recipes.

In particular, this section may be useful:

A multi-turn conversation with Llama 3 follows this prompt template:

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{{ system_prompt }}<|eot_id|><|start_header_id|>user<|end_header_id|>

{{ user_message_1 }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

{{ model_answer_1 }}<|eot_id|><|start_header_id|>user<|end_header_id|>

{{ user_message_2 }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Each message gets trailed by an <|eot_id|> token before a new header is started, signaling a role change.
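To make the template concrete, here is a small sketch of how a message list could be flattened into that format by hand; the buildLlama3Prompt helper is hypothetical and not part of llama-recipes:

```js
// Sketch: flatten [{ role, content }, ...] into the Llama 3 chat template.
function buildLlama3Prompt(messages) {
  let prompt = "<|begin_of_text|>";
  for (const { role, content } of messages) {
    // role is "system", "user", or "assistant"
    prompt += `<|start_header_id|>${role}<|end_header_id|>\n\n${content}<|eot_id|>`;
  }
  // Leave an open assistant header so the model generates the next reply.
  prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n";
  return prompt;
}
```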

ejsd1989 avatar Apr 20 '24 07:04 ejsd1989

Closing for now based on the last reply. Please feel free to create a new issue if you're still running into problems!

ejsd1989 avatar Apr 24 '24 16:04 ejsd1989


In a multi-round chat setting like the one mentioned above, I was wondering how to do SFT. Suppose I have input_ids as the input and target_ids as the desired output. Do I need to mask all of the previous assistant answers in target_ids, or is it enough to mask only the last assistant's answer and keep all previous assistant answers unmasked?

disperaller avatar May 15 '24 11:05 disperaller