
Garbled output on very long prompts

Open · LLukas22 opened this issue 9 months ago · 4 comments

Describe the bug
Models seem to produce garbled output on very long prompts.

If I use the following script:

import openai
from transformers import AutoTokenizer

MISTRAL = "http://localhost:1234/v1"  # base URL of the local mistral.rs server (adjust host/port to your setup)

if __name__ == "__main__":
    client = openai.Client(api_key="foobar", base_url=MISTRAL)
    tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
    with open("prompt.txt", "r", encoding="UTF-8") as f:
        content = f.read()

    # Print the prompt length in tokens
    print(len(tok.encode(content)))

    response = client.chat.completions.create(
        model="llama3",
        messages=[
            {
                "role": "user",
                "content": content,
            }
        ],
        max_tokens=256,
        temperature=0.0,
    )
    print(response.choices[0])

to send a 7368-token prompt to a mistral.rs server, I receive the following output:

Choice(finish_reason='length', index=0, logprobs=None, message=ChatCompletionMessage(content='!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!', role='assistant', function_call=None, tool_calls=None))

Meaning that the server just filled the whole completion, up to max_tokens, with !.

If I send the same prompt to an Ollama server, I get the following result:

Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='One mile is approximately equal to 1.6093 kilometers.', role='assistant', function_call=None, tool_calls=None))

Which is the correct answer for the given prompt.
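
In case it helps with narrowing this down, here is a rough sketch that checks at roughly which prompt length the output degenerates. It reuses the client and tokenizer from the script above; the MISTRAL URL and the truncation lengths are just placeholders I picked, not anything specific to mistral.rs:

import openai
from transformers import AutoTokenizer

MISTRAL = "http://localhost:1234/v1"  # same mistral.rs base URL as above (adjust to your setup)

if __name__ == "__main__":
    client = openai.Client(api_key="foobar", base_url=MISTRAL)
    tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
    with open("prompt.txt", "r", encoding="UTF-8") as f:
        content = f.read()

    ids = tok.encode(content)
    # Try progressively longer prefixes of the prompt. The answer itself will change
    # (the tail of the prompt gets cut off); the only point is to see at which length
    # the completion stops being coherent text and turns into runs of "!".
    for n in (1024, 2048, 4096, 6144, len(ids)):
        prefix = tok.decode(ids[:n], skip_special_tokens=True)
        response = client.chat.completions.create(
            model="llama3",
            messages=[{"role": "user", "content": prefix}],
            max_tokens=64,
            temperature=0.0,
        )
        print(n, repr(response.choices[0].message.content[:80]))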

The prompt I used: prompt.txt

The server parameters: --isq Q4K plain -m meta-llama/Meta-Llama-3-8B-Instruct -a llama
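
For completeness, the full launch looked roughly like this (the mistralrs-server binary name and the default --port 1234 are just how I run it locally):

./mistralrs-server --port 1234 --isq Q4K plain -m meta-llama/Meta-Llama-3-8B-Instruct -a llama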

Latest commit: Release 0.1.9

LLukas22 · May 21 '24 09:05