mistral.rs
Garbled output on very long prompts
Describe the bug
Models seem to produce garbled output on very long prompts.
If I use the following script:
import openai
from transformers import AutoTokenizer

if __name__ == "__main__":
    # MISTRAL is the base URL of the running mistralrs server (e.g. "http://<host>:<port>/v1")
    client = openai.Client(api_key="foobar", base_url=MISTRAL)
    tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

    with open("prompt.txt", "r", encoding="UTF-8") as f:
        content = f.read()
    print(len(tok.encode(content)))

    response = client.chat.completions.create(
        model="llama3",
        messages=[
            {
                "role": "user",
                "content": content,
            }
        ],
        max_tokens=256,
        temperature=0.0,
    )
    print(response.choices[0])
to send a 7368-token prompt to a mistralrs server, I receive the following output:
Choice(finish_reason='length', index=0, logprobs=None, message=ChatCompletionMessage(content='!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!', role='assistant', function_call=None, tool_calls=None))
Meaning that the server just filled the entire completion (up to max_tokens) with "!".
If I send the same prompt to an ollama server, I get the following result:
Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='One mile is approximately equal to 1.6093 kilometers.', role='assistant', function_call=None, tool_calls=None))
which is the correct answer for the given prompt.
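As an additional data point, a small sketch along the following lines (reusing the client and tok objects from the script above; the prefix token counts are arbitrary) could resend progressively longer prefixes of the prompt to the mistralrs server to narrow down roughly where the output starts degrading into "!" filler:

    # Resend truncated prefixes of the prompt to see roughly at which
    # length the completion switches from sensible text to "!" filler.
    for n_tokens in (2048, 4096, 6144, 7368):
        ids = tok.encode(content)[:n_tokens]
        prefix = tok.decode(ids, skip_special_tokens=True)
        resp = client.chat.completions.create(
            model="llama3",
            messages=[{"role": "user", "content": prefix}],
            max_tokens=32,
            temperature=0.0,
        )
        print(n_tokens, repr(resp.choices[0].message.content[:80]))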
The prompt I used: prompt.txt
The server parameters:
--isq Q4K plain -m meta-llama/Meta-Llama-3-8B-Instruct -a llama
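For reference, these flags correspond to a launch command of roughly the following form (the mistralrs-server binary name is an assumption based on a standard cargo build; host/port options are omitted):

    mistralrs-server --isq Q4K plain -m meta-llama/Meta-Llama-3-8B-Instruct -a llama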
Latest commit: Release 0.1.9