Infinite repetitions and invalid JSON - Outlines with MLX
Describe the issue as clearly as possible:
On certain prompts, the LLM can spiral into an infinite loop, producing the same item repeatedly until stopped by the max_tokens parameter.
In that case, the output fails with an invalid-JSON exception, and no result is returned.
Llama.cpp and MLX-LM both have parameters that penalize repetition and thus prevent this. While Outlines accepts additional parameters to pass through to Llama.cpp, it does not for MLX-LM, so the prompt fails.
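For context, the repetition penalty both backends expose is a multiplicative adjustment on the logits of recently generated tokens. A minimal pure-Python sketch of that logic (illustrative only; the function name and signature here are not the Outlines or MLX-LM API):

```python
def apply_repetition_penalty(logits, recent_token_ids, penalty=1.3):
    """Penalize recently generated tokens: divide their positive logits
    (or multiply their negative logits) by `penalty`, making immediate
    repeats less likely at the next sampling step."""
    out = list(logits)
    for tok in set(recent_token_ids):
        if out[tok] > 0:
            out[tok] /= penalty
        else:
            out[tok] *= penalty
    return out

# Example: tokens 0 and 1 were generated recently, token 2 was not.
logits = [2.0, -1.0, 0.5]
penalized = apply_repetition_penalty(logits, [0, 1], penalty=2.0)
# token 0: 2.0 -> 1.0, token 1: -1.0 -> -2.0, token 2 unchanged
```

Being able to forward a `penalty` value like this to the MLX-LM backend would likely be enough to break the loop shown below.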
Steps/code to reproduce the bug:
from outlines import models, generate, samplers
import json

RESULTS_JSON_SCHEMA = """{
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "object",
    "properties": {
        "results": {
            "type": "array",
            "items": {
                "type": "string"
            }
        }
    },
    "required": ["results"],
    "additionalProperties": false
}"""

model = models.mlxlm("mlx-community/Meta-Llama-3.1-8B-Instruct-4bit")
sampler = samplers.multinomial(top_p=0.1)
generator = generate.json(model, RESULTS_JSON_SCHEMA, sampler)
# long_42k_llm_prompt holds the ~42k-character prompt that triggers the loop
json_answer = generator(long_42k_llm_prompt, max_tokens=1000)
print(json.dumps(json_answer, indent=4))
Expected result:
A list without endless repetition at the end.
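As a post-hoc workaround sketch (my own illustration, not part of Outlines), trailing duplicates could be trimmed from the "results" array once the generation has been salvaged, though this does not fix the underlying invalid-JSON failure:

```python
def trim_trailing_repeats(items):
    """Remove consecutive duplicate items from the end of a list in place."""
    while len(items) >= 2 and items[-1] == items[-2]:
        items.pop()
    return items

trim_trailing_repeats(
    ["A", "B", "Methodist Hospital", "Methodist Hospital", "Methodist Hospital"]
)
# -> ["A", "B", "Methodist Hospital"]
```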
Instead, when running MLX-LM directly, we get an infinite loop, stopped only by max_tokens:
python -m mlx_lm.generate --model mlx-community/Meta-Llama-3.1-8B-Instruct-4bit --prompt "$(< ~/Downloads/long_42k_llm_prompt.md)" --max-tokens 5000
...
687. **Methodist Hospital**
688. **Methodist Hospital**
689. **Methodist Hospital**
690. **Methodist Hospital**
691. **Methodist Hospital**
692. **Methodist Hospital**
693. **Methodist Hospital**
694. **Methodist Hospital**
695. **Methodist Hospital**
696. **Methodist Hospital**
697. **Methodist Hospital**
==========
Prompt: 11380 tokens, 432.382 tokens-per-sec
Generation: 5000 tokens, 26.872 tokens-per-sec
Peak memory: 6.891 GB
Error message:
No response
Outlines/Python version information:
Version information
0.0.47.dev69+g72377db
Python 3.12.4
mlx==0.17.2
mlx-lm==0.18.1
Context for the issue:
No response