
Infinite repetitions and invalid JSON - Outlines with MLX

Open ea167 opened this issue 5 months ago • 1 comment

Describe the issue as clearly as possible:

On certain prompts, the LLM can spiral into an infinite loop, emitting the same item over and over until it is stopped by the max_tokens parameter.

In that case, JSON parsing fails with an exception because the truncated output is invalid, and no result is returned.

Llama.cpp and MLX-LM both have parameters to penalize repetition and thus prevent it. While Outlines accepts additional keyword arguments and passes them through to Llama.cpp, it does not do so for MLX-LM, so the prompt fails.
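For comparison, this is how a repetition penalty can already reach the llama.cpp backend: as noted above, Outlines forwards extra keyword arguments to Llama.cpp. A minimal sketch, assuming the llama-cpp-python backend is installed; the phi-2 GGUF model is only a small stand-in and repeat_penalty=1.3 is an illustrative value, not taken from this report:

from outlines import models, generate

schema = '{"type": "object", "properties": {"results": {"type": "array", "items": {"type": "string"}}}, "required": ["results"]}'

# Any GGUF model works here; phi-2 is just a small example.
llama_model = models.llamacpp("TheBloke/phi-2-GGUF", "phi-2.Q4_K_M.gguf")
llama_generator = generate.json(llama_model, schema)

# Extra kwargs are passed through to llama-cpp-python's sampling settings,
# so repetition can be penalized during constrained generation.
answer = llama_generator("Some long prompt...", max_tokens=1000, repeat_penalty=1.3)

No equivalent keyword reaches the MLX-LM code path today, which is the gap this issue describes.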

Attached prompt: long_42k_llm_prompt.md

Steps/code to reproduce the bug:

RESULTS_JSON_SCHEMA = """{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"type": "object",
"properties": {
 "results": {
  "type": "array",
  "items": {
   "type": "string"
  }
 }
},
"required": ["results"],
"additionalProperties": false
}"""
 
 
from outlines import models, generate, samplers
import json

# The prompt text is in the attached long_42k_llm_prompt.md (~42k characters)
with open("long_42k_llm_prompt.md") as f:
    long_42k_llm_prompt = f.read()

model = models.mlxlm("mlx-community/Meta-Llama-3.1-8B-Instruct-4bit")
sampler = samplers.multinomial(top_p=0.1)
generator = generate.json(model, RESULTS_JSON_SCHEMA, sampler)

json_answer = generator(long_42k_llm_prompt, max_tokens=1000)
print(json.dumps(json_answer, indent=4))

Expected result:

A results list without endless repetition at the end.

When running MLX-LM directly, we get the same infinite loop, stopped only by max_tokens:

python -m mlx_lm.generate --model mlx-community/Meta-Llama-3.1-8B-Instruct-4bit --prompt "$(< ~/Downloads/long_42k_llm_prompt.md)" --max-tokens 5000

...
687. **Methodist Hospital**
688. **Methodist Hospital**
689. **Methodist Hospital**
690. **Methodist Hospital**
691. **Methodist Hospital**
692. **Methodist Hospital**
693. **Methodist Hospital**
694. **Methodist Hospital**
695. **Methodist Hospital**
696. **Methodist Hospital**
697. **Methodist Hospital**

==========
Prompt: 11380 tokens, 432.382 tokens-per-sec
Generation: 5000 tokens, 26.872 tokens-per-sec
Peak memory: 6.891 GB
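Because the exact repetition-related options vary across mlx-lm versions, a quick way to check what the installed build supports is to inspect the generation entry point (this is just an inspection helper, not part of the report):

import inspect
from mlx_lm.utils import generate_step

# Look for repetition_penalty / repetition_penalty_context_size in the output.
print(inspect.signature(generate_step))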

Error message:

No response

Outlines/Python version information:

Version information

0.0.47.dev69+g72377db
Python 3.12.4
mlx==0.17.2
mlx-lm==0.18.1

Context for the issue:

No response

ea167 • Sep 05 '24 21:09