Some inferences take forever to complete
Issue description
The issue was raised by other people on Discord too.
To quote one of them:
I'm running the same query 10 times (with equivalent prompts and output sizes), but some inferences are taking abnormally longer than others.
[their screenshot]
Repro
I made a reproduction code snippet that can run in Google Colab (w/ free T4 GPU):
💻 Code snippet
pip install outlines==0.0.13 transformers datasets optimum auto-gptq accelerate
from outlines import models
from outlines.text.generate import json, continuation
from json import dumps
from time import perf_counter
import torch
prompt = """<|system|>
You are a friendly AI assistant.
You're specialized in mathematics and open source Github repositories.
Your answers must be concise and factual.</s>
<|user|>
Write a very long poem</s>
<|assistant|>
"""
output_format = {
    "type": "object",
    "properties": {
        "poem": {"type": "string"}
    }
}
model = models.transformers("TheBloke/zephyr-7B-beta-GPTQ", device="cuda")
rng = torch.Generator(device="cuda")
rng.manual_seed(789001)
errors = []
for i in range(20):
    start_time = perf_counter()
    try:
        sequence = json(model, dumps(output_format))(prompt, rng=rng)
        poem = sequence.get('poem')
        elapsed_time = round(perf_counter() - start_time)
        # avoid ZeroDivisionError when a run finishes in under half a second
        n_characters_per_second = len(poem) // max(elapsed_time, 1)
        print(f"{i}\t{elapsed_time}\t{n_characters_per_second}\t{poem[:30]}..")
    except Exception as e:
        errors.append(e)
        # recompute here: the value from the try block may be stale or unset
        elapsed_time = round(perf_counter() - start_time)
        print(f"{i}\t{elapsed_time}\tINFERENCE FAILED")
📃 Output (columns: index, elapsed seconds, characters per second, first 30 characters of the poem)
0 14 76 In the vastness of cosmic spac..
1 14 INFERENCE FAILED
2 769 0 In this universe, a vast expan..
3 389 0 In ancient lands, where skies ..
4 16 67 In the depths of the cosmos, w..
5 35 70 In the stillness of the mornin..
6 32 60 In a universe vast and unceasi..
7 13 77 75000 lines of blank verse, hi..
8 22 69 In a land of purest light, Who..
9 34 59 A cosmic dance of stars, a sym..
10 49 68 In the land of the digit, wher..
11 34 78 In a world vast and unknown, ..
12 43 68 There was a time when words we..
13 54 70 In a world where chaos reigns..
14 12 62 Let the words unfurl like the ..
15 330 0 Infinity beckons from the far ..
16 31 60 In the depths of the universe,..
17 137 0 In this vast expanse of time a..
18 32 81 in this universe vast and unfa..
💥 Exceptions raised
import traceback

for error in errors:
    try:
        raise error
    except Exception:
        traceback.print_exc()
Traceback (most recent call last):
File "<ipython-input-6-d8471672a411>", line 5, in <cell line: 3>
raise error
File "<ipython-input-5-1a425bb0404a>", line 8, in <cell line: 5>
sequence = json(model, dumps(output_format))(prompt, rng=rng)
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/outlines/text/generate/sequence.py", line 240, in __call__
result = self.postprocess_completions(result)
File "/usr/local/lib/python3.10/dist-packages/outlines/text/generate/regex.py", line 226, in postprocess_completions
return [self.format_fn(result) for result in results]
File "/usr/local/lib/python3.10/dist-packages/outlines/text/generate/regex.py", line 226, in <listcomp>
return [self.format_fn(result) for result in results]
File "/usr/local/lib/python3.10/dist-packages/outlines/text/generate/regex.py", line 397, in <lambda>
format_fn = lambda x: pyjson.loads(x)
File "/usr/lib/python3.10/json/__init__.py", line 346, in loads
return _default_decoder.decode(s)
File "/usr/lib/python3.10/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python3.10/json/decoder.py", line 353, in raw_decode
obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Invalid \escape: line 2 column 570 (char 571)
Results
- ✅ 14 inferences succeeded fast
- ⏰ 5 inferences succeeded but were extremely slow (indices: 2, 3, 15, 17, 19)
- 💥 1 inference failed fast (index: 1)
Outlines/Python version information:
Outlines 0.0.13
Python 3.10.12
Thank you so much for the detailed report! Will come back to you shortly.
These timing results contain significant non-inference setup steps (e.g. json(model, dumps(output_format))).
Yes indeed!
json(model, dumps(output_format)) takes a few seconds to complete and shouldn't be in the for-loop.
But this is not the step that gets "stuck".
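To separate the two phases, a minimal sketch (reusing the model, prompt, and rng from the snippet above) times the one-off generator construction and the generation itself independently:

from time import perf_counter

t0 = perf_counter()
generate_poem = json(model, dumps(output_format))  # one-off regex/FSM compilation
t1 = perf_counter()
sequence = generate_poem(prompt, rng=rng)          # constrained token generation
t2 = perf_counter()
print(f"compile: {t1 - t0:.1f}s\tgenerate: {t2 - t1:.1f}s")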
It would still be nice to have results with that call moved out of the loop, and to use cProfile to understand which step gets "stuck". To get to comparable experimental conditions I would also use the maxLength field constraint, as in the sketch below.
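For example, a sketch with cProfile and a maxLength of 2000 (the exact bound is an illustrative assumption, not from the thread):

import cProfile
from json import dumps

output_format = {
    "type": "object",
    "properties": {
        # maxLength bounds the poem so runs have comparable output sizes
        "poem": {"type": "string", "maxLength": 2000},
    },
}

generate_poem = json(model, dumps(output_format))  # built once, outside any loop

profiler = cProfile.Profile()
with profiler:
    sequence = generate_poem(prompt, rng=rng)
profiler.print_stats(sort="cumtime")  # cumulative times reveal the stuck step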
Please try
from pydantic import BaseModel

class OutputModel(BaseModel):
    poem: str
And pass OutputModel instead of output_format. This schema ensures the 'required': ['poem'] attribute is included, so you don't get any generations missing the poem key.
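For reference, the schema Pydantic generates does include that key (model_json_schema is the Pydantic v2 method; on v1 use OutputModel.schema()):

print(OutputModel.model_json_schema())
# {'properties': {'poem': {'title': 'Poem', 'type': 'string'}},
#  'required': ['poem'], 'title': 'OutputModel', 'type': 'object'}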
Additionally, you will need to set whitespace_pattern as explained here https://github.com/outlines-dev/outlines/issues/690#issuecomment-2102291934
json(model, dumps(output_format), whitespace_pattern=r"[ ]?")...
With these changes your script works for me and doesn't have any slow or failed inference.
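For completeness, a sketch of the revised loop with both changes applied, assuming a recent Outlines release (whitespace_pattern does not exist in 0.0.13) where outlines.generate.json accepts a Pydantic model and returns a validated instance:

from time import perf_counter
from pydantic import BaseModel
import torch
from outlines import models, generate

class OutputModel(BaseModel):
    poem: str

model = models.transformers("TheBloke/zephyr-7B-beta-GPTQ", device="cuda")
rng = torch.Generator(device="cuda")
rng.manual_seed(789001)

# Built once, outside the loop; [ ]? keeps inter-token whitespace bounded
generator = generate.json(model, OutputModel, whitespace_pattern=r"[ ]?")

for i in range(20):
    # `prompt` is the same chat-formatted string as in the repro above
    start = perf_counter()
    result = generator(prompt, rng=rng)  # returns an OutputModel instance
    print(f"{i}\t{round(perf_counter() - start)}\t{result.poem[:30]}..")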