
Documenting the speed gain/loss

BedirT opened this issue on Mar 20, 2024 · 3 comments

What behavior of the library made you think about the improvement?

Mainly the documentation.

How would you like it to behave?

I am trying to run a selection/generation mixture using outlines, which I have also tried running with guidance and lmql. None of them seems to publish a timing comparison, and timing is quite important for my case. I am getting very slow results with outlines and want to make sure this is the expected outcome.

Vanilla vLLM (no token masking): 00:05:27
Outlines: 03:24:32

System: Google Colab / A100 GPU
Model: Mistral-7b-instruct-v2

My task is something like:

def reason_and_select(text_generator, choice_generator, prompt, max_choices=4):
    # Free-form reasoning step, stopped at the closing tag.
    reasoning = text_generator(prompt, max_tokens=150, stop_at="</reasoning>")
    new_prompt = prompt + reasoning + "</reasoning>" + pre_response_part_2
    selections = []
    # Pick options one at a time until the closing bracket is generated.
    for _ in range(max_choices):
        option = choice_generator(new_prompt)
        new_prompt += option
        if option == "]":
            break
        elif option == ", ":
            continue
        selections.append(option)

    return selections, reasoning, new_prompt

# The whole task is repeated 1000 times for the timing above.
for _ in range(1000):
    reason_and_select(text_generator, choice_generator, some_prompt)

BedirT · Mar 20 '24 22:03

Can you provide a complete minimal working example of the outlines code you're using to get that result?

In general, don't call generator constructors like outlines.generate.choice in a loop. Construct them once and reuse them; otherwise you pay the construction cost over and over, and you end up measuring setup cost rather than run-time/sampling cost.

Once they're constructed, those generators should have almost no run-time cost compared to their unconstrained counterparts.
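For reference, here is a minimal sketch of that pattern. It assumes the transformers backend; the model name, choice list, and prompts are placeholders, so adapt them to your setup:

import outlines

# Load the model and construct the generators once, outside any loop.
model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.2")
text_generator = outlines.generate.text(model)
choice_generator = outlines.generate.choice(model, ["option_a", "option_b", ", ", "]"])

# Reuse the same generators for every prompt; only sampling cost is paid per call.
prompts = ["<task prompt 1>", "<task prompt 2>"]
for prompt in prompts:
    reasoning = text_generator(prompt, max_tokens=150, stop_at="</reasoning>")
    option = choice_generator(prompt + reasoning + "</reasoning>")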

brandonwillard · Mar 21 '24 00:03

Yes, I also wanted to avoid re-creating the generators, but the docs are not very clear on how to actually use them. I see two separate places where choice is used differently, and I think the docs are outdated, since running choices this way doesn't seem to work:

import outlines.models as models

complete = models.openai("gpt-3.5-turbo")
answer = complete(
    "Pick the odd word out: skirt, dress, pen, jacket",
    is_in=["skirt", "dress", "pen", "jacket"]
)

from https://outlines-dev.github.io/outlines/reference/choices/

Everywhere else the choices are passed as a parameter when creating the generator. In my case the choices change on each run, so I thought this would be the natural way to do it, but I see now that it doesn't work like that 😄. Do you still have an exposed parameter to redefine the choices?

BedirT · Mar 21 '24 03:03

> Everywhere else the choices are passed as a parameter when creating the generator. In my case the choices change on each run, so I thought this would be the natural way to do it, but I see now that it doesn't work like that 😄. Do you still have an exposed parameter to redefine the choices?

Yes, I will work on the docs over the next few days to make things clearer; sorry for the confusion.

You will need to define a new generator every time the choices change. However, after the first run this should compile in almost no time. I'm also going to push a change soon that dramatically reduces the compile time for the first run. Thank you for your patience.
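For example, something along these lines should work. This is a rough sketch assuming the transformers backend, with a placeholder model name and choice list:

import outlines

model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.2")

def select(prompt, choices):
    # A new generator is built whenever the choice set changes; per the
    # comment above, after the first run this should compile in almost no time.
    choice_generator = outlines.generate.choice(model, choices)
    return choice_generator(prompt)

answer = select(
    "Pick the odd word out: skirt, dress, pen, jacket",
    ["skirt", "dress", "pen", "jacket"],
)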

rlouf · Mar 21 '24 08:03