Documenting the speed gain/loss
What behavior of the library made you think about the improvement?
Mainly the documentation.
How would you like it to behave?
I am trying to run a selection/generation mixture using outlines, which I also tried running with guidance and lmql. None of them seem to publish timing comparisons, and performance is quite important for my use case. I am getting very slow results with outlines and want to know whether this is the expected outcome.
| Vanilla vLLM (No Token Masking) | Outlines |
|---|---|
| 00:05:27 | 03:24:32 |
System: Google Colab / A100 GPU
Model: Mistral-7b-instruct-v2
My task is something like:
```python
def reason_and_select(text_generator, choice_generator, prompt, max_choices=4):
    reasoning = text_generator(prompt, max_tokens=150, stop_at="</reasoning>")
    new_prompt = prompt + reasoning + "</reasoning>" + pre_response_part_2
    selections = []
    for _ in range(max_choices):
        option = choice_generator(new_prompt)
        new_prompt += option
        if option == "]":
            break
        elif option == ", ":
            continue
        selections.append(option)
    return selections, reasoning, new_prompt

for _ in range(1000):
    reason_and_select(text_generator, choice_generator, some_prompt)
```
Can you provide a complete minimal working example of the outlines code you're using to get that result?
In general, don't call generator constructors like `outlines.generate.choice` inside a loop. Construct them once and reuse them; otherwise you pay the construction cost over and over, and you end up measuring setup cost rather than run-time/sampling cost.
Once they're constructed, those generators should have almost no run-time overhead compared to their unconstrained counterparts.
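To make the "construct once, reuse many times" pattern concrete, here is a minimal sketch. The outlines calls are shown only as comments (the exact constructor names are assumptions about the library's API); the stand-in `make_generator` is a hypothetical helper so the structure runs without a model, and what matters is that construction happens once, outside the hot loop.

```python
# Expensive setup, done ONCE (assumed outlines API, shown for shape only):
# model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.2")
# text_generator = outlines.generate.text(model)
# choice_generator = outlines.generate.choice(model, ["yes", "no"])

def make_generator(choices):
    # Hypothetical stand-in for a choice-generator constructor; it returns a
    # callable so the sketch runs without a GPU or the outlines library.
    return lambda prompt: choices[0]

choice_generator = make_generator(["yes", "no"])  # built once

# Cheap reuse in the hot loop: only sampling cost, no reconstruction.
results = [choice_generator(f"prompt {i}") for i in range(3)]
```

The timing difference in the table above would then reflect sampling overhead alone, not repeated setup.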
Yes, I also wanted to avoid re-creating the generators, but the docs are not very clear on how to actually use them. I've seen two separate places where `choice` is used differently, but I think the docs are outdated, since running choices this way doesn't seem to work:
```python
import outlines.models as models

complete = models.openai("gpt-3.5-turbo")
answer = complete(
    "Pick the odd word out: skirt, dress, pen, jacket",
    is_in=["skirt", "dress", "pen", "jacket"]
)
```
from https://outlines-dev.github.io/outlines/reference/choices/
Everywhere else passes the choices as a parameter when creating the generator. In my case the choices change each run, so I thought this would be the natural way to do it, but I see now that it makes no sense like this lol 😄. Do you still have an exposed parameter to re-define the choices?
Yes, I will work on the docs over the next few days to make things clearer; sorry for the confusion.
You will need to create a new generator every time the choices change. However, after the first run this should compile in almost no time. I'm also going to push a change soon that dramatically reduces compile time for the first run. Thank you for your patience.
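Since the choices change each run but may repeat across runs, one way to avoid paying the construction cost for a choice set you've already seen is to memoize the generator by its (hashable) choice tuple. This is a sketch, not outlines' own mechanism: `cached_choice_generator` is a hypothetical helper, and the real constructor call is only indicated in a comment.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def cached_choice_generator(choices):
    # Stand-in for something like outlines.generate.choice(model, list(choices))
    # (assumed API): the expensive part is compiling the constraint for the
    # choice set, and the cache makes you pay it only once per distinct set.
    return lambda prompt: choices[0]

g1 = cached_choice_generator(("red", "green"))
g2 = cached_choice_generator(("red", "green"))  # cache hit: same object back
```

Note the choices must be passed as a tuple (lists aren't hashable), and the cache only helps if the same choice sets actually recur across the 1000 runs.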