
Auto-apply chat template in `SequenceGenerator` and `SequenceGeneratorAdapter`, if available

Open leloykun opened this issue 1 year ago • 9 comments

This PR auto-applies chat templates by default when using instruct/chat models. It doesn't support LlamaCPP for now, though.


Why?

Instruct/Chat models tend to be annoyingly template dependent (i.e. they perform worse if the prompts don't follow the chat template used in the finetuning step). And the more and longer they are finetuned, the worse the problem gets. Hence this PR.

Also see this issue https://github.com/outlines-dev/outlines/issues/987 raised by @lapp0


Interface changes

This PR has minimal impact on the interface. It just changes the default behavior of the generation step.

However, this feature can be disabled either on the creation of the generator:

generator = outlines.generate.choice(model, ["Positive", "Negative"], apply_chat_template=False)

Or when calling the generator

answer = generator(prompt, apply_chat_template=False)

leloykun avatar Jul 05 '24 16:07 leloykun

I don't think this design is coherent with the rest of the library; we want to avoid kwargs as much as we possibly can. Here I would simply add an `apply_chat_template` attribute to the model instance.
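To make the suggestion concrete, here is a minimal sketch of the flag living on the model instance instead of being threaded through kwargs. The `Model` class, the `format_prompt` method, and the template tokens are all illustrative, not the actual Outlines API:

```python
# Hypothetical sketch: the chat-template switch becomes one attribute on the
# model instead of a kwarg at every generator/call site.

class Model:
    def __init__(self, apply_chat_template: bool = False):
        # Set once at construction; every generation path reads it from here.
        self.apply_chat_template = apply_chat_template

    def format_prompt(self, prompt: str) -> str:
        if self.apply_chat_template:
            # A real implementation would delegate to the tokenizer's own
            # chat template; this placeholder only illustrates the control flow.
            return f"<|user|>\n{prompt}\n<|assistant|>\n"
        return prompt


model = Model(apply_chat_template=True)
print(model.format_prompt("Is this review positive or negative?"))
```

With this shape, `generator(prompt)` keeps a single signature and the model decides whether templating happens.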

rlouf avatar Jul 05 '24 20:07 rlouf

@lapp0 @rlouf

How do you think the interface should look?


Here, I mirrored Huggingface's Pipeline interface, where configs/args can be specified in the constructor and (optionally) overridden in the model call. I like it because it's more flexible, but it does make things a bit more complicated and less Pythonic.

I took a quick look at other libraries, and it seems that they either (1) don't auto-apply the chat templates at all or (2) have a separate code path for base and instruct/chat models (e.g. TGI & MixEval). I think there are two reasons why:

  1. It's hard to know which models are base models and which are instruct/chat models. I thought checking whether chat_template is None would suffice, but some chat models apparently just leave it out (especially older ones and third-party finetunes). Additionally, transformers' PreTrainedTokenizerBase base class has a default_chat_template property, so, if I'm not mistaken, we can run tokenizer.apply_chat_template on all tokenizers without erroring out. And

  2. Some models don't support system prompts and it's hard to know which ones do and which ones don't.

So, for now, we need a way to let the library know whether we're dealing with a base model or an instruct/chat model. Worst case, we might also need to ask the user to specify the system prompt. But if we're going to put them through all that trouble anyway, we might as well not do this by default.

I think a good compromise is to (1) not apply the chat templates by default but (2) warn the user if the chat template is specified in the tokenizer config but is not being used.
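The proposed compromise could be sketched roughly as below. It assumes a transformers-style tokenizer exposing a `chat_template` attribute; the helper name and the stub tokenizer are illustrative:

```python
# Sketch of the compromise: never apply the template automatically, but warn
# when the tokenizer ships a chat template that is going unused.
import warnings


def check_unused_chat_template(tokenizer, template_applied: bool) -> None:
    # `chat_template` is how transformers-style tokenizers expose the
    # Jinja template from tokenizer_config.json (None/absent for base models).
    if getattr(tokenizer, "chat_template", None) and not template_applied:
        warnings.warn(
            "The tokenizer defines a chat template, but it is not being "
            "applied; instruct/chat models may perform worse without it."
        )


class FakeChatTokenizer:
    """Stand-in for an instruct model's tokenizer."""
    chat_template = "{% for message in messages %}...{% endfor %}"


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_unused_chat_template(FakeChatTokenizer(), template_applied=False)

print(len(caught))  # one warning emitted
```

A base model's tokenizer (no `chat_template`) would pass through silently, so the warning only fires where it is actionable.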

leloykun avatar Jul 07 '24 16:07 leloykun

answer = generator(
   model.apply_chat_template(prompt)
)

It would substantially simplify the code in this PR as well.
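A rough sketch of what that explicit helper could look like, assuming a transformers-style `tokenizer.apply_chat_template`; the `InstructModel` class and the stub tokenizer are illustrative only:

```python
# Hypothetical explicit helper: the user calls it themselves, so the modified
# prompt is visible before it reaches the generator.

class StubTokenizer:
    """Stands in for a transformers tokenizer with a chat template."""

    def apply_chat_template(self, messages, tokenize=False,
                            add_generation_prompt=True):
        parts = [f"<|{m['role']}|>\n{m['content']}" for m in messages]
        if add_generation_prompt:
            parts.append("<|assistant|>\n")
        return "\n".join(parts)


class InstructModel:
    def __init__(self, tokenizer):
        self.tokenizer = tokenizer

    def apply_chat_template(self, prompt, system=None):
        # Build the message list explicitly, then defer to the tokenizer's
        # own template rendering.
        messages = []
        if system is not None:
            messages.append({"role": "system", "content": system})
        messages.append({"role": "user", "content": prompt})
        return self.tokenizer.apply_chat_template(
            messages, tokenize=False, add_generation_prompt=True
        )


model = InstructModel(StubTokenizer())
templated = model.apply_chat_template("Positive or negative?")
print(templated)
```

Because templating is a separate, explicit step, users can inspect (or log) the rendered prompt before handing it to the generator.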

rlouf avatar Jul 07 '24 19:07 rlouf

I like that this pull request adds the possibility of using the tokenizer's apply_chat_template, but I wonder whether it's a good idea to make it the default behavior. I have had very bad experiences with apply_chat_template, where it adds or removes spaces when it shouldn't, or, even worse in function-calling cases, where it completely ignores the functions given without raising any error (see this example). Many people might complain about something that is outside of Outlines' control.

alonsosilvaallende avatar Jul 13 '24 12:07 alonsosilvaallende

Indeed, we are not going to make it the default behavior. Users should be able to inspect the modified prompt before they pass it to the generator.

rlouf avatar Jul 13 '24 13:07 rlouf

@leloykun are you still working on this?

rlouf avatar Jul 18 '24 13:07 rlouf

@rlouf yup! I just deprioritized this in favor of other stuff; I'll get back to this soon

btw, thanks for the feedback, @alonsosilvaallende!

leloykun avatar Jul 19 '24 13:07 leloykun

So can this feature still be added? A chat template is really needed in many applications.

Bit0r avatar Jul 27 '24 07:07 Bit0r

> So can this feature still be added? A chat template is really needed in many applications.

Yes, if you'd like to introduce it in a PR, I'll review and support its inclusion provided it follows the behavior discussed in this thread! :)

lapp0 avatar Jul 27 '24 16:07 lapp0

Hi @leloykun! Outlines has changed a bit since you opened this PR as we released our v1. This topic is however still very relevant and we recently created an issue about it #1629.

In the v1 of Outlines, each model now has an associated ModelTypeAdapter that is responsible for validating and modifying the user input into what the model needs. I think it would make sense to put this instruct chat templating there: the user would provide the templating input (a dict) instead of a string when calling the model, and we would recognize the format and apply the template. It's just a preliminary idea, though; maybe there are issues with it or better ways of handling it.
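The idea above might be sketched like this. The `ChatTemplateAdapter` class, its `format_input` method, and the stub tokenizer are all illustrative, not the actual v1 `ModelTypeAdapter` API:

```python
# Preliminary sketch: an adapter that accepts either a plain string (base-model
# path) or a chat-style list of message dicts, and normalizes both into the
# text the model expects.

class StubTokenizer:
    """Stands in for a transformers-style tokenizer with a chat template."""

    def apply_chat_template(self, messages, tokenize=False,
                            add_generation_prompt=True):
        parts = [f"<|{m['role']}|>\n{m['content']}" for m in messages]
        if add_generation_prompt:
            parts.append("<|assistant|>\n")
        return "\n".join(parts)


class ChatTemplateAdapter:
    def __init__(self, tokenizer):
        self.tokenizer = tokenizer

    def format_input(self, model_input):
        if isinstance(model_input, str):
            # Base-model path: pass the raw prompt through untouched.
            return model_input
        if isinstance(model_input, list):
            # Chat path: the structured input itself signals that the
            # template should be applied.
            return self.tokenizer.apply_chat_template(
                model_input, tokenize=False, add_generation_prompt=True
            )
        raise TypeError(f"Unsupported input type: {type(model_input)}")


adapter = ChatTemplateAdapter(StubTokenizer())
rendered = adapter.format_input([{"role": "user", "content": "Hello!"}])
print(rendered)
```

Dispatching on the input's type keeps a single call signature while making the user's intent (templated vs. raw) unambiguous.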

RobinPicard avatar Jun 19 '25 14:06 RobinPicard

Closing the PR as Outlines now has a Chat model input object that serves this purpose.

RobinPicard avatar Aug 01 '25 14:08 RobinPicard