Add support for reasoning models

Open RobinPicard opened this issue 4 months ago • 0 comments

Expose two new keyword arguments for generation:

end_thinking_tag: a string giving the tag the reasoning model emits to signal that thinking is finished (and therefore that we should start constraining the generation)
thinking_max_tokens: an int giving the maximum number of tokens the model may spend thinking. Once that number is reached, we force the generation of the end-of-thinking token
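
To make the intended behaviour concrete, here is a minimal sketch of the per-step logic in plain Python. It is not the library's implementation: `apply_thinking_policy`, the `constrain` callback, and the assumption that `end_thinking_tag` has already been resolved to a single token id (`end_thinking_token_id`) are all hypothetical names chosen for illustration.

```python
import math
from typing import Callable, List, Optional


def apply_thinking_policy(
    generated_ids: List[int],
    logits: List[float],
    end_thinking_token_id: int,
    thinking_max_tokens: Optional[int],
    constrain: Callable[[List[int], List[float]], List[float]],
) -> List[float]:
    """Return the logits to sample from at the current step (hypothetical helper)."""
    # Thinking is over: constrain using only the tokens produced after the tag.
    if end_thinking_token_id in generated_ids:
        end_pos = generated_ids.index(end_thinking_token_id)
        return constrain(generated_ids[end_pos + 1:], logits)

    # Thinking budget exhausted: mask everything except the end-of-thinking token.
    if thinking_max_tokens is not None and len(generated_ids) >= thinking_max_tokens:
        forced = [-math.inf] * len(logits)
        forced[end_thinking_token_id] = 0.0
        return forced

    # Still thinking: leave the logits untouched.
    return list(logits)


def only_token_zero(answer_ids: List[int], logits: List[float]) -> List[float]:
    """Toy constraint for the example: only token 0 is ever allowed."""
    return [logits[0]] + [-math.inf] * (len(logits) - 1)
```

With a vocabulary of four tokens, `end_thinking_token_id=3` and `thinking_max_tokens=2`, the helper leaves the first two steps unconstrained, forces token 3 on the third step, and hands control to `constrain` from then on.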

Not supported:

Models whose end-of-thinking tag does not correspond to a single token
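
One way to enforce this restriction early would be to check the tag against the tokenizer before generation starts. The sketch below assumes a hypothetical `encode` callable that maps a string to a list of token ids without adding special tokens; `resolve_end_thinking_token_id` and the toy vocabulary are illustrative only.

```python
from typing import Callable, Dict, List


def resolve_end_thinking_token_id(
    end_thinking_tag: str,
    encode: Callable[[str], List[int]],
) -> int:
    """Return the single token id for the tag, or raise if it spans several tokens."""
    token_ids = encode(end_thinking_tag)
    if len(token_ids) != 1:
        raise ValueError(
            f"end_thinking_tag {end_thinking_tag!r} encodes to {len(token_ids)} "
            "tokens; only single-token tags are supported"
        )
    return token_ids[0]


def make_toy_encoder(vocab: Dict[str, int]) -> Callable[[str], List[int]]:
    """Toy encoder for the example: known strings are one token, others one per character."""
    def encode(text: str) -> List[int]:
        if text in vocab:
            return [vocab[text]]
        return [ord(ch) for ch in text]
    return encode


toy_encode = make_toy_encoder({"</think>": 7})
```

Here `resolve_end_thinking_token_id("</think>", toy_encode)` returns the tag's id, while a tag missing from the toy vocabulary raises a `ValueError` because it is split into several tokens.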

In the future, once we return an object with various attributes instead of just the text output, we could capture the content of the thinking by adding a start_thinking_tag argument for the models that use one.
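
As a rough illustration of what such a return object could look like, the sketch below splits raw text into a thinking part and an answer part. `GenerationOutput`, `split_thinking`, and the `<think>` / `</think>` tags used in the example are assumptions made for this sketch, not an existing or planned API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GenerationOutput:
    """Hypothetical return object carrying the thinking and the final answer."""
    thinking: Optional[str]
    content: str


def split_thinking(
    text: str,
    end_thinking_tag: str,
    start_thinking_tag: Optional[str] = None,
) -> GenerationOutput:
    """Separate the thinking from the answer using the given tags."""
    before, sep, after = text.partition(end_thinking_tag)
    if not sep:
        # No end tag found: treat the whole text as the answer.
        return GenerationOutput(thinking=None, content=text)

    thinking = before
    if start_thinking_tag is not None:
        # Drop everything up to and including the start tag, if it is present.
        _, start_sep, rest = before.partition(start_thinking_tag)
        if start_sep:
            thinking = rest
    return GenerationOutput(thinking=thinking.strip(), content=after.strip())
```

For example, `split_thinking("<think>2+2 is 4</think>4", "</think>", "<think>")` yields a `thinking` of `"2+2 is 4"` and a `content` of `"4"`.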

Aug 06 '25 11:08 RobinPicard