Last Token Sometimes Missing from Output in Guidance 0.2.0
The Bug
When running a simple Guidance script with gen(), the last token generated by the LLM is sometimes missing from the output. The behavior is inconsistent: sometimes the full response appears, while at other times the final token is lost.
To Reproduce
from guidance import models, gen, system, user, assistant

llm = models.OpenAI('gpt-3.5-turbo')

with system():
    llm += "You are an LLM. Please follow all instructions."

with user():
    llm += "Please reply to this message with FINISHED and nothing else."

with assistant():
    llm += gen(name='reply')

print(llm['reply'])
Expected Behavior
The output should always be FINISHED.
Observed Behavior
Sometimes the output is truncated to FIN, with the final token (ISHED, which the tokenizer treats as a separate token) missing.
The bug also occurs with other prompts; I chose this one because the model is very unlikely to return just FIN, so it should be clear that something is indeed going missing (see the tokenization check below).
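As a sanity check, here is a minimal sketch (assuming tiktoken is installed) that prints how the encoding used by gpt-3.5-turbo splits FINISHED; the exact split may differ between models.

# Minimal sketch: inspect how gpt-3.5-turbo's tokenizer splits "FINISHED".
# Assumes tiktoken is installed; the split may vary with other tokenizers.
import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
for token_id in enc.encode("FINISHED"):
    print(token_id, enc.decode_single_token_bytes(token_id))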
Additional Notes
- This issue appears inconsistently: repeated runs of the same script may or may not trigger it (see the loop sketch after these notes).
- It occurs with guidance==0.2.0.
- The problem does not appear in guidance==0.1.16.
- The problem occurs not only with gpt-3.5-turbo but also with other GPT-4 variants I tried.
System info:
- OS: Fedora
- Guidance Version: 0.2.0
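To quantify the flakiness, a hypothetical stress loop along these lines (same prompt as the repro above; the 20-run count is arbitrary) can count how often the reply comes back truncated:

from guidance import models, gen, system, user, assistant

def run_once():
    # Fresh model state per run, same prompt as the repro above.
    lm = models.OpenAI('gpt-3.5-turbo')
    with system():
        lm += "You are an LLM. Please follow all instructions."
    with user():
        lm += "Please reply to this message with FINISHED and nothing else."
    with assistant():
        lm += gen(name='reply')
    return lm['reply']

# Count runs whose reply is anything other than FINISHED (e.g. just FIN).
truncated = sum(run_once().strip() != "FINISHED" for _ in range(20))
print(f"{truncated}/20 runs returned a truncated reply")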
@hudson-ai @mmoskal any ideas here?
I can reproduce this behavior with gpt-4o-mini too.
It has been a month since I opened this issue, and another user has also reported experiencing it. I imagine that this could impact a lot more users without them noticing. Please let me know if you need any more details to help investigate this. Thanks!
Is it possible this issue is limited to OpenAI models? I cannot reproduce this using Phi 3.5.
from guidance import models, gen, system, user, assistant
from guidance.chat import Phi3MiniChatTemplate

if __name__ == "__main__":
    llm = models.Transformers(
        "microsoft/Phi-3.5-mini-instruct",
        chat_template=Phi3MiniChatTemplate,
    )
    with system():
        llm += "You are an LLM. Please follow all instructions."
    with user():
        llm += "Please reply to this message with FINISHED and nothing else."
    with assistant():
        llm += gen(name='reply')
    print(llm['reply'])
I ran this 10 times in a row on Guidance commit 3918b36c05f76215c9b061c5ee7398e975d26f78 and always got FINISHED (with a leading space) as the response.
I just ran some tests with 0.2.1 (from PyPI) and the problem seems to be fixed. However, the latest tagged release in this repo (0.2.0) still has the problem.
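For anyone comparing results, a quick standard-library check confirms which guidance build is actually installed:

# Print the installed guidance version to be sure which build is under test.
from importlib.metadata import version
print(version("guidance"))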
Yes, the 0.2.0 release is broken in various ways, unfortunately, and we were promised a new release back in March. Perhaps the team is waiting to finish some ongoing work before pulling the trigger on that new release.