guidance Error in program: No valid option generated in #select!

The bug

If a #select option is in Chinese, this error is raised. The problem is probably not limited to Chinese: any non-English option text is likely to trigger it.

To Reproduce

import guidance
guidance.llm = guidance.llms.OpenAI("text-davinci-003")
chinese_program = guidance.Program("""哪个水果更甜一点？\n{{#select 'fruit'}}苹果{{or}}橘子{{/select}}""")
chinese_program()

System info

OS: Mac OS
Guidance Version (guidance.__version__): 0.0.57

I found a way to fix the bug. The original code does not handle the case where a token is raw bytes rather than a decodable string. In that case, OpenAI adds a "bytes:" prefix to the token, like this:

{
    "text_offset":[
        17
    ],
    "token_logprobs":[
        -0.054483417
    ],
    "tokens":[
        " "
    ],
    "top_logprobs":[
        {
            " ":-0.054483417,
            "bytes: \\xe6":-2.9369752
        }
    ]
}
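To illustrate why such a key cannot be looked up as an ordinary string, here is a minimal sketch that turns a top_logprobs key into the raw bytes it names. The helper name parse_logprob_key is hypothetical and not part of guidance or the OpenAI client. It assumes, as the fix in this thread does, that a space right after the "bytes:" prefix belongs to the token itself.

```python
def parse_logprob_key(key: str) -> bytes:
    """Return the raw bytes of the token named by a top_logprobs key.

    Hypothetical helper for illustration only.
    """
    prefix = "bytes:"
    if not key.startswith(prefix):
        # Ordinary token: the key is the token text itself.
        return key.encode("utf-8")
    body = key[len(prefix):]
    # Assumption: a space right after the prefix is part of the token.
    leading = b" " if body.startswith(" ") else b""
    # Drop the literal "\x" escapes so that only hex digits remain.
    hex_digits = body.strip().replace("\\x", "")
    return leading + bytes.fromhex(hex_digits)


raw = parse_logprob_key("bytes: \\xe6")

# 0xe6 is the first byte of the UTF-8 encoding of "橘", so the token is
# only a fragment of a character and cannot be decoded on its own.
is_prefix_of_option = "橘子".encode("utf-8").startswith(raw.lstrip())

try:
    raw.decode("utf-8")
    decodable = True
except UnicodeDecodeError:
    decodable = False
```

In this sketch the key from the response above maps to a space followed by the single byte 0xe6, which is the start of the option 橘子 but not valid UTF-8 by itself. That would explain why a lookup that expects a complete string token finds no match.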

I fixed the bug with the following code

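# in _select.py
if "logprobs" in gen_obj:
    logprobs_result = gen_obj["logprobs"]

    # convert the logprobs keys from string back to token ids
    top_logprobs = {}
    for k, v in logprobs_result["top_logprobs"][0].items():
        if k.startswith('bytes:'):
            # byte-level token, e.g. 'bytes: \\xe6': strip the prefix and
            # the '\\x' escapes so that only hex digits remain
            k = k.replace('bytes:', '')
            k = k.replace('\\x', '')
            # bytes.fromhex skips whitespace, so a leading space is harmless here
            k_bytes = bytes.fromhex(k)
            # treat a space after the prefix as part of the token itself
            if k.startswith(' '):
                k_bytes = b' ' + k_bytes
            token_id = parser.program.llm._tokenizer.encode_single_token(k_bytes)
        else:
            token_id = parser.program.llm.token_to_id(k)
        top_logprobs[token_id] = v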

If possible, I would be happy to contribute a PR for this.

May 29 '23 11:05 wangyuxinwhy

Thanks! Clearly we need some non-English unit tests. A PR would be much appreciated! (If you do send in a PR, can you also make sure the patch works with the Transformers backend models?)

May 29 '23 13:05 slundberg

OK, I'll check the relevant subclasses of _llm.LLM and add non-English unit tests.

May 29 '23 16:05 wangyuxinwhy
