
Error in program: No valid option generated in #select!

Open · wangyuxinwhy opened this issue on May 29, 2023 · 2 comments

The bug

If the options are in Chinese, this error is raised. It is not limited to Chinese: in any non-English context there is a high probability of hitting this bug.

To Reproduce

import guidance

guidance.llm = guidance.llms.OpenAI("text-davinci-003")
# The prompt translates to: "Which fruit is sweeter?
# {{#select}} apple {{or}} orange {{/select}}"
chinese_program = guidance.Program("""哪个水果更甜一点?\n{{#select 'fruit'}}苹果{{or}}橘子{{/select}}""")
chinese_program()

System info (please complete the following information):

  • OS (e.g. Ubuntu, Windows 11, Mac OS, etc.): Mac OS
  • Guidance Version (guidance.__version__): 0.0.57

The error looks like this: [screenshot of the error message]

But I found a way to fix the bug. The original code does not take into account the case where a token is raw bytes. When a token's bytes are not valid UTF-8 on their own, OpenAI adds a "bytes:" prefix to the corresponding key in top_logprobs, like this:

{
    "text_offset":[
        17
    ],
    "token_logprobs":[
        -0.054483417
    ],
    "tokens":[
        " "
    ],
    "top_logprobs":[
        {
            " ":-0.054483417,
            "bytes: \\xe6":-2.9369752
        }
    ]
}
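
To see where these byte-level tokens come from, you can inspect the tokenizer directly. This snippet uses the tiktoken package and is only an illustration on my side, not part of the fix:

import tiktoken

enc = tiktoken.encoding_for_model("text-davinci-003")
for token_id in enc.encode("苹果"):  # "apple"
    # print each token's raw bytes; a single token need not be valid
    # UTF-8 on its own, which is exactly when OpenAI falls back to
    # "bytes:" keys in top_logprobs
    print(token_id, enc.decode_single_token_bytes(token_id))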

I fixed the bug with the following code:

# in _select.py
if "logprobs" in gen_obj:
    logprobs_result = gen_obj["logprobs"]

    # convert the logprobs keys from strings back to token ids
    top_logprobs = {}
    for k, v in logprobs_result["top_logprobs"][0].items():
        if k.startswith('bytes:'):
            # keys like "bytes: \xe6" carry raw token bytes that are
            # not valid UTF-8 on their own
            k = k.replace('bytes:', '')
            k = k.replace('\\x', '')
            # bytes.fromhex skips whitespace, so a leading literal
            # space left in the key is harmless here ...
            k_bytes = bytes.fromhex(k)
            if k.startswith(' '):
                # ... but it is a real space byte in the token
                k_bytes = b' ' + k_bytes
            token_id = parser.program.llm._tokenizer.encode_single_token(k_bytes)
        else:
            token_id = parser.program.llm.token_to_id(k)
        top_logprobs[token_id] = v
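
The same decoding could also be written as a standalone helper. This is just a sketch on my side (decode_bytes_key is a hypothetical name, not something in guidance), relying on the observation that the key mixes literal printable characters with \xNN escapes:

def decode_bytes_key(key: str) -> bytes:
    """Turn an OpenAI top_logprobs key such as 'bytes: \\xe6'
    back into the raw bytes of the token."""
    payload = key[len("bytes:"):]
    # \xNN escapes become raw bytes; literal printable characters
    # (e.g. the leading space) pass through unchanged
    return payload.encode("latin-1").decode("unicode_escape").encode("latin-1")

assert decode_bytes_key("bytes: \\xe6") == b" \xe6"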


If possible, I would be happy to contribute a PR for this.

wangyuxinwhy · May 29 '23, 11:05

Thanks! Clearly we need some non-English unit tests. A PR would be much appreciated! (And if you do send in a PR, can you also make sure the patch works with the Transformers backend models?)

slundberg · May 29 '23, 13:05


OK, I'll check the relevant subclasses of _llm.LLM and add the non-English unit tests.
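
For the OpenAI backend, the test could simply re-run the program from this issue. The test name and the final assertion are just a sketch of mine, not an existing convention in the test suite:

import guidance

def test_select_chinese_options():
    guidance.llm = guidance.llms.OpenAI("text-davinci-003")
    program = guidance.Program(
        """哪个水果更甜一点?\n{{#select 'fruit'}}苹果{{or}}橘子{{/select}}"""
    )
    executed = program()
    # the selected value must be one of the Chinese options
    assert executed["fruit"] in ("苹果", "橘子")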

wangyuxinwhy · May 29 '23, 16:05