Error with non-English sentence
The bug
Thank you for the new version! I'm working with Llama-2-13B-chat. It works fine in English, but when I added some non-English sentences to the prompt, it gave the error below.
To Reproduce
My model and prompt with non-English sentences.
import torch
from transformers import AutoModelForCausalLM, GPTQConfig, AutoTokenizer
from guidance import models, gen, select
import guidance
PATH_MODEL = '/content/model_13b_translation/TheBloke/Llama_2_13B_chat_GPTQ'
gptq_config = GPTQConfig(bits=4, use_exllama=True, exllama_config={"version":2})
model = AutoModelForCausalLM.from_pretrained(PATH_MODEL, device_map="auto", trust_remote_code=True, quantization_config=gptq_config)
tokenizer = AutoTokenizer.from_pretrained(PATH_MODEL)
gmodel = models.Transformers(model=model, tokenizer=tokenizer)
prompt = '''[INST] <<SYS>>
You are a helpful, respectful and honest assistant.
<</SYS>>
Please translate this question from Vietnamese to English: "Có một cửa hàng sushi rất ngon ở bên kia đường." [/INST] Sure, the English version of that sentence is:'''
When I use Guidance:
lm = gmodel + prompt + gen(stop='\n', max_tokens=200)
it gives this error:
[INST] <<SYS>>
You are a helpful, respectful and honest assistant.
<</SYS>>
Please translate this question from Vietnamese to English: "Có một cửa hàng sushi rất ngon ở bên kia đường." [/INST] Sure, the English version of that sentence is:
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
[<ipython-input-15-a90df77c853d>](https://localhost:8080/#) in <cell line: 1>()
----> 1 lm = gmodel + prompt + gen(stop='\n', max_tokens=200)
2 frames
[/usr/local/lib/python3.10/dist-packages/guidance/models/_local.py](https://localhost:8080/#) in __call__(self, grammar, max_tokens, n, top_p, temperature, ensure_bos_token, log_probs)
331 # if we cannot consume any more tokens then we are done
332 if not is_forced and token_pos < len(sampled_token) and trie == self._token_trie:
--> 333 assert parser.matched(), "We can't consume any more tokens, but we are not yet done! Perhaps your model's token set is incomplete?"
334
335 # TODO: if we exactly match the end of the pattern then we can commit to this last token
AssertionError: We can't consume any more tokens, but we are not yet done! Perhaps your model's token set is incomplete?
However, without Guidance, it works well as below:
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.cuda()
outputs = model.generate(input_ids, max_length=250)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
[INST] <<SYS>>
You are a helpful, respectful and honest assistant.
<</SYS>>
Please translate this question from Vietnamese to English: "Có một cửa hàng sushi rất ngon ở bên kia đường." [/INST] Sure, the English version of that sentence is:
"There's a very delicious sushi shop over there on the street."
System info (please complete the following information):
- Guidance Version (guidance.__version__): 0.1.3
Same problem, but it happens in Chinese.
Inference in Chinese doesn't work for me, +1
+1. Is there any solution?
I've been investigating the issue, and it seems the problem might be with the byte-level tokenizer trie. From what I understand, the method _tokenize_prefix is designed to return the longest valid byte token at each step. However, the assumption underlying the byte-level tokenizer, as implemented in this repository, does not seem to hold for all tokenizers.
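To illustrate the failure mode, here is a minimal hypothetical sketch (not guidance's actual trie code): greedy longest-prefix matching over a byte vocabulary can dead-end even when a valid tokenization of the same bytes exists.

def greedy_tokenize(data: bytes, vocab: set):
    # repeatedly take the longest vocab entry that matches at the current position
    tokens, pos = [], 0
    while pos < len(data):
        match = max((t for t in vocab if data.startswith(t, pos)), key=len, default=None)
        if match is None:
            return None  # dead end: no token covers the remaining bytes
        tokens.append(match)
        pos += len(match)
    return tokens

vocab = {b"a", b"ab", b"bc"}
print(greedy_tokenize(b"abc", vocab))  # None: greedy picks b"ab", then b"c" is uncovered
# yet [b"a", b"bc"] is a valid tokenization of the same bytes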
To address this, my suggestion is to switch to the default tokenizer. The following code snippet outlines the proposed change:
def _tokenize_prefix(self, prompt):
    if isinstance(prompt, bytes):
        prompt = prompt.decode("utf-8")
    return self._orig_tokenizer(prompt).input_ids, []
This modification ensures that if the prompt is in bytes, it gets decoded to a UTF-8 string before tokenization. This approach might be more robust and universally applicable.
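As a quick sanity check of that fallback path (a sketch; PATH_MODEL is the local GPTQ checkpoint path from the report above, and the underlying Hugging Face tokenizer is assumed to be what _orig_tokenizer refers to), the plain Hugging Face tokenizer handles the decoded UTF-8 string directly:

from transformers import AutoTokenizer

hf_tokenizer = AutoTokenizer.from_pretrained(PATH_MODEL)
text = 'Có một cửa hàng sushi rất ngon ở bên kia đường.'.encode("utf-8")
# decode the byte prompt back to str and let the tokenizer do the work
print(hf_tokenizer(text.decode("utf-8")).input_ids)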
Same thing happening to me in Korean.
@anhvth I can't seem to find _orig_tokenizer in the source code. Where can I find it? Additionally, converting the prompt to UTF-8 bytes already seems to be implemented:
def __call__(self, grammar, max_tokens=1000000, n=1, top_p=1, temperature=0.0, ensure_bos_token=True):
    assert n == 1, "Still need to add support for n > 1!"

    # get our current context in bytes
    prompt = self._current_prompt()
    prompt = bytes(prompt, encoding="utf-8")
Same problem, I've left a comment on a related ticket: https://github.com/guidance-ai/guidance/issues/454#issuecomment-1878149397
Update: using llama.cpp instead of transformers solved the problem for me.
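A minimal sketch of that workaround, assuming a GGUF build of the same chat model is available locally (the file path and name below are placeholders, not from the original report):

from guidance import models, gen

# load a GGUF build of Llama-2-13B-chat through guidance's llama.cpp backend
gmodel = models.LlamaCpp("/content/llama-2-13b-chat.Q4_K_M.gguf", n_gpu_layers=-1)
lm = gmodel + prompt + gen(stop='\n', max_tokens=200)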
@MINGYUK Good to know, thanks. Unfortunately, llama.cpp doesn't work with GPTQ.
I had the same problem when using Japanese in the prompts. However, after applying commit 8f5b3bdfe28455ef267da3e0e590a0d9a4d08104, the error disappeared. I don't yet understand it well enough to explain the details, but I'm sharing the information for reference. I hope it helps someone.