Error with non-English sentence
The bug
Thank you for the new version! I'm working with Llama-2-13B-chat. It works fine in English, but when I added some non-English sentences to the prompt, it gave the error below.
To Reproduce
My model and prompt with non-English sentences.
import torch
from transformers import AutoModelForCausalLM, GPTQConfig, AutoTokenizer
from guidance import models, gen, select
import guidance
PATH_MODEL = '/content/model_13b_translation/TheBloke/Llama_2_13B_chat_GPTQ'
gptq_config = GPTQConfig(bits=4, use_exllama=True, exllama_config={"version":2})
model = AutoModelForCausalLM.from_pretrained(PATH_MODEL, device_map="auto", trust_remote_code=True, quantization_config=gptq_config)
tokenizer = AutoTokenizer.from_pretrained(PATH_MODEL)
gmodel = models.Transformers(model=model, tokenizer=tokenizer)
prompt = '''[INST] <<SYS>>
You are a helpful, respectful and honest assistant.
<</SYS>>
Please translate this question from Vietnamese to English: "Có một cửa hàng sushi rất ngon ở bên kia đường." [/INST] Sure, the English version of that sentence is:'''
When I use Guidance:
lm = gmodel + prompt + gen(stop='\n', max_tokens=200)
it gives this error:
[INST] <<SYS>>
You are a helpful, respectful and honest assistant.
<</SYS>>
Please translate this question from Vietnamese to English: "Có một cửa hàng sushi rất ngon ở bên kia đường." [/INST] Sure, the English version of that sentence is:
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
[<ipython-input-15-a90df77c853d>](https://localhost:8080/#) in <cell line: 1>()
----> 1 lm = gmodel + prompt + gen(stop='\n', max_tokens=200)
2 frames
[/usr/local/lib/python3.10/dist-packages/guidance/models/_local.py](https://localhost:8080/#) in __call__(self, grammar, max_tokens, n, top_p, temperature, ensure_bos_token, log_probs)
331 # if we cannot consume any more tokens then we are done
332 if not is_forced and token_pos < len(sampled_token) and trie == self._token_trie:
--> 333 assert parser.matched(), "We can't consume any more tokens, but we are not yet done! Perhaps your model's token set is incomplete?"
334
335 # TODO: if we exactly match the end of the pattern then we can commit to this last token
AssertionError: We can't consume any more tokens, but we are not yet done! Perhaps your model's token set is incomplete?
However, without Guidance, it works well as below:
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.cuda()
outputs = model.generate(input_ids, max_length=250)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
[INST] <<SYS>>
You are a helpful, respectful and honest assistant.
<</SYS>>
Please translate this question from Vietnamese to English: "Có một cửa hàng sushi rất ngon ở bên kia đường." [/INST] Sure, the English version of that sentence is:
"There's a very delicious sushi shop over there on the street."
System info (please complete the following information):
- Guidance Version (guidance.__version__): 0.1.3
Same problem, but it happens in Chinese.
Inference in Chinese doesn't work for me, +1
+1. Is there any solution?
I've been investigating the issue, and it seems the problem might be with the byte-level tokenizer trie. From what I understand, the method _tokenize_prefix is designed to return the longest valid byte token at each step. However, the assumption underlying the byte-level tokenizer, as implemented in this repository, does not seem to hold for all tokenizers.
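To illustrate the failure mode, here is a minimal hypothetical sketch (not guidance's actual trie code): greedy longest-prefix matching over a byte vocabulary can dead-end even when a valid tokenization of the same bytes exists.

def greedy_tokenize(data: bytes, vocab: set):
    # repeatedly take the longest vocab entry that matches at the current position
    tokens, pos = [], 0
    while pos < len(data):
        match = max((t for t in vocab if data.startswith(t, pos)), key=len, default=None)
        if match is None:
            return None  # dead end: no token covers the remaining bytes
        tokens.append(match)
        pos += len(match)
    return tokens

vocab = {b"a", b"ab", b"bc"}
print(greedy_tokenize(b"abc", vocab))  # None: greedy picks b"ab", then b"c" is uncovered
# yet [b"a", b"bc"] is a valid tokenization of the same bytes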
To address this, my suggestion is to switch to the default tokenizer. The following code snippet outlines the proposed change:
def _tokenize_prefix(self, prompt):
    if isinstance(prompt, bytes):
        prompt = prompt.decode("utf-8")
    return self._orig_tokenizer(prompt).input_ids, []
This modification ensures that if the prompt is in bytes, it gets decoded to a UTF-8 string before tokenization. This approach might be more robust and universally applicable.
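As a quick sanity check of that fallback path (a sketch; PATH_MODEL is the local GPTQ checkpoint path from the report above, and the underlying Hugging Face tokenizer is assumed to be what _orig_tokenizer refers to), the plain Hugging Face tokenizer handles the decoded UTF-8 string directly:

from transformers import AutoTokenizer

hf_tokenizer = AutoTokenizer.from_pretrained(PATH_MODEL)
text = 'Có một cửa hàng sushi rất ngon ở bên kia đường.'.encode("utf-8")
# decode the byte prompt back to str and let the tokenizer do the work
print(hf_tokenizer(text.decode("utf-8")).input_ids)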
Same thing happening to me in Korean.
@anhvth I can't seem to find _orig_tokenizer in the source code. Where can I find it? Additionally, converting the prompt to UTF-8 bytes already seems to be implemented:
def __call__(self, grammar, max_tokens=1000000, n=1, top_p=1, temperature=0.0, ensure_bos_token=True):
    assert n == 1, "Still need to add support for n > 1!"

    # get our current context in bytes
    prompt = self._current_prompt()
    prompt = bytes(prompt, encoding="utf-8")
Same problem, I've left a comment on a related ticket: https://github.com/guidance-ai/guidance/issues/454#issuecomment-1878149397
Update: using llama.cpp instead of transformers solved the problem for me.
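A minimal sketch of that workaround, assuming a GGUF build of the same chat model is available locally (the file path and name below are placeholders, not from the original report):

from guidance import models, gen

# load a GGUF build of Llama-2-13B-chat through guidance's llama.cpp backend
gmodel = models.LlamaCpp("/content/llama-2-13b-chat.Q4_K_M.gguf", n_gpu_layers=-1)
lm = gmodel + prompt + gen(stop='\n', max_tokens=200)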
@MINGYUK Good to know, thanks. Unfortunately, llama.cpp doesn't work with GPTQ.
I had the same problem when using Japanese in the prompts. However, after applying commit 8f5b3bdfe28455ef267da3e0e590a0d9a4d08104, the error disappeared. I don't yet understand it well enough to explain the details, but I'm sharing the information for reference. I hope it helps someone.