# `assert token_byte_positions[-1] == last_pos` on `gen()`

### The bug
The following error occurs when generating with certain models: `assert token_byte_positions[-1] == last_pos` fails inside `gen()`. It can be reproduced with the code below. Note that this bug only occurs with certain LLMs.
### To reproduce
It fails with this LLM:

```python
# Imports
import guidance
from guidance import image
from guidance import user, assistant, system
from guidance import gen, select
from guidance import capture, Tool, regex

# Paths
path_tess = "/home/sr/Desktop/CloserModels/tess-34b-v1.5b.Q5_K_M.gguf"

# Models
model = guidance.models.LlamaCpp(path_tess, n_gpu_layers=-1, n_ctx=2048)
llama2 = model
lm = llama2 + "Explain Oppenheimer's contribution to the world" + gen(name="output")
```
### Error
```
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[1], line 15
     12 model = guidance.models.LlamaCpp(path_tess, n_gpu_layers=-1, n_ctx=2048)
     13 llama2 = model
---> 15 lm = llama2 + "Explain Oppenheimer's contribution to the world" + gen(name="output")

File ~/anaconda3/envs/llamacpptest34/lib/python3.11/site-packages/guidance/models/_model.py:302, in Model.__add__(self, value)
    300 # run stateless functions (grammar nodes)
    301 elif isinstance(value, StatelessFunction):
--> 302     out = lm._run_stateless(value)
    304 # run stateful functions
    305 else:
    306     out = value(lm)

File ~/anaconda3/envs/llamacpptest34/lib/python3.11/site-packages/guidance/models/_model.py:465, in Model._run_stateless(lm, stateless_function, temperature, top_p, n)
    463 delayed_bytes = b""
    464 # last_is_generated = False
--> 465 for new_bytes, is_generated, new_bytes_prob, capture_groups, capture_group_log_probs, new_token_count in gen_obj:
    466
    467     # we make everything full probability if we are not computing uncertainty
    468     if not lm.compute_log_probs:
    469         new_bytes_prob = 1.0

File ~/anaconda3/envs/llamacpptest34/lib/python3.11/site-packages/guidance/models/_model.py:638, in Model.__call__(self, grammar, max_tokens, n, top_p, temperature, ensure_bos_token)
    636 # run a simple tokenizer (that does not use a grammar) on the prefix for better performance
    637 token_ids, token_byte_positions = self._tokenize_prefix(prompt)
--> 638 token_ids, token_byte_positions = self._cleanup_tokens(token_ids, token_byte_positions)
    639 if len(token_byte_positions) > 0:
    640     pre_parser_bytes = token_byte_positions[-1]

File ~/anaconda3/envs/llamacpptest34/lib/python3.11/site-packages/guidance/models/_model.py:611, in Model._cleanup_tokens(self, token_ids, token_byte_positions)
    609 for i in range(1, len(token_byte_positions)):
    610     token_byte_positions[i] -= 1
--> 611 assert token_byte_positions[-1] == last_pos

    613 return token_ids, token_byte_positions

AssertionError:
```
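For context, here is a minimal sketch (not guidance's actual code, just an illustration) of the invariant the failing assert enforces: after tokenizing the prompt, the cumulative byte offsets of the tokens must end exactly at the prompt's byte length. A tokenizer whose byte mapping silently adds or drops bytes breaks it.

```python
def token_byte_positions(token_bytes):
    """Cumulative end-of-token byte offsets, e.g. [b"He", b"llo"] -> [2, 5]."""
    positions, pos = [], 0
    for tb in token_bytes:
        pos += len(tb)
        positions.append(pos)
    return positions

prompt = b"Explain Oppenheimer's contribution to the world"

# A tokenization that preserves every byte satisfies the invariant:
tokens = [prompt[i:i + 4] for i in range(0, len(prompt), 4)]
assert token_byte_positions(tokens)[-1] == len(prompt)

# One that introduces a byte the prompt never had (e.g. an unaccounted-for
# leading space from a sentencepiece-style vocabulary) violates it, which
# is the condition the AssertionError reports:
bad_tokens = [b" "] + tokens
assert token_byte_positions(bad_tokens)[-1] != len(prompt)
```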
But it works with this LLM:

```python
# Imports
import guidance
from guidance import image
from guidance import user, assistant, system
from guidance import gen, select
from guidance import capture, Tool, regex

# Paths
path_tess = "/home/sr/Desktop/CloserModels/bagel-dpo-7b-v0.4.Q8_0.gguf"

# Models
model = guidance.models.LlamaCpp(path_tess, n_gpu_layers=-1, n_ctx=2048)
llama2 = model
lm = llama2 + "Explain Oppenheimer's contribution to the world" + gen(name="output")
```
### Output
```
Explain Oppenheimer's contribution to the world of physics.
Oppenheimer's contribution to the world of physics is significant and far-reaching. He was a key figure in the development of quantum mechanics and the Manhattan Project, which led to the creation of the atomic bomb. Oppenheimer's work on the Uncertainty Principle, which states that it is impossible to simultaneously measure the exact position and momentum of a particle, was groundbreaking and helped shape the field of quantum mechanics. Additionally, his leadership of the Los Alamos National Laboratory during the Manhattan Project was crucial in the development and deployment of the atomic bomb. Oppenheimer's contributions to physics have had a profound impact on our understanding of the universe and the potential for human innovation.
```
### System info
- OS: Ubuntu 22.04
- Guidance version: 0.1.10
---
Facing the same problem with `models.Transformers` Mistral models.

Any progress on this? Having the same problem with mistral-7b-instruct-v0.2.Q8_0.gguf -.-

Same problem with 0.1.13 and 0.1.16.
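As a quick way to check whether a given model is affected, one hypothesis worth testing is that the model's tokenize/detokenize pair does not round-trip the prompt's bytes. The sketch below takes any tokenize/detokenize callables (for example `llama_cpp.Llama.tokenize` / `.detokenize` from llama-cpp-python, loaded on your GGUF file — names assumed, not verified against guidance's internals) and checks the round trip; the toy tokenizers stand in for real ones.

```python
def round_trips(tokenize, detokenize, prompt: bytes) -> bool:
    """True if detokenizing the token sequence reproduces the exact bytes."""
    return detokenize(tokenize(prompt)) == prompt

# Toy tokenizer that preserves bytes: split into 3-byte chunks, rejoin.
good_tok = lambda b: [b[i:i + 3] for i in range(0, len(b), 3)]
good_detok = lambda toks: b"".join(toks)
assert round_trips(good_tok, good_detok, b"Explain Oppenheimer")

# Toy detokenizer that injects a leading space, the way some
# sentencepiece-based GGUF vocabularies do -- the failure mode suspected here:
bad_detok = lambda toks: b" " + b"".join(toks)
assert not round_trips(good_tok, bad_detok, b"Explain Oppenheimer")
```

If the round trip fails for a model that also trips the assert, that would support the byte-accounting explanation; it is only a diagnostic, not a fix.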