I took a quick look at the llama.cpp CFG integration and it doesn't seem to give the correct results. We need to investigate and understand what is wrong.

Apr 10 '24 09:04 rlouf

llamacpp is the only model which supports a CFG Logits Processor, and CFGGuides handling of unambiguously incomplete terminals appears to be broken.

Previously we would handle incomplete terminals by catching exceptions UnexpectedToken / UnexpectedCharacter for calls to interactive.exhaust_lexer(), but the current CFGGuide implementation doesn't.

Code:

import llama_cpp
from outlines.integrations.llamacpp import CFGLogitsProcessor

import outlines.grammars as grammars
import outlines.models as models
import torch


model = models.llamacpp(
    repo_id="QuantFactory/Meta-Llama-3-8B-Instruct-GGUF",
    filename="Meta-Llama-3-8B-Instruct.Q8_0.gguf",
    tokenizer=llama_cpp.llama_tokenizer.LlamaHFTokenizer.from_pretrained(
        "mlx-community/Meta-Llama-3-8B-Instruct-4bit"),
    n_gpu_layers=-1,
)
mock_scores = torch.zeros(model.model.n_vocab())

logits_processor = CFGLogitsProcessor(grammars.json, model.model)

input_ids = []
generation = [90, 702]
for token in generation:
    logits_mask = logits_processor(input_ids, mock_scores)
    input_ids.append(token)
logits_processor(input_ids, mock_scores)

Error:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/lark/lexer.py", line 673, in lex
    token = self.root_lexer.next_token(lexer_state, parser_state)
  File "/opt/conda/lib/python3.10/site-packages/lark/lexer.py", line 598, in next_token
    raise UnexpectedCharacters(lex_state.text, line_ctr.char_pos, line_ctr.line, line_ctr.column,
lark.exceptions.UnexpectedCharacters: No terminal matches '"' in the current parser context, at line 1 col 2

{"
 ^
Expected one of: 
        * SIGNED_NUMBER
        * NULL
        * RSQB
        * FALSE
        * LBRACE
        * LSQB
        * UNESCAPED_STRING
        * COMMA
        * RBRACE
        * TRUE
        * COLON

Previous tokens: Token('LBRACE', '{')


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/repro_796.py", line 20, in <module>
    lp(response[:i], mock_scores)
  File "/opt/conda/lib/python3.10/site-packages/outlines/integrations/llamacpp.py", line 123, in __call__
    allowed_tokens = self.fsm.get_next_instruction(self._fsm_state).tokens
  File "/opt/conda/lib/python3.10/site-packages/outlines/fsm/guide.py", line 350, in get_next_instruction
    interactive.exhaust_lexer()
  File "/opt/conda/lib/python3.10/site-packages/lark/parsers/lalr_interactive_parser.py", line 52, in exhaust_lexer
    return list(self.iter_parse())
  File "/opt/conda/lib/python3.10/site-packages/lark/parsers/lalr_interactive_parser.py", line 43, in iter_parse
    for token in self.lexer_thread.lex(self.parser_state):
  File "/opt/conda/lib/python3.10/site-packages/lark/lexer.py", line 676, in lex
    raise e  # Raise the original UnexpectedCharacters. The root lexer raises it with the wrong expected set.
  File "/opt/conda/lib/python3.10/site-packages/lark/lexer.py", line 665, in lex
    yield lexer.next_token(lexer_state, parser_state)
  File "/opt/conda/lib/python3.10/site-packages/lark/lexer.py", line 598, in next_token
    raise UnexpectedCharacters(lex_state.text, line_ctr.char_pos, line_ctr.line, line_ctr.column,
lark.exceptions.UnexpectedCharacters: No terminal matches '"' in the current parser context, at line 1 col 2

{"
 ^
Expected one of: 
        * UNESCAPED_STRING
        * RBRACE

Previous tokens: Token('LBRACE', '{')

Investigating how to resolve

May 21 '24 07:05 lapp0

There are a lot of improvements that can be made to ensure CFGLogitsProcessor doesn't result in unexpected RuntimeErrors.

Here is a summary of the issues that would need to be resolved:

major

In CFGGuide, UnexpectedCharactor error occurs if a terminal is incomplete when calling interactive.exhaust_lexer().
In CFGGuide, if $EOS is a legal next terminal, but not the only legal next terminal, under some conditions we get ValueError("The vocabulary does not allow us to build a sequence that matches the input regex").
CFGGuide generates a new RegexGuide and resets the state to 0 if the existing RegexGuide is possibly terminal, not necessarily terminal. This disallows some legal grammars.
Structural issue: in CFGGuide, state is updated in get_next_instruction(), rather than get_next_state()
There are other conditions where parsing breaks down resulting in an illegal generation, which I haven't yet determined the cause of.

minor

CFGGuide has self.generation += self.tokenizer.decode([token_id])[0], but for llamacpp, the passed tokenizer returns a str, not List[str], this means the tokens ["aaaa", "bbbb"] will be read by the parser as "ab" rather than aaaabbbb.
CFGGuide checks for eos_token_id in RegexGuide.get_next_instruction(), but it should be checking RegexGuide.is_final_state
UNESCAPED_STRING allows invalid json (I have a patch ready for this)
LogitsProcessor incorrectly handles Write instructions in cases where num tokens > 1https://github.com/outlines-dev/outlines/issues/855

Proposal

@rlouf @brandonwillard could you share your thoughts please?

Due to the issues with CFGGuide, and the fact that any improvements to the CFGGuide will eventually be superseded by parsing.py, I propose getting parsing.py working as a next step, rather than correcting all the bugs in CFGGuide. This will reduce technical debt substantially, and provide fully functioning CFG structured generation.

May 22 '24 01:05 lapp0

I agree on fixing issues in parsing.py first.

May 22 '24 08:05 rlouf

Due to the issues with CFGGuide, and the fact that any improvements to the CFGGuide will eventually be superseded by parsing.py, I propose getting parsing.py working as a next step, rather than correcting all the bugs in CFGGuide. This will reduce technical debt substantially, and provide fully functioning CFG structured generation.

A necessary change that will almost certainly fix a few of those issues is the use of PartialLark. That is a top priority.

The partial parsing code in parsing.py should already work. At most, the way that lark is being extended can be cleaned up (e.g. remove the new options conflict when using Lark after PartialLark).

May 22 '24 22:05 brandonwillard

outlines
outlines copied to clipboard

Test the CFG integration for `llama.cpp`

Proposal

outlines outlines copied to clipboard

Test the CFG integration for `llama.cpp`

Proposal

outlines
outlines copied to clipboard