outlines
outlines copied to clipboard
Test the CFG integration for `llama.cpp`
I took a quick look at the llama.cpp CFG integration and it doesn't seem to give the correct results. We need to investigate and understand what is wrong.
llamacpp is the only model which supports a CFG Logits Processor, and CFGGuides handling of unambiguously incomplete terminals appears to be broken.
Previously we would handle incomplete terminals by catching exceptions UnexpectedToken / UnexpectedCharacter for calls to interactive.exhaust_lexer(), but the current CFGGuide implementation doesn't.
Code:
import llama_cpp
from outlines.integrations.llamacpp import CFGLogitsProcessor
import outlines.grammars as grammars
import outlines.models as models
import torch
model = models.llamacpp(
repo_id="QuantFactory/Meta-Llama-3-8B-Instruct-GGUF",
filename="Meta-Llama-3-8B-Instruct.Q8_0.gguf",
tokenizer=llama_cpp.llama_tokenizer.LlamaHFTokenizer.from_pretrained(
"mlx-community/Meta-Llama-3-8B-Instruct-4bit"),
n_gpu_layers=-1,
)
mock_scores = torch.zeros(model.model.n_vocab())
logits_processor = CFGLogitsProcessor(grammars.json, model.model)
input_ids = []
generation = [90, 702]
for token in generation:
logits_mask = logits_processor(input_ids, mock_scores)
input_ids.append(token)
logits_processor(input_ids, mock_scores)
Error:
Traceback (most recent call last):
File "/opt/conda/lib/python3.10/site-packages/lark/lexer.py", line 673, in lex
token = self.root_lexer.next_token(lexer_state, parser_state)
File "/opt/conda/lib/python3.10/site-packages/lark/lexer.py", line 598, in next_token
raise UnexpectedCharacters(lex_state.text, line_ctr.char_pos, line_ctr.line, line_ctr.column,
lark.exceptions.UnexpectedCharacters: No terminal matches '"' in the current parser context, at line 1 col 2
{"
^
Expected one of:
* SIGNED_NUMBER
* NULL
* RSQB
* FALSE
* LBRACE
* LSQB
* UNESCAPED_STRING
* COMMA
* RBRACE
* TRUE
* COLON
Previous tokens: Token('LBRACE', '{')
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/root/repro_796.py", line 20, in <module>
lp(response[:i], mock_scores)
File "/opt/conda/lib/python3.10/site-packages/outlines/integrations/llamacpp.py", line 123, in __call__
allowed_tokens = self.fsm.get_next_instruction(self._fsm_state).tokens
File "/opt/conda/lib/python3.10/site-packages/outlines/fsm/guide.py", line 350, in get_next_instruction
interactive.exhaust_lexer()
File "/opt/conda/lib/python3.10/site-packages/lark/parsers/lalr_interactive_parser.py", line 52, in exhaust_lexer
return list(self.iter_parse())
File "/opt/conda/lib/python3.10/site-packages/lark/parsers/lalr_interactive_parser.py", line 43, in iter_parse
for token in self.lexer_thread.lex(self.parser_state):
File "/opt/conda/lib/python3.10/site-packages/lark/lexer.py", line 676, in lex
raise e # Raise the original UnexpectedCharacters. The root lexer raises it with the wrong expected set.
File "/opt/conda/lib/python3.10/site-packages/lark/lexer.py", line 665, in lex
yield lexer.next_token(lexer_state, parser_state)
File "/opt/conda/lib/python3.10/site-packages/lark/lexer.py", line 598, in next_token
raise UnexpectedCharacters(lex_state.text, line_ctr.char_pos, line_ctr.line, line_ctr.column,
lark.exceptions.UnexpectedCharacters: No terminal matches '"' in the current parser context, at line 1 col 2
{"
^
Expected one of:
* UNESCAPED_STRING
* RBRACE
Previous tokens: Token('LBRACE', '{')
Investigating how to resolve
There are a lot of improvements that can be made to ensure CFGLogitsProcessor doesn't result in unexpected RuntimeErrors.
Here is a summary of the issues that would need to be resolved:
major
- In
CFGGuide,UnexpectedCharactorerror occurs if a terminal is incomplete when callinginteractive.exhaust_lexer(). - In
CFGGuide, if$EOSis a legal next terminal, but not the only legal next terminal, under some conditions we getValueError("The vocabulary does not allow us to build a sequence that matches the input regex"). CFGGuidegenerates a newRegexGuideand resets the state to0if the existingRegexGuideis possibly terminal, not necessarily terminal. This disallows some legal grammars.- Structural issue: in
CFGGuide, state is updated inget_next_instruction(), rather thanget_next_state() - There are other conditions where parsing breaks down resulting in an illegal generation, which I haven't yet determined the cause of.
minor
CFGGuidehasself.generation += self.tokenizer.decode([token_id])[0], but forllamacpp, the passed tokenizer returns astr, notList[str], this means the tokens["aaaa", "bbbb"]will be read by the parser as"ab"rather thanaaaabbbb.CFGGuidechecks foreos_token_idinRegexGuide.get_next_instruction(), but it should be checkingRegexGuide.is_final_stateUNESCAPED_STRINGallows invalid json (I have a patch ready for this)LogitsProcessorincorrectly handles Write instructions in cases where num tokens > 1https://github.com/outlines-dev/outlines/issues/855
Proposal
@rlouf @brandonwillard could you share your thoughts please?
Due to the issues with CFGGuide, and the fact that any improvements to the CFGGuide will eventually be superseded by parsing.py, I propose getting parsing.py working as a next step, rather than correcting all the bugs in CFGGuide. This will reduce technical debt substantially, and provide fully functioning CFG structured generation.
I agree on fixing issues in parsing.py first.
Due to the issues with
CFGGuide, and the fact that any improvements to theCFGGuidewill eventually be superseded byparsing.py, I propose gettingparsing.pyworking as a next step, rather than correcting all the bugs inCFGGuide. This will reduce technical debt substantially, and provide fully functioning CFG structured generation.
A necessary change that will almost certainly fix a few of those issues is the use of PartialLark. That is a top priority.
The partial parsing code in parsing.py should already work. At most, the way that lark is being extended can be cleaned up (e.g. remove the new options conflict when using Lark after PartialLark).