Custom logits processor
I couldn't find any information on this.
Is there a way for me to write my own custom logits processor, using any algorithm of my choice, while still working in conjunction with Guidance's token healing and "text, not tokens" philosophy?
Currently, Guidance's regex support does not always cover what I need, and I understand that adding new regex features can be difficult or contentious. So, if this is not already possible, I would greatly appreciate its addition to the library!
Edit: This is currently available with LMQL, so it seems plausible.
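For clarity, this is roughly the kind of hook I mean, using the transformers-style LogitsProcessor interface only as a reference point (the class and the banned token id below are purely illustrative, not something Guidance exposes today):

import torch
from transformers import LogitsProcessor

class BanTokenProcessor(LogitsProcessor):
    # Hypothetical example: forbid one token id at every decoding step.
    # Would normally be passed to generate() via a LogitsProcessorList.
    def __init__(self, banned_token_id: int):
        self.banned_token_id = banned_token_id

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
        # Push the banned token's logit to -inf so it can never be sampled.
        scores[:, self.banned_token_id] = float("-inf")
        return scores

The point is that such a processor sees the raw logits at every step and can apply arbitrary logic, which is hard to express with the constrained-generation primitives alone.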
Hey @urro-xyz,
Interesting idea. Supplying custom logits processors, and also custom sampling functions, would be a nice, flexible evolution of the package. Making these kinds of hooks run as efficiently as the current guidance codebase does is an active challenge, but perhaps we can still expose them for research purposes even if they aren't quite as performant.
I'd like to think more deeply about the best way we can implement this. We're currently refactoring the guidance internals to enable more flexibility, and it's worth spending some time thinking about potential callbacks into Python.
Out of curiosity, what kind of regular expression methods are you finding prohibitive to implement? Guidance is also built to process context-free grammars, so there's a chance we might be able to help you with the existing codebase.
Tagging @hudson-ai @mmoskal @nopdive @paulbkoch for awareness.
Thank you for the informative response.
Currently, I need a complex regex that is only possible with at least one of the following features, in order of most efficient to least efficient:
- Lookarounds (lookaheads, lookbehinds)
- Reusable named capture groups
- Infinite-length grammar
Guidance doesn't directly support this, although it is possible with an EBNF grammar, which can also be converted to a custom logits processor via llama-cpp-python or transformers-cfg.
Even so, I sometimes want direct access to the rules I write so I can adjust them easily. Having a way to express logits processors beyond the current offerings would therefore be wonderful! Guidance's token healing is state of the art and would make creating any new LLM decoding algorithm a breeze.
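To make the requirement concrete, here is a toy pattern (purely illustrative) that combines a lookahead with a reusable named capture group. A backtracking engine like Python's re module handles it, but the language it describes is not regular, so a DFA-based constrained-decoding engine cannot express it directly:

import re

# Match "key=value" only when the value repeats the key exactly,
# via a lookahead and a backreference to the named group "key".
pattern = re.compile(r"(?=\w+=)(?P<key>\w+)=(?P=key)$")

print(bool(pattern.match("foo=foo")))  # True
print(bool(pattern.match("foo=bar")))  # False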
Please tell me more about context-free grammars, too, as I don't fully understand what that term refers to on its own. How can I use them within the library?
EBNF is a way of writing down a context-free grammar. There are many different EBNF formats. The grammar engine in guidance is called llguidance and uses a modified Lark syntax. The current Guidance main branch serializes to this syntax (as of a few days ago).
The derivre regular expression library that we use for the "lexer" part of the grammar doesn't support lookarounds or captures, but it does support negation and intersection. These are currently not exposed in the Lark syntax, but could be. Could you give a more detailed example of what you're trying to do?
Also, I believe all context-free grammars (and thus also all EBNF grammars) should be expressible in the Guidance syntax.
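As a quick illustration of the difference: arbitrarily nested, balanced parentheses are a classic context-free (but not regular) language, so no regex can enforce the matching brackets, while a Lark-style grammar handles them naturally. A rough, untested sketch of what that looks like:

start: expr
expr: "(" expr ")" | WORD
WORD: /[a-z]+/

The recursion in expr is exactly what a regular expression cannot do.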
Is there a way for me to use Lark or EBNF with Guidance directly?
@urroxyz just merged a PR into main that should let you do exactly that.
Basically:
from guidance import lark, gbnf_to_lark

lm += lark(your_lark_string)
# or
lark_string = gbnf_to_lark(your_gbnf_string)  # probably want to manually inspect this output
lm += lark(lark_string)
The lark function takes a string corresponding to a Lark(-like) grammar, and it returns an object you can add to a Model to run the grammar (just like gen or json). For more info on the syntax of the grammar specification language, see https://github.com/guidance-ai/llguidance/blob/main/docs/syntax.md.
If you're working with a GBNF string (llamacpp's syntax), you can use the gbnf_to_lark function to rewrite the grammar in the Lark-like syntax. Note that this function is pretty experimental, and its outputs should be manually inspected/edited if it doesn't give you what you want. Or you can open an issue 😄
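Putting it together, something like this (the model id and grammar here are just placeholders; any backend Guidance supports should behave similarly):

from guidance import lark
from guidance.models import Transformers

lm = Transformers("gpt2")  # placeholder model id

grammar = lark(r"""
start: "Answer: " NUMBER
NUMBER: /[0-9]+/
""")

lm += "What is 2 + 2?\n"
lm += grammar
print(lm)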
Thank you so much!
While this doesn't have much to do with the original issue, it still adds support for named capture groups, which cannot be expressed with a simple regex in Guidance.
I hope to see more options in the future!
I keep getting this error:
---------------------------------------------------------------------------
PanicException Traceback (most recent call last)
Cell In[2], line 61
58 print(lm)
60 if __name__ == "__main__":
---> 61 main()
Cell In[2], line 31, in main()
24 your_lark_string = r"""
25 ?start: greeting ", " name "!"
26 greeting: "Hello" | "Hi"
27 name: /[A-Za-z]+/
28 %ignore " "
29 """
30 print("\nAdding Lark grammar (directly supplied)...")
---> 31 lm += lark(your_lark_string)
33 # ---------------------------------------------
34 # Convert a GBNF string to Lark format.
35 # (Not working right now.)
36 # ---------------------------------------------
37 your_gbnf_string = """
38 start: greeting ", " name "!"
39 greeting: "Hello" | "Hi"
40 name: /[A-Za-z]+/
41 """
File ~/Library/Python/3.11/lib/python/site-packages/guidance/models/_base/_model.py:105, in Model.__add__(self, other)
103 return other(self)
104 if isinstance(other, ASTNode):
--> 105 self = self._apply_node(other)
106 self = self._update_open_block_captures()
107 return self
File ~/Library/Python/3.11/lib/python/site-packages/guidance/models/_base/_model.py:133, in Model._apply_node(self, node)
130 else:
131 self._update_trace_node(self._id, self._parent_id, StatelessGuidanceInput(value=node))
--> 133 for i, output_attr in enumerate(self._client.run(self._state, node)):
134 if isinstance(output_attr, TextOutput):
135 # TODO: put this elsewhere (inside state?)
136 self.token_count += output_attr.token_count
File ~/Library/Python/3.11/lib/python/site-packages/guidance/models/_base/_client.py:33, in Client.run(self, state, node, **kwargs)
32 def run(self, state: S, node: ASTNode, **kwargs) -> Iterator[OutputAttr]:
---> 33 yield from node.simplify()._run(self, state, **kwargs)
File ~/Library/Python/3.11/lib/python/site-packages/guidance/models/_engine/_client.py:50, in EngineClient.grammar(self, state, node, **kwargs)
42 engine_gen = self.engine(
43 state,
44 node.ll_grammar(),
45 ensure_bos_token=True,
46 echo=False,
47 )
49 delayed_bytes = b""
---> 50 for chunk in engine_gen:
51 new_bytes = chunk.new_bytes
52 new_text, delayed_bytes = partial_decode(new_bytes)
File ~/Library/Python/3.11/lib/python/site-packages/guidance/models/_engine/_engine.py:197, in Engine.__call__(self, state, grammar, ensure_bos_token, echo)
176 """Main entry point for the inference-parser loop. Yields EngineCallResponse objects as
177 the parser advances through the grammar.
178
(...) 190 Ensures that the prompt ends with the BOS token.
191 """
192 # TODO: Pass these to get_logits
193 # images = state.images
194 # audio = state.audio
195 # videos = state.videos
--> 197 parser = TokenParser(
198 grammar,
199 tokenizer=self.tokenizer,
200 prompt=state.prompt.encode("utf-8"),
201 ensure_bos_token=ensure_bos_token,
202 enable_backtrack=self.enable_backtrack,
203 enable_ff_tokens=self.enable_ff_tokens,
204 )
206 has_get_logits = True
207 engine_output = None
File ~/Library/Python/3.11/lib/python/site-packages/guidance/_parser.py:40, in TokenParser.__init__(self, grammar, tokenizer, prompt, ensure_bos_token, enable_backtrack, enable_ff_tokens)
30 def __init__(
31 self,
32 grammar: LLGrammar,
(...) 37 enable_ff_tokens: bool = True,
38 ):
39 self.tokenizer = tokenizer
---> 40 self.ll_tokenizer = llguidance.LLTokenizer(llguidance.TokenizerWrapper(tokenizer))
41 self.ll_interpreter = llguidance.LLInterpreter(
42 self.ll_tokenizer,
43 grammar.model_dump_json(),
(...) 46 log_level=int(os.environ.get("LLGUIDANCE_LOG_LEVEL", "1")),
47 )
48 self._threadpool = ThreadPoolExecutor(max_workers=1)
PanicException: assertion failed: word.len() < (1 << LEN_BITS)
@urroxyz can you send a few pieces of information?
- llguidance version (pip show llguidance)
- model id and/or link to the huggingface model card
Thanks for the quick reply! Sorry for my late one.
I'm using the latest stable release of llguidance, version 0.7.0. I've also tried a variety of models, but the main ones I aim to use usually work fine with guidance: Llama 3.2 1B and 3B, available at links one and two, respectively. Finally, I am loading via LlamaCpp with GGUF My Repo quantizations, llama-3.2-1b-q8_0.gguf and llama-3.2-3b-q8_0.gguf.
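Roughly, the loading path looks like this (a simplified sketch; the exact arguments differ on my machine):

from guidance.models import LlamaCpp

# Local GGUF quantization of Llama 3.2 1B.
lm = LlamaCpp("llama-3.2-1b-q8_0.gguf")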
Loading with Transformers did allow generation for me, but I'd much prefer LlamaCpp so I can run on CPU. Sorry, it seems that GGUF wasn't tested for this feature yet, and that makes sense.
The model does continue to generate, though, once the grammar is completed, and even past a stop token.
> The model does continue to generate, though, once the grammar is completed, and even past a stop token.
Argh oh no! Will look into that bug after dealing with the PanicException: assertion failed: word.len() < (1 << LEN_BITS) llamacpp issue. Appreciate your patience 🙏