Custom logits processor
I couldn't find any information on this.
Is there a way for me to write my own custom logits processor, using any algorithm of my choice, while still working in conjunction with Guidance's token healing and "text, not tokens" philosophy?
Currently, Guidance's regex support does not always cover what I need, and I understand that adding new regex features can be difficult or contentious. So, if this is not already possible, I would greatly appreciate its addition to the library!
Edit: This is currently available with LMQL, so it seems plausible.
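For clarity, this is roughly the kind of hook I mean, using the transformers-style LogitsProcessor interface only as a reference point (the class and the banned token id below are purely illustrative, not something Guidance exposes today):

import torch
from transformers import LogitsProcessor

class BanTokenProcessor(LogitsProcessor):
    # Hypothetical example: forbid one token id at every decoding step.
    # Would normally be passed to generate() via a LogitsProcessorList.
    def __init__(self, banned_token_id: int):
        self.banned_token_id = banned_token_id

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
        # Push the banned token's logit to -inf so it can never be sampled.
        scores[:, self.banned_token_id] = float("-inf")
        return scores

The point is that such a processor sees the raw logits at every step and can apply arbitrary logic, which is hard to express with the constrained-generation primitives alone.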
Hey @urro-xyz,
Interesting idea. Supplying custom logits processors, and also custom sampling functions, would be a nice, flexible evolution of the package. Making these kinds of hooks run as efficiently as the current guidance codebase does is an active challenge, but perhaps we can still expose them for research purposes even if they aren't quite as performant.
I'd like to think more deeply about the best way we can implement this. We're currently refactoring the guidance internals to enable more flexibility, and it's worth spending some time thinking about potential callbacks into Python.
Out of curiosity, what kind of regular expression methods are you finding prohibitive to implement? Guidance is also built to process context-free grammars, so there's a chance we might be able to help you with the existing codebase.
Tagging @hudson-ai @mmoskal @nopdive @paulbkoch for awareness.
Thank you for the informative response.
Currently, I need a complex regex that is only possible with at least one of the following features, in order of most efficient to least efficient:
- Lookarounds (lookaheads, lookbehinds)
- Reusable named capture groups
- Infinite-length grammar
Guidance doesn't directly support this, although it is possible with an EBNF grammar, which can also be converted to a custom logits processor via llama-cpp-python or transformers-cfg.
Even so, I sometimes want direct access to the rules I write so I can adjust them easily. Having a way to express logits processors beyond the current offerings would therefore be wonderful! Guidance's token healing is state of the art and would make creating any new LLM decoding algorithm a breeze.
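To make the requirement concrete, here is a toy pattern (purely illustrative) that combines a lookahead with a reusable named capture group. A backtracking engine like Python's re module handles it, but the language it describes is not regular, so a DFA-based constrained-decoding engine cannot express it directly:

import re

# Match "key=value" only when the value repeats the key exactly,
# via a lookahead and a backreference to the named group "key".
pattern = re.compile(r"(?=\w+=)(?P<key>\w+)=(?P=key)$")

print(bool(pattern.match("foo=foo")))  # True
print(bool(pattern.match("foo=bar")))  # False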
Please tell me more about context-free grammars, too, as I don't fully understand what that term refers to on its own. How can I use them within the library?
EBNF is a way of writing down a context-free grammar. There are many different EBNF formats. The grammar engine in guidance is called llguidance and uses a modified Lark syntax. The current Guidance main branch serializes to this syntax (as of a few days ago).
The derivre regular expression library that we use for the "lexer" part of the grammar doesn't support lookarounds or captures, but it does support negation and intersection. These are currently not exposed in the Lark syntax, but could be. Could you give a more detailed example of what you're trying to do?
Also, I believe all context-free grammars (and thus also all EBNF grammars) should be expressible in the Guidance syntax.
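As a quick illustration of the difference: arbitrarily nested, balanced parentheses are a classic context-free (but not regular) language, so no regex can enforce the matching brackets, while a Lark-style grammar handles them naturally. A rough, untested sketch of what that looks like:

start: expr
expr: "(" expr ")" | WORD
WORD: /[a-z]+/

The recursion in expr is exactly what a regular expression cannot do.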
Is there a way for me to use Lark or EBNF with Guidance directly?
@urroxyz just merged a PR into main that should let you do exactly that.
Basically:
from guidance import lark, gbnf_to_lark

lm += lark(your_lark_string)
# or
lark_string = gbnf_to_lark(your_gbnf_string)  # probably want to manually inspect this output
lm += lark(lark_string)
The lark function takes a string corresponding to a Lark(-like) grammar, and it returns an object you can add to a Model to run the grammar (just like gen or json). For more info on the syntax of the grammar specification language, see https://github.com/guidance-ai/llguidance/blob/main/docs/syntax.md.
If you're working with a GBNF string (llamacpp's syntax), you can use the gbnf_to_lark function to rewrite the grammar in the Lark-like syntax. Note that this function is pretty experimental, and its outputs should be manually inspected/edited if it doesn't give you what you want. Or you can open an issue 😄
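Putting it together, something like this (the model id and grammar here are just placeholders; any backend Guidance supports should behave similarly):

from guidance import lark
from guidance.models import Transformers

lm = Transformers("gpt2")  # placeholder model id

grammar = lark(r"""
start: "Answer: " NUMBER
NUMBER: /[0-9]+/
""")

lm += "What is 2 + 2?\n"
lm += grammar
print(lm)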
Thank you so much!
While this doesn't have much to do with the original issue, it still adds support for named capture groups, which cannot be expressed with a simple regex in Guidance.
I hope to see more options in the future!
I keep getting this error:
---------------------------------------------------------------------------
PanicException Traceback (most recent call last)
Cell In[2], line 61
58 print(lm)
60 if __name__ == "__main__":
---> 61 main()
Cell In[2], line 31, in main()
24 your_lark_string = r"""
25 ?start: greeting ", " name "!"
26 greeting: "Hello" | "Hi"
27 name: /[A-Za-z]+/
28 %ignore " "
29 """
30 print("\nAdding Lark grammar (directly supplied)...")
---> 31 lm += lark(your_lark_string)
33 # ---------------------------------------------
34 # Convert a GBNF string to Lark format.
35 # (Not working right now.)
36 # ---------------------------------------------
37 your_gbnf_string = """
38 start: greeting ", " name "!"
39 greeting: "Hello" | "Hi"
40 name: /[A-Za-z]+/
41 """
File ~/Library/Python/3.11/lib/python/site-packages/guidance/models/_base/_model.py:105, in Model.__add__(self, other)
103 return other(self)
104 if isinstance(other, ASTNode):
--> 105 self = self._apply_node(other)
106 self = self._update_open_block_captures()
107 return self
File ~/Library/Python/3.11/lib/python/site-packages/guidance/models/_base/_model.py:133, in Model._apply_node(self, node)
130 else:
131 self._update_trace_node(self._id, self._parent_id, StatelessGuidanceInput(value=node))
--> 133 for i, output_attr in enumerate(self._client.run(self._state, node)):
134 if isinstance(output_attr, TextOutput):
135 # TODO: put this elsewhere (inside state?)
136 self.token_count += output_attr.token_count
File ~/Library/Python/3.11/lib/python/site-packages/guidance/models/_base/_client.py:33, in Client.run(self, state, node, **kwargs)
32 def run(self, state: S, node: ASTNode, **kwargs) -> Iterator[OutputAttr]:
---> 33 yield from node.simplify()._run(self, state, **kwargs)
File ~/Library/Python/3.11/lib/python/site-packages/guidance/models/_engine/_client.py:50, in EngineClient.grammar(self, state, node, **kwargs)
42 engine_gen = self.engine(
43 state,
44 node.ll_grammar(),
45 ensure_bos_token=True,
46 echo=False,
47 )
49 delayed_bytes = b""
---> 50 for chunk in engine_gen:
51 new_bytes = chunk.new_bytes
52 new_text, delayed_bytes = partial_decode(new_bytes)
File ~/Library/Python/3.11/lib/python/site-packages/guidance/models/_engine/_engine.py:197, in Engine.__call__(self, state, grammar, ensure_bos_token, echo)
176 """Main entry point for the inference-parser loop. Yields EngineCallResponse objects as
177 the parser advances through the grammar.
178
(...) 190 Ensures that the prompt ends with the BOS token.
191 """
192 # TODO: Pass these to get_logits
193 # images = state.images
194 # audio = state.audio
195 # videos = state.videos
--> 197 parser = TokenParser(
198 grammar,
199 tokenizer=self.tokenizer,
200 prompt=state.prompt.encode("utf-8"),
201 ensure_bos_token=ensure_bos_token,
202 enable_backtrack=self.enable_backtrack,
203 enable_ff_tokens=self.enable_ff_tokens,
204 )
206 has_get_logits = True
207 engine_output = None
File ~/Library/Python/3.11/lib/python/site-packages/guidance/_parser.py:40, in TokenParser.__init__(self, grammar, tokenizer, prompt, ensure_bos_token, enable_backtrack, enable_ff_tokens)
30 def __init__(
31 self,
32 grammar: LLGrammar,
(...) 37 enable_ff_tokens: bool = True,
38 ):
39 self.tokenizer = tokenizer
---> 40 self.ll_tokenizer = llguidance.LLTokenizer(llguidance.TokenizerWrapper(tokenizer))
41 self.ll_interpreter = llguidance.LLInterpreter(
42 self.ll_tokenizer,
43 grammar.model_dump_json(),
(...) 46 log_level=int(os.environ.get("LLGUIDANCE_LOG_LEVEL", "1")),
47 )
48 self._threadpool = ThreadPoolExecutor(max_workers=1)
PanicException: assertion failed: word.len() < (1 << LEN_BITS)
@urroxyz can you send a few pieces of information?
- llguidance version (pip show llguidance)
- model id and/or link to the huggingface model card
Thanks for the quick reply! Sorry for my late one.
I'm using the latest stable release of llguidance, version 0.7.0. I've also tried a variety of models, but the main ones I aim to use usually work fine with guidance: Llama 3.2 1B and 3B, available at links one and two, respectively. Finally, I am loading via LlamaCpp with GGUF My Repo quantizations, llama-3.2-1b-q8_0.gguf and llama-3.2-3b-q8_0.gguf.
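Roughly, the loading path looks like this (a simplified sketch; the exact arguments differ on my machine):

from guidance.models import LlamaCpp

# Local GGUF quantization of Llama 3.2 1B.
lm = LlamaCpp("llama-3.2-1b-q8_0.gguf")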
Loading with Transformers did allow generation for me, but I'd much prefer LlamaCpp so I can run on CPU. Sorry, it seems that GGUF wasn't tested for this feature yet, and that makes sense.
The model does continue to generate, though, once the grammar is completed, and even past a stop token.
> The model does continue to generate, though, once the grammar is completed, and even past a stop token.
Argh oh no! Will look into that bug after dealing with the PanicException: assertion failed: word.len() < (1 << LEN_BITS) llamacpp issue. Appreciate your patience 🙏