Grammar has error when backslash ("\") is used in brackets

Open architectdrone opened this issue 10 months ago • 1 comments

Describe the bug

I noticed that some people were having trouble with the json.gbnf file from the llama.cpp example grammars (see #4191), so I tracked down the specific cause. The grammar breaks as soon as it includes the "\" character. Since the expectation is that any character should be allowed in brackets, the grammar handling is broken.

The specific error that people (including myself) have been seeing is:

File "C:\dev\text-generation-webui\modules\grammar\grammar_utils.py", line 361, in __init__
    while grammar_encoding[pos] != 0xFFFF:
          ~~~~~~~~~~~~~~~~^^^^^
IndexError: list index out of range
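
The traceback points at a loop that walks grammar_encoding until it reaches the 0xFFFF sentinel. An IndexError there suggests the list ended before any sentinel was found, for example because parsing stopped early and left a truncated encoding. Below is a minimal sketch of that failure mode with a bounds-checked scan. The function find_terminator and the sample lists are hypothetical, made up for illustration, and are not code from grammar_utils.py.

```python
END_OF_GRAMMAR = 0xFFFF  # sentinel value taken from the traceback above


def find_terminator(grammar_encoding, pos=0):
    """Return the index of the first sentinel at or after pos.

    Raises ValueError with a readable message instead of letting the
    scan run off the end of the list.
    """
    while pos < len(grammar_encoding):
        if grammar_encoding[pos] == END_OF_GRAMMAR:
            return pos
        pos += 1
    raise ValueError(
        "grammar encoding has no 0xFFFF terminator; parsing probably failed earlier"
    )


# Hypothetical encodings, for illustration only.
well_formed = [0, 2, 1, 0, END_OF_GRAMMAR]
truncated = [0, 2, 1]

print(find_terminator(well_formed))
try:
    find_terminator(truncated)
except ValueError as exc:
    print("error:", exc)
```

A check like this would not fix the underlying parse problem, but it would turn the bare IndexError into a message that points at the grammar.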

Is there an existing issue for this?

  • [X] I have searched the existing issues

Reproduction

Use this grammar:

root ::= [\\]

error:

File "C:\dev\text-generation-webui\modules\grammar\grammar_utils.py", line 361, in __init__
    while grammar_encoding[pos] != 0xFFFF:
          ~~~~~~~~~~~~~~~~^^^^^
IndexError: list index out of range
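
For reference, here is a rough sketch of how an escape-aware parser could read a bracketed character class such as the one above. It assumes the usual GBNF convention that a backslash escapes the single character after it. The names parse_char, parse_char_class and SIMPLE_ESCAPES are hypothetical, and ranges like a-z are deliberately left out. This is not the webui's implementation, only an illustration of the expected behavior.

```python
# Escapes assumed here: backslash followed by one character (an assumption
# based on the GBNF convention, not taken from grammar_utils.py).
SIMPLE_ESCAPES = {
    "n": "\n", "r": "\r", "t": "\t",
    "\\": "\\", '"': '"', "[": "[", "]": "]",
}


def parse_char(src, pos):
    """Read one (possibly escaped) character; return (char, next_pos)."""
    if pos >= len(src):
        raise ValueError("unexpected end of input")
    if src[pos] != "\\":
        return src[pos], pos + 1
    if pos + 1 >= len(src):
        raise ValueError("dangling backslash at end of input")
    escaped = src[pos + 1]
    if escaped not in SIMPLE_ESCAPES:
        raise ValueError(f"unknown escape sequence \\{escaped}")
    return SIMPLE_ESCAPES[escaped], pos + 2


def parse_char_class(src):
    """Parse a bracketed class like [abc] into a list of characters."""
    if not src.startswith("["):
        raise ValueError("character class must start with '['")
    pos, chars = 1, []
    while pos < len(src) and src[pos] != "]":
        char, pos = parse_char(src, pos)
        chars.append(char)
    if pos >= len(src):
        raise ValueError("character class is missing its closing ']'")
    return chars


# A raw string keeps both backslashes, matching the grammar text above.
print(parse_char_class(r"[\\]"))
```

In this sketch, the two backslashes collapse into a single literal backslash and the class closes normally. With only one backslash, the closing bracket itself would be escaped and the class would never terminate, which is the kind of case a parser has to report explicitly.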

Screenshot

don't have one

Logs

this is in the webui

System Info

Shouldn't matter, but I am running on an RTX 2090 iirc

architectdrone avatar Apr 12 '24 18:04 architectdrone

Helpful observation: when doing

root ::= a

you get

File "C:\dev\text-generation-webui\modules\grammar\grammar_utils.py", line 419, in advance_stack
    subpos = self.rules_pos_dict[referenced_rule_id] + 1
             ~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^
KeyError: 1

but when doing

root ::= \

you get

File "C:\dev\text-generation-webui\modules\grammar\grammar_utils.py", line 361, in __init__
    while grammar_encoding[pos] != 0xFFFF:
          ~~~~~~~~~~~~~~~~^^^^^
IndexError: list index out of range

This is probably because identifiers in BNF aren't allowed to have backslashes in their names. However, the error is the same one raised in the valid case (root ::= [\\]), which means it must be something like the parser getting confused when it sees the character outside of quotes. That's my theory.

architectdrone avatar Apr 12 '24 19:04 architectdrone

This issue has been closed due to inactivity for 2 months. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.

github-actions[bot] avatar Jun 11 '24 23:06 github-actions[bot]