rellm Fix generating partially valid tokens

Open mattiasarro opened this issue 2 years ago • 1 comments

Matching a regex partially can lead to generating a token which causes the whole generated sequence to be invalid, even if a substring of the token would result in a valid output.

The other option would be to tweak complete_re so that we run the if stop_after_match: block after adding each character of the token (rather than the full token text) to the output text, but that's less clean. Or is that needed in order to generate some output sequences that can only occur by generating a larger invalid token and then pruning the output?

Edit: looks like we need the latter approach, see latest commit.
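
To illustrate the per-character idea described above, here is a minimal sketch. It is not the code from the commit: the function name append_token_pruned and its signature are hypothetical, and it uses the standard-library re module with fullmatch rather than the partial matching the library relies on. It appends the token one character at a time and stops at the first point where the output fully matches, so a token that overshoots the pattern gets pruned.

```python
import re


def append_token_pruned(output_text, token_text, pattern):
    """Hypothetical helper: append token_text character by character.

    Returns (new_output, matched). If the output fully matches the
    pattern after some character, the rest of the token is dropped.
    """
    compiled = re.compile(pattern)
    for ch in token_text:
        output_text += ch
        # Check for a complete match after every character,
        # not only after the whole token has been appended.
        if compiled.fullmatch(output_text):
            return output_text, True
    return output_text, False


# The token "34" as a whole would overshoot a three-digit pattern,
# but its first character completes a valid match.
result = append_token_pruned("12", "34", r"\d{3}")
```

Note that stopping at the first full match is a design choice in this sketch: for a pattern such as \d+ it would stop after a single digit, so a real implementation may want to keep going while a longer match is still possible.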

May 30 '23 19:05 mattiasarro

I'm interested in this solution too, as I was having the same parserllm issues as in: https://github.com/r2d4/parserllm/issues/4

Also, the outlines project may interest you. They precompile valid continuations, and then inference happens in O(c).

The issue I have with outlines though is abominable lark support; their example is slooow: https://github.com/normal-computing/outlines/blob/main/examples/parsing.py

Sep 05 '23 00:09 freckletonj
