More advanced stopping conditions: STOPS_AT/STOPS_BEFORE with regex/lists

Open JasperDekoninck opened this issue 1 year ago • 4 comments

It would be very helpful to support more advanced stopping conditions in STOPS_AT/STOPS_BEFORE. One use case for accepting a list of stopping phrases instead of a single string is that:

argmax(chatty_openai=True, max_len=128)
   """[SENTENCE]"""
from
   "openai/text-davinci-003"
where
   STOPS_AT(SENTENCE, [".", "?", "!"])

is a lot easier than:

argmax(chatty_openai=True, max_len=128)
   """[SENTENCE]"""
from
   "openai/text-davinci-003"
where
   STOPS_AT(SENTENCE, ".") and STOPS_AT(SENTENCE, "?") and STOPS_AT(SENTENCE, "!")
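
The intended semantics of the list form can be sketched in plain Python, outside of LMQL: stop at whichever phrase occurs first. `truncate_at_any` is a hypothetical helper used only for illustration; it is not part of LMQL, and the real constraint would have to work token by token during decoding.

```python
def truncate_at_any(text, stops, inclusive=True):
    """Cut `text` at the earliest occurrence of any phrase in `stops`.

    inclusive=True keeps the stopping phrase (like STOPS_AT);
    inclusive=False drops it (like STOPS_BEFORE).
    Hypothetical helper for illustration, not part of LMQL.
    """
    best = None  # (start index, phrase) of the earliest match found so far
    for phrase in stops:
        idx = text.find(phrase)
        if idx != -1 and (best is None or idx < best[0]):
            best = (idx, phrase)
    if best is None:
        return text  # no stopping phrase present; keep everything
    idx, phrase = best
    return text[: idx + len(phrase)] if inclusive else text[:idx]


print(truncate_at_any("Is it done? Yes. Good!", [".", "?", "!"]))
```

With the sample input, the cut happens at the first "?", since it precedes both "." and "!".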

For more advanced conditions based on regexes, consider the "calculator" example from the "tool-augmented queries". With regex stopping conditions, it could be written without few-shot examples (note that my regexes might not be exactly correct):

import re

def calc(expr):
      # keep only characters that can appear in an arithmetic expression
      expr = re.sub(r"[^0-9+\-*/().,]", "", expr)
      try:
         return eval(expr)
      except Exception:
         return ""

argmax(openai_chunksize=64, max_len=2048)
      QUESTION = "Josh decides to try flipping a house.  He buys a house for $80,000 and then puts in $50,000 in repairs.  This increased the value of the house by 150%.  How much profit did he make?"
      # prompt template
      "Q: {QUESTION}\n"
      "Let's think step by step.\n"
      for i in range(4):
         "[REASONING]"
         "[CALC]"
         if CALC.endswith("="):
            " {calc(CALC)}>>"
      # Note: the last CALC would contain the RESULT.
from 
      'openai/text-davinci-003'
where
      STOPS_BEFORE(REASONING, r"^[^\d]+[\d]+$") and
      STOPS_AT(CALC, "=") and
      STOPS_BEFORE(CALC, r"^[0-9+\-*/().,]+[a-zA-Z?!\n]$")
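
One possible reading of regex-based stopping, sketched in plain Python: after every new token, search the text accumulated so far and cut at the end of the match (STOPS_AT) or at its start (STOPS_BEFORE). `generate_until` and the token list are hypothetical stand-ins for the decoder, not LMQL code. Under this particular reading, a pattern anchored with `^` would cut at index 0, so the patterns in the query above would likely need adapting.

```python
import re


def generate_until(tokens, pattern, inclusive=True):
    """Consume `tokens` one by one and stop once `pattern` matches.

    tokens: any iterable of strings standing in for decoder output.
    Returns the accumulated text, cut at the end of the match
    (inclusive=True, like STOPS_AT) or at its start (like STOPS_BEFORE).
    Hypothetical helper for illustration, not part of LMQL.
    """
    regex = re.compile(pattern)
    text = ""
    for token in tokens:
        text += token
        match = regex.search(text)
        if match:
            # The match may end inside the last token, so truncate the text.
            return text[: match.end()] if inclusive else text[: match.start()]
    return text  # the pattern never matched


tokens = ["The house ", "costs 80", "000+50", "000= 130"]
print(generate_until(tokens, r"="))
```

The last token here contains text after the "=", which is why truncation inside a token is needed.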

JasperDekoninck · May 05 '23 13:05

Is anyone working on this atm? If not I think I could do this

LachlanGray · Jun 21 '23 16:06

Go ahead :) Currently no one is actively working on this.

lbeurerkellner · Jun 21 '23 22:06

Marking this as a good first issue to work on.

The place to start implementation is https://github.com/eth-sri/lmql/blob/main/src/lmql/ops/ops.py#L825. Relevant methods to override are:

  • stop to indicate to the decoder whether to stop at the current token
  • postprocess_var/postprocess to indicate to the decoder when postprocessing is required (e.g. to truncate a token, in the middle of which a stopping expression triggered)
  • postprocess_order to make sure multiple stopping phrases on the same variable are resolved to the highest-priority one.
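
The three responsibilities listed above can be illustrated with a stand-alone sketch. It does not use or mirror the actual signatures in ops.py; `StopCondition`, `should_stop` and `postprocess` below are hypothetical names, and the priority rule (the earliest cut point wins) is an assumption rather than LMQL's documented behavior.

```python
import re
from dataclasses import dataclass


@dataclass
class StopCondition:
    pattern: str     # regular expression to look for
    inclusive: bool  # True ~ STOPS_AT, False ~ STOPS_BEFORE

    def cut_point(self, text):
        """Return the index to truncate at, or None if there is no match."""
        match = re.search(self.pattern, text)
        if match is None:
            return None
        return match.end() if self.inclusive else match.start()


def should_stop(text, conditions):
    """Decide whether decoding should stop given the text so far."""
    return any(c.cut_point(text) is not None for c in conditions)


def postprocess(text, conditions):
    """Truncate `text`; with several matches, the earliest cut wins (assumed)."""
    cuts = [c.cut_point(text) for c in conditions]
    cuts = [cut for cut in cuts if cut is not None]
    return text[: min(cuts)] if cuts else text


conditions = [
    StopCondition(r"=", inclusive=True),
    StopCondition(r"[a-zA-Z?!\n]", inclusive=False),
]
print(postprocess("80000+50000= 13", conditions))
```

Keeping the stop decision separate from the truncation matches the split between stop and postprocess described above: the decoder only learns that it should halt, and the cut inside the final token happens afterwards.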

lbeurerkellner · Feb 27 '24 14:02

I'm working on this

Saibo-creator · Mar 20 '24 11:03