[Question] Support constraining via context-free grammars for DSLs

Open DanielProkhorov opened this issue 1 year ago • 2 comments

Hi @lbeurerkellner,

Do you have any plans to "natively" integrate token constraints into the LMQL language, perhaps through ANTLR/Lark/EBNF grammar notation? This feature is currently supported by guidance (https://github.com/guidance-ai/guidance?tab=readme-ov-file#context-free-grammars) and shown in examples from other projects such as outlines (https://github.com/outlines-dev/outlines?tab=readme-ov-file#using-context-free-grammars-to-guide-generation).
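
To make the request concrete, here is a toy sketch of what constraining generation to a language means at the token level: at each step, only tokens that keep the output a prefix of some valid sentence are allowed. This is not how LMQL, guidance, or outlines implement it. A finite set of sentences stands in for a real context-free grammar, and `VALID_SENTENCES`, `VOCAB`, `allowed_tokens`, `constrained_greedy`, and the `score` callback are all hypothetical names chosen for illustration.

```python
# Toy stand-in for a grammar: a finite set of valid sentences (assumption).
VALID_SENTENCES = {
    "Tap the settings button",
    "Tap the tone button",
    "Select bass from the list",
}

# Hypothetical string-level vocabulary; real tokenizers work on token ids.
VOCAB = ["Tap", "Select", " the", " settings", " tone",
         " button", " bass", " from", " list", " slider"]


def allowed_tokens(prefix, vocab=VOCAB, sentences=VALID_SENTENCES):
    """Return tokens t such that prefix + t is still a prefix of a valid sentence."""
    return [t for t in vocab if any(s.startswith(prefix + t) for s in sentences)]


def constrained_greedy(score, max_steps=10):
    """Greedy decoding restricted to allowed tokens at every step.

    `score(prefix, token)` is a placeholder for the model's preference
    (e.g. a logit); here it is supplied by the caller.
    """
    out = ""
    for _ in range(max_steps):
        if out in VALID_SENTENCES:
            break  # a complete sentence has been produced
        candidates = allowed_tokens(out)
        if not candidates:
            break  # dead end: no token keeps the output valid
        out += max(candidates, key=lambda t: score(out, t))
    return out


if __name__ == "__main__":
    # Dummy scorer that prefers longer tokens, standing in for a language model.
    print(constrained_greedy(lambda prefix, token: len(token)))
```

With a real grammar, the prefix check would be replaced by an incremental parser that reports which terminals may come next.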

DanielProkhorov, Jan 01 '24 16:01

We have some plans, but there is no concrete ETA currently. I will keep the issue open to track the status.

Do you have any concrete use cases in mind?

lbeurerkellner, Jan 01 '24 20:01

> Do you have any concrete use cases in mind?

Yes, I do. I was recently playing around with a DSL for HMI testing in the automotive domain. Using guidance, my script looks like the following:

import torch
import guidance
from guidance import models, system, user, assistant, gen, select, one_or_more

checkpoint = "HuggingFaceH4/zephyr-7b-beta"
lm = models.TransformersChat(checkpoint, device_map="auto", torch_dtype=torch.bfloat16)

@guidance(stateless=True)
def enter_teststep(lm):
    return lm + "Enter " + gen(stop="into", max_tokens=20) + " into the " + gen(max_tokens=3) + "."

@guidance(stateless=True)
def tap_teststep(lm):
    return lm + "Tap the " + gen(stop="button", max_tokens=3) + " button"

@guidance(stateless=True)
def modification_teststep(lm):
    return lm + select(["Activate", "Deactivate"]) + " the " + gen(stop="option", max_tokens=3) + " option"

@guidance(stateless=True)
def slider_modification_teststep(lm):
    return lm + select(["Increase", "Decrease"]) + " the " + select(["vertical", "horizontal"]) + gen(stop="slider", max_tokens=3) + " slider by " + one_or_more(select(['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'])) + gen(max_tokens=2)

@guidance(stateless=True)
def selection_teststep(lm):
    return lm + "Select " + gen(stop="from", max_tokens=5) + " from the list"

@guidance(stateless=True)
def teststep(lm):
    return lm + select([enter_teststep(), tap_teststep(), modification_teststep(), slider_modification_teststep(), selection_teststep()])

system_msg = "You are an expert in software testing within the automotive infotainment domain. Additionally, your understanding of the ANTLR grammar is enormous."

prompt = """Antlr Grammar for Test Case Description DSL in the Infotainment Domain:

grammar TestCase;

testcase: 'Testcase:' ID
  'Preconditions:' teststeps*
  'Actions:' teststeps+
  'Postconditions:' teststeps*;

teststeps: enterStep
  | tapStep
  | modificationStep
  | sliderModificationStep
  | selectStep;

enterStep: 'Enter the' OBJECTNAME 'into the' TARGETNAME;

tapStep: 'Tap the' OBJECTNAME 'button';

modificationStep: (Activate | Deactivate) 'the' OBJECTNAME 'option';

sliderModificationStep: (Increase | Decrease) 'the' ORIENTATION OBJECTNAME 'slider by' NUMBER UNITS;

selectStep: 'Select' OBJECTNAME 'from the list';

OBJECTNAME: ID;
TARGETNAME: ID;
ORIENTATION: 'vertical' | 'horizontal';
UNITS: ID;
NUMBER: DIGIT+;
Activate: 'Activate';
Deactivate: 'Deactivate';
Increase: 'Increase';
Decrease: 'Decrease';

ID: [a-zA-Z]+;
DIGIT: [0-9];

User Test Case:

Testcase: Check bass slider from -10 to 10

Preconditions:
Tap the settings button
Tap the tone settings button

Predict the next logical test step using the grammar rules.

The next logical test step is the following action:
- 
"""

with system():
    llm = lm + system_msg

with user():
    llm += prompt

with assistant():
    llm += teststep()

resulting in the following response:

[Screenshot of the model response, 02.01.2024, 13:10:05]
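
A generated step can also be checked against the grammar from the prompt after the fact. The following is a minimal sketch using regular expressions rather than an ANTLR-generated parser. `STEP_PATTERNS` and `classify_step` are hypothetical names, and `NAME` deliberately relaxes the grammar's single-word `ID` to one or more alphabetic words, an assumption made because the example test case uses "tone settings".

```python
import re

# Assumption: object/target names are one or more alphabetic words.
# The grammar itself defines ID as a single word ([a-zA-Z]+).
NAME = r"[A-Za-z]+(?: [A-Za-z]+)*"

# One regular expression per parser rule of the test-step grammar.
STEP_PATTERNS = {
    "enterStep": rf"Enter the {NAME} into the {NAME}",
    "tapStep": rf"Tap the {NAME} button",
    "modificationStep": rf"(?:Activate|Deactivate) the {NAME} option",
    "sliderModificationStep": (
        rf"(?:Increase|Decrease) the (?:vertical|horizontal) {NAME} "
        r"slider by [0-9]+ [A-Za-z]+"
    ),
    "selectStep": rf"Select {NAME} from the list",
}


def classify_step(line):
    """Return the name of the first rule that fully matches `line`, else None."""
    text = line.strip()
    for rule, pattern in STEP_PATTERNS.items():
        if re.fullmatch(pattern, text):
            return rule
    return None


if __name__ == "__main__":
    for step in [
        "Tap the tone settings button",
        "Increase the vertical bass slider by 10 units",
        "Press the button",
    ]:
        print(step, "->", classify_step(step))
```

A check like this only flags invalid output after generation. Grammar-constrained decoding, as requested in this issue, would prevent such output from being generated in the first place.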

DanielProkhorov, Jan 02 '24 12:01