
Text generation via a state-machine-transitions paradigm

Open aretrace opened this issue 1 year ago • 1 comment

@lbeurerkellner I think this paper and its reference implementation would be of much interest to the LMQL project. Perhaps a comparison (like the one done in Section 3.2) could be carried out.

aretrace avatar Aug 16 '23 22:08 aretrace

Thanks for linking. We have indeed already looked at this paper and investigated its applicability. The graph in Section 3.2 is impressive, but unfortunately we could not yet reproduce this result with the library: the code shown does not seem to work as-is with their 0.0.8 release, it yields different outputs and runtimes, and it does not seem to handle the regex correctly with respect to spaces.

It is also unclear how the result scales to larger models, as the reported decoding speed corresponds to ~100 tok/s (for the small gpt2), which is very uncommon with realistically sized models (e.g. 7B+). It could be that the reduced overhead of constraining is entirely consumed by the much higher latency to be expected with large models. However, I also could not get the library running with e.g. huggyllama/llama-7b, which OOMs on hardware that otherwise runs Llama-7B without issues. So for a comparison I am still waiting for better model support, since in CPU-only mode it actually seems to be about 2x slower than LMQL's Regex preview.
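
For illustration, here is a minimal, self-contained sketch of the FSM-indexing idea the paper describes (the toy DFA, the toy vocabulary, and all names are made up for this example; this is not the reference implementation's code): the regex is compiled to a DFA once, every vocabulary token is walked through the DFA once, and decoding then only needs a dictionary lookup per step to know which tokens are allowed.

```python
# Illustrative sketch of FSM-indexed constrained decoding (hypothetical
# example, not the reference implementation's API).

# Toy DFA for the regex (ab)*c : states 0 (start), 1 (saw 'a'), 2 (accept).
DFA = {
    (0, "a"): 1,
    (1, "b"): 0,
    (0, "c"): 2,
}
START, ACCEPTING = 0, {2}

# Toy vocabulary of multi-character tokens standing in for a real tokenizer.
VOCAB = ["a", "b", "c", "ab", "abc", "bc"]

def step(state, token):
    """Walk the DFA character by character; return the end state or None if dead."""
    for ch in token:
        state = DFA.get((state, ch))
        if state is None:
            return None
    return state

# Pre-computation: for every DFA state, the set of token ids that do not lead
# into a dead end. At generation time, masking the logits is then a single
# lookup into `allowed`, independent of model size or prompt length.
allowed = {
    s: {i for i, tok in enumerate(VOCAB) if step(s, tok) is not None}
    for s in (0, 1, 2)
}

print(allowed[0])  # {0, 2, 3, 4} -> "a", "c", "ab", "abc" are valid from the start
print(allowed[1])  # tokens valid after having just read an 'a'
```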

Overall, the idea of pre-computing FSMs is interesting, and we are investigating whether this approach is flexible enough for LMQL. LMQL's constraint language is not limited to parsers or regular expressions; it also lets you implement and call external validation logic (e.g. a code assistant, retrieval, etc.), which precludes pre-computing all masks (we already cache them). Still, we are actively working on improved methods that enable much higher decoding speeds than we currently reach, so I remain optimistic about future improvements in constrained decoding performance.
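
To illustrate why such pre-computation does not cover the general case, here is a hedged sketch (all names, e.g. `external_validator` and `token_mask`, are hypothetical and not LMQL internals): when the acceptability of a continuation depends on an arbitrary callback, the mask can only be computed at decode time, and caching per prefix is what avoids repeated work.

```python
# Hedged sketch of decode-time mask computation with caching; the helper
# names below are made up for this example and are not LMQL's API.
from functools import lru_cache

VOCAB = ["foo", "bar", "baz", " ", "(", ")"]

def external_validator(text: str) -> bool:
    """Stand-in for arbitrary user logic, e.g. 'parentheses stay balanced'."""
    depth = 0
    for ch in text:
        depth += (ch == "(") - (ch == ")")
        if depth < 0:
            return False
    return True

@lru_cache(maxsize=None)
def token_mask(prefix: str) -> frozenset:
    """Which token ids are acceptable continuations of `prefix`?
    Must be evaluated lazily because `external_validator` is a black box,
    but repeated prefixes hit the cache instead of re-running validation."""
    return frozenset(
        i for i, tok in enumerate(VOCAB) if external_validator(prefix + tok)
    )

print(sorted(token_mask("(foo")))  # ')' is an acceptable continuation here
print(sorted(token_mask("")))      # ')' is rejected at the very start
```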

lbeurerkellner avatar Aug 17 '23 18:08 lbeurerkellner