
Alternative LLM backend support

Open LachlanGray opened this issue 1 year ago • 1 comment

I'm really excited for this project. This could be huge for on-device LLMs. It would pair extremely nicely with projects like GPT4All, RWKV, and whatever else is on the horizon.

It would be great if there was an API for plugging in arbitrary models. What information do you need from the LLM to apply lmql?

LachlanGray avatar Apr 07 '23 14:04 LachlanGray

Definitely. Given the limitations of e.g. the OpenAI API, we are very excited about the continued progress on open-source models.

In general, LMQL only requires a very minimal interface. Specifically, it needs the next-token distribution (the full distribution if possible, but top-n also works), and it has to hook into the generation process to apply its logit masking, which enforces any constraints. This can be optimised via speculative execution, so that LMQL does not have to run fully in the loop. In general, the LMQL computations that produce the logit masks can run in parallel with inference, so latency should not be affected much, as long as the LMQL runtime gets to insert its mask in time. A minimal sketch of such an interface is shown below.
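To make this concrete, here is a minimal sketch of what a pluggable backend could look like. The `TokenBackend` protocol, the function names, and the boolean vocabulary mask are all illustrative assumptions, not LMQL's actual backend API:

```python
# Hypothetical sketch of the minimal interface described above: a backend
# exposes next-token logits, and the generation loop applies an externally
# computed logit mask before picking a token.
from typing import Protocol, Sequence

import numpy as np


class TokenBackend(Protocol):
    def next_token_logits(self, input_ids: Sequence[int]) -> np.ndarray:
        """Return scores over the vocabulary for the next token
        (the full distribution if possible; top-n also works)."""
        ...


def constrained_greedy_step(backend: TokenBackend,
                            input_ids: Sequence[int],
                            allowed: np.ndarray) -> int:
    """One decoding step. `allowed` is a boolean mask over the vocabulary,
    computed by the constraint runtime (possibly in parallel with model
    inference); disallowed tokens are set to -inf so they are never chosen."""
    logits = backend.next_token_logits(input_ids)
    masked = np.where(allowed, logits, -np.inf)
    return int(np.argmax(masked))  # greedy decoding, for simplicity
```

Any model that can expose `next_token_logits` and tolerate a per-step mask would, under these assumptions, be enough for constraint enforcement.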

We are very much interested in projects like Alpaca and llama.cpp, and in LLaMA support arriving in HF Transformers, to expand support for more capable local models.

A limited form of integration is also possible for text-only APIs (e.g. ChatGPT or GPT-4). There, the enforced constraints cannot be as expressive, but many simple constraints and e.g. scripting abilities are still possible. See https://docs.lmql.ai/en/latest/language/models.html#openai-api-limitations for more detail on the limitations of very restrictive APIs like OpenAI's.
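For illustration, a small query in LMQL's syntax, along the lines of the documented examples, shows the kind of simple constraints that remain usable even through such APIs (the model name is just a placeholder):

```
argmax
    "Name: [NAME]\n"
    "Age: [AGE]"
from
    "openai/text-davinci-003"
where
    STOPS_AT(NAME, "\n") and INT(AGE)
```

Stopping conditions and type constraints like these can be approximated with text-only APIs, whereas constraints that require full per-token logit masking cannot.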

lbeurerkellner avatar Apr 07 '23 14:04 lbeurerkellner