Luca Beurer-Kellner
Thanks, this is a good suggestion. I just pushed a fix for (1) to 'main' (released in 0.0.5.1). I like the second proposal also. I will definitely look into it...
With the latest release you can now specify the model as `@lmql.query(model="")` or even when calling a query function via an additional keyword argument.
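The pattern behind this (a default fixed at decoration time that a call-time keyword argument can override) can be sketched in plain Python. This is an illustration of the mechanism only, not LMQL's actual internals; the names `query` and `my_query` here are made up for the example.

```python
# Sketch: a decorator default (model=...) that a call-time
# keyword argument can override. Illustrative names only.
import functools

def query(model="default-model"):
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, model=model, **kwargs):
            # 'model' is the call-time kwarg if given,
            # otherwise the decorator default
            return fn(*args, model=model, **kwargs)
        return wrapper
    return decorator

@query(model="gpt-3.5")
def my_query(prompt, model=None):
    return f"[{model}] {prompt}"

print(my_query("hello"))                 # uses the decorator default
print(my_query("hello", model="gpt-4"))  # per-call override
```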
This is a good question, LMQL and Guidance have some similarities. We agree that a deeper comparison would be useful and will add a corresponding chapter to our documentation. I...
Please feel free to have a look :). From an LLM performance perspective, the implementation/tooling level will not really matter though, as long as the concrete tokens/constraints you give to...
We have added a comprehensive comparison with `guidance` to the documentation. Please feel free to comment here with any additional questions that come up, so we can extend it. https://docs.lmql.ai/en/latest/python/comparison.html
As far as I understand Guidance's approach, the key idea is to only call the LLM to complete template variables and not to re-generate the entire template. This form of...
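That idea can be sketched in a few lines: the fixed template text is appended deterministically, and the model is only invoked to fill each variable span. This is a toy illustration of the concept, not Guidance's actual code; the `fill_template` helper and the `{var}` syntax are assumptions for the example, and a stub stands in for the model.

```python
# Sketch: only call the "model" to complete template variables;
# fixed template text is emitted without any model call.
import re

def fill_template(template, generate):
    """generate(prefix, var_name) -> completion for one variable."""
    out, pos = [], 0
    for m in re.finditer(r"\{(\w+)\}", template):
        out.append(template[pos:m.start()])             # fixed text: no model call
        out.append(generate("".join(out), m.group(1)))  # one model call per variable
        pos = m.end()
    out.append(template[pos:])
    return "".join(out)

# stub "model" for demonstration
answers = {"name": "Ada", "year": "1843"}
result = fill_template("Hello {name}, born {year}.",
                       lambda prefix, var: answers[var])
print(result)  # Hello Ada, born 1843.
```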
I see, thanks for the article, this clarifies it for me. I was primed on the wrong abstraction level, not thinking of transformer internals. Yes, this is definitely on the...
There is a proof-of-concept implementation on a feature branch, but making it work with batching, padding and multi-part prompting still requires some work. It may be worth to...
This seems to be a bug with the OpenAI API and its `"stop"` parameter. The API documentation specifies that stopping phrases will be removed from the response, but in this...
Good observation. So, to actually fix it, we just need to make sure not to consume "tokens" beyond what the "text" return value contains. I suspect this will...
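The proposed fix could look roughly like this: walk the token stream and stop consuming as soon as the decoded concatenation would reach past the returned "text", since anything beyond it belongs to a stop phrase the API stripped. A minimal sketch, assuming a generic `decode` callable; the function name and the toy identity "tokenizer" are made up for illustration.

```python
# Sketch: consume streamed tokens only while their decoded
# concatenation stays within the "text" field the API returned.
def consume_tokens(tokens, text, decode):
    """Return the prefix of `tokens` whose decoding fits inside `text`."""
    consumed, decoded = [], ""
    for tok in tokens:
        piece = decode(tok)
        if not text.startswith(decoded + piece):
            break  # token reaches past the returned text (stop phrase was removed)
        decoded += piece
        consumed.append(tok)
    return consumed

# toy example with string "tokens" and an identity decoder
tokens = ["Hel", "lo", " wor", "ld", "\n", "STOP"]
text = "Hello world"  # API removed the stop phrase and trailing newline
print(consume_tokens(tokens, text, lambda t: t))  # ['Hel', 'lo', ' wor', 'ld']
```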