Luca Beurer-Kellner

149 comments by Luca Beurer-Kellner

Yes, this is precisely what is delaying KV caching support currently. We want to provide full batched support, but a simple non-batched variant may make it to main before then....

Yes, we definitely want to add a corresponding LMTP backend. However, we will wait until vLLM adds logit_bias support, which is crucial to make LMQL's constraining work. See also the...
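
To illustrate why logit_bias matters here, below is a minimal sketch (not vLLM's or LMQL's actual API; token ids and values are made up) of how adding a per-token bias to the logits lets a runtime restrict the next token to a constrained set:

```python
# Hedged sketch: logit_bias-style masking for constrained decoding.
# Token ids and logit values are illustrative only.

def apply_logit_bias(logits: dict[int, float], bias: dict[int, float]) -> dict[int, float]:
    """Add a per-token bias to raw logits; a large negative bias effectively bans a token."""
    return {tok: logit + bias.get(tok, 0.0) for tok, logit in logits.items()}

# Suppose the constraint only allows tokens 11 ("yes") and 12 ("no"):
allowed = {11, 12}
vocab_logits = {10: 1.2, 11: 0.7, 12: 0.3, 13: -0.5}
bias = {tok: (0.0 if tok in allowed else -100.0) for tok in vocab_logits}

biased = apply_logit_bias(vocab_logits, bias)
next_token = max(biased, key=biased.get)  # constrained greedy pick -> 11
print(next_token)
```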

Thanks for reporting this. I will have to investigate a bit to reproduce it reliably on my end. As a workaround, depending on what you do, you could also just...

I can see the appeal of option 1 to the user (no extra server-side setup and the possibility of using third-party infrastructure), and we can definitely support it. However, option 2 is the...

I am not aware of anyone actively working on this, so feel free to go ahead :)

Yes, definitely. We have already planned some interesting features to support this further, like native support for function calling and tool augmentation. It would be awesome to collect any suggestions and/or...
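
For context, here is a generic sketch of the tool-augmentation pattern (this is not LMQL's actual API; the `CALL` convention and helper names are illustrative): the model emits a tool request, the runtime executes the corresponding Python function, and the result is appended to the prompt before generation continues.

```python
from typing import Callable

TOOLS: dict[str, Callable[[str], str]] = {
    "calc": lambda expr: str(eval(expr)),  # illustrative only; never eval untrusted input
}

def run_with_tools(generate: Callable[[str], str], prompt: str, max_steps: int = 5) -> str:
    """Generic tool-augmentation loop, independent of any particular framework."""
    for _ in range(max_steps):
        out = generate(prompt)
        if out.startswith("CALL "):              # e.g. "CALL calc: 2+3"
            name, arg = out[5:].split(":", 1)
            prompt += "\n" + TOOLS[name.strip()](arg.strip())
        else:
            return out
    return prompt
```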

I like the suggestion. Did you see https://github.com/eth-sri/lmql/blob/main/src/lmql/models/lmtp/lmtp_programmatic_serve_example.py? It allows you to run `serve-model` from a custom launch script, which also lets you import custom modules beforehand.
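
Roughly, such a launch script could look like the sketch below; the module path, function name, and call signature here are assumptions for illustration only, so please check the linked example file for the actual entry point:

```python
# Hypothetical launch script (import path and `serve` signature are assumptions;
# see lmtp_programmatic_serve_example.py in the repo for the real entry point).
import my_custom_module  # hypothetical: register custom models/tokenizers before serving

from lmql.models.lmtp.lmtp_serve import serve  # assumed import

if __name__ == "__main__":
    # assumed call, mirroring `lmql serve-model <model> --cuda`
    serve("openlm-research/open_llama_3b", cuda=True)
```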

I would be very open to and happy about any form of collaboration. It might be worth discussing scope and/or project philosophy. In general, given the list of things you are...

Thanks for reporting this. There is an issue with how we track uniquely sampled token sequences in caching. I will have a look. This also affects local models. For the omission...

> **1. Autonomous Execution Management**

Yes, the implementation using a while loop and conditional scripting seems to cover the basic requirement. In long-running execution with parallel executions of...
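
As a rough illustration of that pattern (not the implementation under discussion; the helper names and stopping condition are made up), a while-loop-driven agent with parallel executions could look like this in plain Python:

```python
import asyncio
from typing import Awaitable, Callable, Iterable

async def agent(task: str,
                is_done: Callable[[str], bool],
                step: Callable[[str], Awaitable[str]]) -> str:
    """While-loop-driven autonomous execution: keep stepping until the
    stopping condition (the 'conditional scripting' part) is satisfied."""
    state = task
    while not is_done(state):
        state = await step(state)
    return state

async def run_parallel(tasks: Iterable[str],
                       is_done: Callable[[str], bool],
                       step: Callable[[str], Awaitable[str]]) -> list[str]:
    # parallel executions of independent agent loops
    return await asyncio.gather(*(agent(t, is_done, step) for t in tasks))
```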