Luca Beurer-Kellner

149 comments by Luca Beurer-Kellner

Yes, this is precisely what is delaying KV caching support currently. We want to provide full batched support, but a simple non-batched variant may make it to main before then....

Yes, we definitely want to add a corresponding LMTP backend. However, we will wait until vLLM adds logit_bias support, which is crucial to make LMQL's constraining work. See also the...
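
To illustrate why logit_bias matters here, below is a minimal sketch (not vLLM's or LMQL's actual API; token ids and values are made up) of how adding a per-token bias to the logits lets a runtime restrict the next token to a constrained set:

```python
# Hedged sketch: logit_bias-style masking for constrained decoding.
# Token ids and logit values are illustrative only.

def apply_logit_bias(logits: dict[int, float], bias: dict[int, float]) -> dict[int, float]:
    """Add a per-token bias to raw logits; a large negative bias effectively bans a token."""
    return {tok: logit + bias.get(tok, 0.0) for tok, logit in logits.items()}

# Suppose the constraint only allows tokens 11 ("yes") and 12 ("no"):
allowed = {11, 12}
vocab_logits = {10: 1.2, 11: 0.7, 12: 0.3, 13: -0.5}
bias = {tok: (0.0 if tok in allowed else -100.0) for tok in vocab_logits}

biased = apply_logit_bias(vocab_logits, bias)
next_token = max(biased, key=biased.get)  # constrained greedy pick -> 11
print(next_token)
```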

Thanks for reporting this. I will have to investigate a bit to reproduce it reliably on my end. As a workaround, depending on what you do, you could also just...

I can see the appeal of option 1 to the user (no extra server-side setup and the possibility of using third-party infrastructure), and we can definitely support it. However, option 2 is the...

I am not aware of anyone actively working on this, so feel free to go ahead :)

Yes, definitely. We have already planned some interesting features to support this further, like native support for function calling and tool augmentation. It would be awesome to collect any suggestions and/or...
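
For context, here is a generic sketch of the tool-augmentation pattern (this is not LMQL's actual API; the `CALL` convention and helper names are illustrative): the model emits a tool request, the runtime executes the corresponding Python function, and the result is appended to the prompt before generation continues.

```python
from typing import Callable

TOOLS: dict[str, Callable[[str], str]] = {
    "calc": lambda expr: str(eval(expr)),  # illustrative only; never eval untrusted input
}

def run_with_tools(generate: Callable[[str], str], prompt: str, max_steps: int = 5) -> str:
    """Generic tool-augmentation loop, independent of any particular framework."""
    for _ in range(max_steps):
        out = generate(prompt)
        if out.startswith("CALL "):              # e.g. "CALL calc: 2+3"
            name, arg = out[5:].split(":", 1)
            prompt += "\n" + TOOLS[name.strip()](arg.strip())
        else:
            return out
    return prompt
```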

I like the suggestion. Did you see https://github.com/eth-sri/lmql/blob/main/src/lmql/models/lmtp/lmtp_programmatic_serve_example.py? It allows you to run `serve-model` from a custom launch script, which also lets you import custom modules beforehand.
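
Roughly, such a launch script could look like the sketch below; the module path, function name, and call signature here are assumptions for illustration only, so please check the linked example file for the actual entry point:

```python
# Hypothetical launch script (import path and `serve` signature are assumptions;
# see lmtp_programmatic_serve_example.py in the repo for the real entry point).
import my_custom_module  # hypothetical: register custom models/tokenizers before serving

from lmql.models.lmtp.lmtp_serve import serve  # assumed import

if __name__ == "__main__":
    # assumed call, mirroring `lmql serve-model <model> --cuda`
    serve("openlm-research/open_llama_3b", cuda=True)
```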

I would be very open to and happy about any form of collaboration. It might be worth discussing scope and/or project philosophy. In general, given the list of things you are...

Thanks for reporting this. There is an issue with how we track uniquely sampled token sequences in caching. I will have a look. This also affects local models. For the omission...

> **1. Autonomous Execution Management**

Yes, the implementation using a while loop and conditional scripting seems to cover the basic requirement. In long-running execution with parallel executions of...
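
As a rough illustration of that pattern (not the implementation under discussion; the helper names and stopping condition are made up), a while-loop-driven agent with parallel executions could look like this in plain Python:

```python
import asyncio
from typing import Awaitable, Callable, Iterable

async def agent(task: str,
                is_done: Callable[[str], bool],
                step: Callable[[str], Awaitable[str]]) -> str:
    """While-loop-driven autonomous execution: keep stepping until the
    stopping condition (the 'conditional scripting' part) is satisfied."""
    state = task
    while not is_done(state):
        state = await step(state)
    return state

async def run_parallel(tasks: Iterable[str],
                       is_done: Callable[[str], bool],
                       step: Callable[[str], Awaitable[str]]) -> list[str]:
    # parallel executions of independent agent loops
    return await asyncio.gather(*(agent(t, is_done, step) for t in tasks))
```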