Luca Beurer-Kellner
Closing this due to inactivity and the inability of the Chat API to perform partial completions. We may have some interesting alternative approaches to resolve this problem in the future,...
Definitely, given the limitations of e.g. the OpenAI API, we are very excited about the continued progress on open source models. In general, LMQL only requires a very minimal interface....
Go ahead :) Currently no one is actively working on this.
Marking this as a good first issue to work on. The place to start implementation is https://github.com/eth-sri/lmql/blob/main/src/lmql/ops/ops.py#L825. Relevant methods to override are: * `stop` to indicate to the decoder whether to...
One workaround for now is to use STOPS_AT instead and to strip the suffix in code.
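To illustrate the workaround, here is a small sketch of the "strip the suffix in code" step: a plain Python helper (hypothetical, not part of LMQL) that removes the trailing stop sequence from a STOPS_AT capture, emulating STOPS_BEFORE behavior.

```python
def strip_stop_suffix(text: str, stop: str) -> str:
    """Remove the stop sequence from the end of a STOPS_AT capture.

    STOPS_AT keeps the stop sequence in the captured variable; stripping
    it afterwards gives the same result STOPS_BEFORE would produce.
    """
    if stop and text.endswith(stop):
        return text[: -len(stop)]
    return text


# e.g. a variable captured with STOPS_AT(WORD, "\n"):
print(strip_stop_suffix("Hello world.\n", "\n"))  # -> "Hello world."
```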
Yes, we definitely should change this, thanks for the suggestion and offering help. The solution we thought about is to allow strings like `model@host:port`, so e.g. `stabilityai/stablelm-tuned-alpha-7b@localhost:1234`, parse host/port in...
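A minimal sketch of such parsing, assuming a `model@host:port` convention where the `@host:port` suffix is optional (the helper name and default values are hypothetical, not LMQL API):

```python
def parse_model_ref(ref: str, default_host: str = "localhost",
                    default_port: int = 8080) -> tuple:
    """Split 'model@host:port' into (model, host, port).

    rpartition is used so that '@' characters in the model name itself
    would not break the split; host and port fall back to defaults
    when omitted.
    """
    if "@" in ref:
        model, _, endpoint = ref.rpartition("@")
        host, _, port = endpoint.partition(":")
        return model, host or default_host, int(port) if port else default_port
    return ref, default_host, default_port


print(parse_model_ref("stabilityai/stablelm-tuned-alpha-7b@localhost:1234"))
# -> ('stabilityai/stablelm-tuned-alpha-7b', 'localhost', 1234)
```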
The updated inference infrastructure now allows a custom host and port with `lmql serve-model`.
It would be awesome to have support for multiple streams and/or logit masking via the protocol. I think a more standardised approach going beyond llama.cpp (e.g. HuggingFace, FastChat, etc.) would...
Hi there, we provide an extensive comparison to Guidance here: https://docs.lmql.ai/en/latest/python/comparison.html. Regarding token healing, we definitely plan to add it, but we cannot give a good time window, as to...
Yes, thank you for compiling the examples. I am very aware of the issue and I agree that token healing should be added. I have been working on some things,...