Breno Faria
@simon-mo that’s the regression described in https://github.com/outlines-dev/outlines/issues/856. This is a bug in outlines. We can either disable the test or wait for the fix, as I outlined above.
I have identified the source of the problem in outlines and am waiting for a PR (https://github.com/outlines-dev/outlines/pull/874) to get merged with the fix. Once this is done, we can move...
I like this idea. And I agree with @mmoskal that it would be important to support the more involved API being worked on in #4775. I wonder though how one...
I think this is a blocker for removing the REST High Level Client in 3.0. Everyone using a plugin that defines custom queries will be unable to upgrade.
Outlines' reference implementation of the vLLM server (https://github.com/outlines-dev/outlines/blob/main/outlines/serve/serve.py) is a copy of vLLM's https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/api_server.py with a few patches and add-ons. I believe this code should live in vLLM instead...
@viktor-ferenczi, fair enough. @zhuohan123 and @rlouf, what is your assessment?
@hmellor I have found the issue and opened a PR that fixes it. Let me know if there are any open questions.
I'm sorry, I don't understand your point. Several open source models are trained on system/assistant/user message tuples and expect "correctly" formatted contexts with these messages. DSPy puts everything into one...
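To make the contrast concrete, here is a minimal sketch of the difference between a role-structured chat context and a flattened one. The OpenAI-style message dicts and the system/user text are illustrative assumptions, not DSPy's actual internals:

```python
# Role-structured context: the format many chat-tuned open source
# models are trained on (exact template varies per model).
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
]

# What a framework that collapses everything effectively sends instead:
# one blob of text, losing the role boundaries the model expects.
flattened = "\n".join(m["content"] for m in messages)
print(flattened)
```

A model served behind a chat template can apply its role markers to the structured form, but not to the already-flattened string.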
What about being able to force the type -- no need to implement dangerous heuristics, leave the responsibility to the user. The only thing to implement would be to be...
@saattrupdan thanks for pointing this out. I believe this fix will not work in vLLM. The thing is that the logits processors are cached here: https://github.com/vllm-project/vllm/blob/594392d27a0dc3b1df84246afb46cc229946c0f3/vllm/model_executor/guided_decoding/outlines_decoding.py#L119 Not resetting the state...
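For illustration, a self-contained sketch of why caching a stateful logits processor is problematic. `StatefulLogitsProcessor` and `get_processor` are hypothetical stand-ins for outlines' FSM-based processor and the cache linked above, not the real vLLM code:

```python
from functools import lru_cache


class StatefulLogitsProcessor:
    """Toy stand-in for an FSM-based guided-decoding processor.

    Real implementations advance an internal FSM state on every call;
    this counter plays that role.
    """

    def __init__(self):
        self.state = 0  # FSM state, advanced once per generated token

    def __call__(self, token_id):
        self.state += 1
        return self.state


@lru_cache(maxsize=None)  # mimics caching the processor per schema
def get_processor(schema: str) -> StatefulLogitsProcessor:
    return StatefulLogitsProcessor()


# First request: processor starts at state 0 and advances normally.
p1 = get_processor("my-json-schema")
p1(1)
p1(2)

# Second request for the same schema gets the *same* cached instance,
# still carrying the state left over from the first request.
p2 = get_processor("my-json-schema")
assert p2 is p1
assert p2.state == 2  # stale: a fresh request should start at state 0
```

This is why fixing the reset inside the processor alone is not enough: as long as the cache hands the same instance to concurrent or subsequent requests, the state must be reset (or the processor copied) at the point where the cached instance is reused.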