Breno Faria
Is there any indication of an exception in the response? There are a few places in `async_llm_engine.py` that call this method and exceptions are placed into the request stream but...
Let’s recap the issue discussion so far: It happens with different models. It happens with different GPUs. It happens with different quantization methods. It happens under high load. @prakashsanker’s hypothesis...
I'll have a look at the failing frontend test.
I can reproduce the error in the test on my dev environment. The generation does not stop when it should, generating IP addresses like this: `100.101.102.10319216`. I'm investigating why this...
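To make the symptom concrete, here is a minimal sketch of how such an output fails a full-match check. The IPv4 pattern and both helper names are assumptions for illustration; the regex used by the actual test may differ.

```python
import re

# Hypothetical IPv4 pattern, assumed for illustration only;
# the failing test may use a different regex.
IPV4_PATTERN = re.compile(r"(\d{1,3}\.){3}\d{1,3}")


def is_complete_match(text: str) -> bool:
    """Return True only if the whole string matches the pattern."""
    return IPV4_PATTERN.fullmatch(text) is not None


def longest_valid_prefix(text: str):
    """Return the longest prefix that fully matches the pattern, or None."""
    for end in range(len(text), 0, -1):
        if IPV4_PATTERN.fullmatch(text[:end]):
            return text[:end]
    return None


generated = "100.101.102.10319216"
print(is_complete_match(generated))     # the overrun output is not a valid match
print(longest_valid_prefix(generated))  # where generation should have stopped
```

Under this assumed pattern, the output only matches up to `100.101.102.103`, which is consistent with generation continuing past the point where the pattern was already satisfied.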
I'm having a call with outlines contributors on Thursday. While there is no guarantee we will have a solution for the problem, I'd propose to wait until then. If there's...
I have opened #4558 because moving to the `Guide` API will require https://github.com/outlines-dev/outlines/issues/856 to be fixed first.
I have closed #4558 in favor of this PR. I expect to make progress on this next week. Waiting for https://github.com/outlines-dev/outlines/pull/874.
Great, thanks for the support @rlouf! Can you tell already when you plan to release?
I have removed the `FSM` import that made this particular test fail. The thing is that the underlying implementation is the same as with the `Guide` interface and the issue...
It's up to the maintainers of vLLM to decide what exactly is to be done here. We can: 1. wait for https://github.com/outlines-dev/outlines/issues/856 to be fixed and only then unpin outlines...