Breno Faria

Results: 67 comments by Breno Faria

Is there any indication of an exception in the response? There are a few places in `async_llm_engine.py` that call this method and exceptions are placed into the request stream but...
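For readers unfamiliar with the pattern mentioned here, the following is a minimal sketch of how an exception can be placed into a per-request stream and re-raised on the consumer side. The names `ResultStream`, `consume`, and `demo` are hypothetical and chosen for illustration; this is not the code in `async_llm_engine.py`, only an assumption about the general shape of such a stream.

```python
import asyncio

_DONE = object()  # hypothetical sentinel marking the end of the stream


class ResultStream:
    """Hypothetical per-request stream; not vLLM's actual class."""

    def __init__(self):
        self._queue = asyncio.Queue()

    def put(self, item):
        # The producer may enqueue either a normal output or an Exception.
        self._queue.put_nowait(item)

    def finish(self):
        self._queue.put_nowait(_DONE)

    def __aiter__(self):
        return self

    async def __anext__(self):
        item = await self._queue.get()
        if item is _DONE:
            raise StopAsyncIteration
        if isinstance(item, Exception):
            # Surface the producer-side failure to whoever reads the stream.
            raise item
        return item


async def consume(stream):
    """Collect outputs until the stream ends or an exception is raised."""
    outputs = []
    try:
        async for item in stream:
            outputs.append(item)
    except Exception as exc:
        return outputs, exc
    return outputs, None


def demo():
    async def run():
        stream = ResultStream()
        stream.put("token-1")
        stream.put(RuntimeError("engine failure"))
        return await consume(stream)

    return asyncio.run(run())
```

In this sketch the caller only sees the exception if it keeps iterating the stream, which is one reason a failure might not show up in the response.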

Let’s recap the issue discussion so far: It happens with different models. It happens with different GPUs. It happens with different quantization methods. It happens under high load. @prakashsanker’s hypothesis...

I'll have a look at the failing frontend test.

I can reproduce the error in the test on my dev environment. The generation does not stop when it should, generating IP addresses like this: `100.101.102.10319216`. I'm investigating why this...
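A quick way to see why the output above is invalid is to compare a prefix match with a full match. This is a sketch only: the IPv4 pattern below is an assumption made for illustration and is not necessarily the regex used in the failing test, and `is_complete_match` and `matched_prefix` are hypothetical helper names.

```python
import re

# Assumed IPv4-like pattern for illustration; the test's real regex may differ.
IPV4_PATTERN = r"(\d{1,3}\.){3}\d{1,3}"


def is_complete_match(pattern, text):
    """True only if the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


def matched_prefix(pattern, text):
    """Return the leading part of text that matches the pattern, or None."""
    m = re.match(pattern, text)
    return m.group(0) if m else None


generated = "100.101.102.10319216"

# The string starts with a valid match but continues past it.
print(matched_prefix(IPV4_PATTERN, generated))
print(is_complete_match(IPV4_PATTERN, generated))
```

Under this assumed pattern, a valid address is a prefix of the generated text, which is consistent with generation continuing past the point where it should have stopped.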

I'm having a call with outlines contributors on Thursday. While there is no guarantee we will have a solution for the problem, I'd propose to wait until then. If there's...

I have opened #4558 because moving to the `Guide` API will require https://github.com/outlines-dev/outlines/issues/856 to be fixed first.

I have closed #4558 in favor of this PR. I expect to make progress on this next week. Waiting for https://github.com/outlines-dev/outlines/pull/874.

Great, thanks for the support @rlouf! Can you tell already when you plan to release?

I have removed the `FSM` import that made this particular test fail. The thing is that the underlying implementation is the same as with the `Guide` interface and the issue...

It's up to the maintainers of vLLM to decide what exactly is to be done here. We can: 1. wait for https://github.com/outlines-dev/outlines/issues/856 to be fixed and only then unpin outlines...