Stas Bekman
Thank you for this suggestion, @Quang-elec44 - I understand that it'll be slower, but it should be only marginally slower, not 20x slower. Possibly there is some problem in the integration?
> @stas00 is this a new issue on v0.6.0?

no, same with older versions - e.g. 0.5.5

> looks like someone has just fixed the issue:

Robert, I have already...
@Lap1n, I can't try v0.5.2 since it didn't support guided generation in the offline mode. But I think the problem here is something entirely different; I'm trying to dig to...
@robertgshaw2-neuralmagic, do you know why the `outlines` cache doesn't get used and it instead rebuilds the same thing every time? e.g. here is one of the repro scripts I created:

```
import time
from ...
```
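To make it concrete, here is a minimal sketch of the sort of timing check I mean (the model name and schema are placeholders, and it assumes the `outlines.models.transformers` / `outlines.generate.json` API; it's not the actual repro script):

```
# sketch of the timing check, not the actual repro script;
# model name and schema are placeholders
import time

import outlines

model = outlines.models.transformers("gpt2")

schema = """{
  "type": "object",
  "properties": {
    "name": {"type": "string"},
    "age": {"type": "integer"}
  },
  "required": ["name", "age"]
}"""

# first build: expected to pay the full index/FSM compilation cost
t0 = time.time()
outlines.generate.json(model, schema)
print(f"first build:  {time.time() - t0:.2f}s")

# second build with the identical schema: if the cache is used,
# this should be near-instant instead of recompiling the same index
t0 = time.time()
outlines.generate.json(model, schema)
print(f"second build: {time.time() - t0:.2f}s")
```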
so I'm waiting for my colleague to give me a repro case so that I could narrow it down for him. Meanwhile, why is this pure `outlines` script so slow?...
the other mismatch I noticed is that vllm strips the spaces between elements of the json:

```
# outlines:
{"name": "John", "age": 29}{"name": "John", "age": 30}

# vllm:
{"name":"John","age":25}{"name":"John","age":25}
```

I...
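One place the difference could come from is the `whitespace_pattern` that gets baked into the schema regex. A quick way to compare (assuming outlines exposes `build_regex_from_schema`; the module path has moved between outlines versions, so the import may need adjusting):

```
# compare the regex outlines builds by default with one that forbids
# whitespace between elements (which yields the compact vllm-style json)
from outlines.fsm.json_schema import build_regex_from_schema

schema = """{
  "type": "object",
  "properties": {
    "name": {"type": "string"},
    "age": {"type": "integer"}
  },
  "required": ["name", "age"]
}"""

print(build_regex_from_schema(schema))                          # default whitespace handling
print(build_regex_from_schema(schema, whitespace_pattern=r""))  # no whitespace allowed
```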
ok, I see that vllm isn't using the `outlines.generate.json` generator but the regex engine, which has multiple issues:

1. why does it recompile the regex on every request? (see the caching sketch below) https://github.com/vllm-project/vllm/blob/b1f3e189586dce42bb3dcda20169a9308c9a25fa/vllm/model_executor/guided_decoding/outlines_logits_processors.py#L142 this...
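For issue 1, a rough sketch of the caching direction I have in mind: memoize the compiled artifact keyed on the regex string plus a hashable tokenizer identifier, instead of rebuilding it inside the logits processor for every request. The names and the `re.compile` stand-in are placeholders, not vllm's actual code:

```
from functools import lru_cache
import re


@lru_cache(maxsize=128)
def cached_build(regex_string: str, tokenizer_name: str):
    # stand-in for the expensive step: in vllm/outlines this builds the
    # token-level FSM over the tokenizer vocabulary, not just an re.Pattern
    return re.compile(regex_string)


# the first request with a given (regex, tokenizer) pair pays the full cost
guide = cached_build(r'\{"name":"[^"]*","age":\d+\}', "some-tokenizer")

# identical follow-up requests are cache hits and return the same object
assert cached_build(r'\{"name":"[^"]*","age":\d+\}', "some-tokenizer") is guide
```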
> QQ - are you planning to open up a PR to fix this? We would definitely appreciate a contribution if you have the bandwidth

The summary of things so...
I ran into hanging as well. Discovered that:

official docker:

```
$ python -c "import mpi4py; print(mpi4py.__version__)"
4.0.0
```

my setup:

```
$ python -c "import mpi4py; print(mpi4py.__version__)"
3.1.4
```

so clearly they aren't...
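If the version mismatch is indeed what's causing the hang, the obvious next thing to try (untested on my side) is aligning the local install with the docker image; note that mpi4py builds against the local MPI, so the MPI dev headers need to be present:

```
$ pip install "mpi4py==4.0.0"
$ python -c "import mpi4py; print(mpi4py.__version__)"
4.0.0
```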
You're giving too little information to go on. How did you train it? I assume with ZeRO-3.

And you're now trying to load the model using the deepspeed checkpoint?

Unfortunately changing...
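If the goal is just to get a usable model back from a ZeRO-3 checkpoint, the usual route is the `zero_to_fp32` utility that DeepSpeed saves alongside the checkpoint, which consolidates the sharded ZeRO-3 state into a single fp32 state dict. A sketch with placeholder paths (the model skeleton has to match the architecture that was trained):

```
# script form: DeepSpeed drops zero_to_fp32.py into the checkpoint folder
#   python zero_to_fp32.py /path/to/checkpoint_dir pytorch_model.bin

# or programmatically:
from deepspeed.utils.zero_to_fp32 import load_state_dict_from_zero_checkpoint

model = MyModel()  # placeholder: instantiate the same architecture you trained
model = load_state_dict_from_zero_checkpoint(model, "/path/to/checkpoint_dir")
```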