Stas Bekman
Thank you for this suggestion, @Quang-elec44 - I understand that it'll be slower, but it should be only marginally slower, not 20x slower. Possibly there is some problem in the integration?
> @stas00 is this a new issue on v0.6.0?

no, same with older versions - e.g. 0.5.5

> looks like someone has just fixed the issue:

Robert, I have already...
@Lap1n, I can't try v0.5.2 since it didn't support guided generation in the offline mode. But I think the problem here is something entirely different; I'm trying to dig to...
@robertgshaw2-neuralmagic, do you know why the `outlines` cache doesn't get used and it instead rebuilds the same thing every time? e.g. here is one of the repro scripts I created:

```
import time
from ...
```
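To make it concrete, here is a minimal sketch of the sort of timing check I mean (the model name and schema are placeholders, and it assumes the `outlines.models.transformers` / `outlines.generate.json` API; it's not the actual repro script):

```
# sketch of the timing check, not the actual repro script;
# model name and schema are placeholders
import time

import outlines

model = outlines.models.transformers("gpt2")

schema = """{
  "type": "object",
  "properties": {
    "name": {"type": "string"},
    "age": {"type": "integer"}
  },
  "required": ["name", "age"]
}"""

# first build: expected to pay the full index/FSM compilation cost
t0 = time.time()
outlines.generate.json(model, schema)
print(f"first build:  {time.time() - t0:.2f}s")

# second build with the identical schema: if the cache is used,
# this should be near-instant instead of recompiling the same index
t0 = time.time()
outlines.generate.json(model, schema)
print(f"second build: {time.time() - t0:.2f}s")
```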
so I'm waiting for my colleague to give me a repro case so that I could narrow it down for him. Meanwhile, why is this pure `outlines` script so slow?...
the other mismatch I noticed is that vllm strips the spaces between elements of the json:

```
# outlines:
{"name": "John", "age": 29}{"name": "John", "age": 30}

# vllm:
{"name":"John","age":25}{"name":"John","age":25}
```

I...
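One place the difference could come from is the `whitespace_pattern` that gets baked into the schema regex. A quick way to compare (assuming outlines exposes `build_regex_from_schema`; the module path has moved between outlines versions, so the import may need adjusting):

```
# compare the regex outlines builds by default with one that forbids
# whitespace between elements (which yields the compact vllm-style json)
from outlines.fsm.json_schema import build_regex_from_schema

schema = """{
  "type": "object",
  "properties": {
    "name": {"type": "string"},
    "age": {"type": "integer"}
  },
  "required": ["name", "age"]
}"""

print(build_regex_from_schema(schema))                          # default whitespace handling
print(build_regex_from_schema(schema, whitespace_pattern=r""))  # no whitespace allowed
```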
ok, I see that vllm isn't using the `outlines.generate.json` generator but the regex engine, which has multiple issues:

1. why does it recompile the regex on every request? (see the caching sketch below) https://github.com/vllm-project/vllm/blob/b1f3e189586dce42bb3dcda20169a9308c9a25fa/vllm/model_executor/guided_decoding/outlines_logits_processors.py#L142 this...
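For issue 1, a rough sketch of the caching direction I have in mind: memoize the compiled artifact keyed on the regex string plus a hashable tokenizer identifier, instead of rebuilding it inside the logits processor for every request. The names and the `re.compile` stand-in are placeholders, not vllm's actual code:

```
from functools import lru_cache
import re


@lru_cache(maxsize=128)
def cached_build(regex_string: str, tokenizer_name: str):
    # stand-in for the expensive step: in vllm/outlines this builds the
    # token-level FSM over the tokenizer vocabulary, not just an re.Pattern
    return re.compile(regex_string)


# the first request with a given (regex, tokenizer) pair pays the full cost
guide = cached_build(r'\{"name":"[^"]*","age":\d+\}', "some-tokenizer")

# identical follow-up requests are cache hits and return the same object
assert cached_build(r'\{"name":"[^"]*","age":\d+\}', "some-tokenizer") is guide
```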
> QQ - are you planning to open up a PR to fix this? We would definitely appreciate a contribution if you have the bandwidth

The summary of things so...
I ran into hanging as well. Discovered that:

official docker:

```
$ python -c "import mpi4py; print(mpi4py.__version__)"
4.0.0
```

my setup:

```
$ python -c "import mpi4py; print(mpi4py.__version__)"
3.1.4
```

so clearly they aren't...
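If the version mismatch is indeed what's causing the hang, the obvious next thing to try (untested on my side) is aligning the local install with the docker image; note that mpi4py builds against the local MPI, so the MPI dev headers need to be present:

```
$ pip install "mpi4py==4.0.0"
$ python -c "import mpi4py; print(mpi4py.__version__)"
4.0.0
```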
You're giving too little information to go on. How did you train it? I assume with ZeRO-3.

And you're now trying to load the model using the deepspeed checkpoint?

Unfortunately changing...
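If the goal is just to get a usable model back from a ZeRO-3 checkpoint, the usual route is the `zero_to_fp32` utility that DeepSpeed saves alongside the checkpoint, which consolidates the sharded ZeRO-3 state into a single fp32 state dict. A sketch with placeholder paths (the model skeleton has to match the architecture that was trained):

```
# script form: DeepSpeed drops zero_to_fp32.py into the checkpoint folder
#   python zero_to_fp32.py /path/to/checkpoint_dir pytorch_model.bin

# or programmatically:
from deepspeed.utils.zero_to_fp32 import load_state_dict_from_zero_checkpoint

model = MyModel()  # placeholder: instantiate the same architecture you trained
model = load_state_dict_from_zero_checkpoint(model, "/path/to/checkpoint_dir")
```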