nullpointer0xffff
+1, it doesn't seem GPU-related; I tested on both A100 and V100 GPUs and saw the same issue. Using line_profiler, I found that this [get_guided_decoding_logits_processor](https://github.com/vllm-project/vllm/blob/63575bc2e197b85ce1c911421ff30c5459e35e9c/vllm/entrypoints/openai/serving_completion.py#L96-L98) call takes 93% of the time.
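For reference, a minimal line_profiler sketch of the kind of measurement described above. The import path for `get_guided_decoding_logits_processor` is assumed from the linked commit and may differ in other vLLM versions, and the call site is only indicative:

```python
# Hedged sketch: wrap the guided-decoding setup call with line_profiler and
# dump per-line timings (Hits / Time / % Time) after a request or two.
from line_profiler import LineProfiler

# Assumed import path (matches the linked commit; may live elsewhere in newer vLLM).
from vllm.model_executor.guided_decoding import (
    get_guided_decoding_logits_processor,
)

lp = LineProfiler()
# LineProfiler works as a decorator: the wrapped callable records per-line stats
# every time it runs.
profiled = lp(get_guided_decoding_logits_processor)

# ... substitute `profiled` for the original call in serving_completion.py
# (it is awaited there), send a request with guided_json / guided_regex, then:
lp.print_stats()
```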
> If testing lm-format-enforcer, I highly recommend adding the latest version of it to the image, as there have been performance improvements to the JsonSchemaParser. The next version of vLLM...
@noamgat here's a profiling run when I use lm-format-enforcer 0.10.1.

```
/lib/python3.10/site-packages/lmformatenforcer/integrations/transformers.py
Function: _build_regular_tokens_list at line 58

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    58                                           @profile
    59...
```
+1 to supporting `logit_bias` so that libraries like guidance can use it. Though there's a workaround: run the vLLM API server to mock the ChatGPT API and point guidance's OpenAI client to...
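For context, the first half of that workaround looks roughly like this: vLLM already exposes an OpenAI-compatible endpoint, so any OpenAI-style client (the plain `openai` package here; guidance's OpenAI client would be configured analogously) can be pointed at it. Host, port, and model name below are placeholders.

```python
# Hedged sketch of the workaround: treat a running vLLM OpenAI-compatible
# server as if it were the ChatGPT API. Requires `pip install openai` (v1+)
# and a vLLM server started with its OpenAI entrypoint.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.completions.create(
    model="meta-llama/Llama-2-7b-hf",  # placeholder: whatever model vLLM serves
    prompt="The quick brown fox",
    max_tokens=16,
)
print(resp.choices[0].text)
```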