Simon Mo

313 comments by Simon Mo

I need to fix this to get CI working. Got it working now, running some tests and will ask for review.

I fixed the variable names, but I'm currently facing a weight-naming mismatch (Phi renamed some weights). I will skip this test in CI for now and come back to it.

Thanks! This feature is indeed needed, but we are actively evaluating Outlines, as it seems to offer higher performance for serving because it pre-compiles all the logit masks. I'll continue...
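To illustrate why pre-compiling logit masks can help serving, here is a toy sketch of the idea. It is not Outlines' API or implementation: it assumes a hand-written DFA for the hypothetical pattern `[0-9]+` and a made-up five-token vocabulary. The point is only that the set of allowed tokens for each grammar state is computed once up front, so masking at each decoding step becomes a dictionary lookup instead of re-checking every token against the grammar.

```python
import math

# Toy DFA for the hypothetical pattern "[0-9]+":
# state 0 = start, state 1 = at least one digit seen (accepting).
DIGITS = set("0123456789")
ACCEPTING = {1}

def step(state, ch):
    """Advance the toy DFA by one character; return None if the character is rejected.
    For this pattern every digit leads to state 1 regardless of the current state."""
    return 1 if ch in DIGITS else None

# Hypothetical vocabulary; token id = list index. "<eos>" ends generation.
VOCAB = ["1", "23", "a", "4b", "<eos>"]
EOS_ID = VOCAB.index("<eos>")

def compile_masks(states):
    """Precompute, once per grammar, {state: {allowed_token_id: next_state}}."""
    index = {}
    for state in states:
        allowed = {}
        for tok_id, tok in enumerate(VOCAB):
            if tok_id == EOS_ID:
                # EOS is only allowed where the pattern may legally stop.
                if state in ACCEPTING:
                    allowed[tok_id] = state
                continue
            s = state
            for ch in tok:  # walk the whole multi-character token through the DFA
                s = step(s, ch)
                if s is None:
                    break
            if s is not None:
                allowed[tok_id] = s
        index[state] = allowed
    return index

def apply_mask(logits, allowed):
    """Decode-time masking is just a lookup: disallowed tokens get -inf."""
    return [x if i in allowed else -math.inf for i, x in enumerate(logits)]

INDEX = compile_masks([0, 1])

if __name__ == "__main__":
    masked = apply_mask([0.5, 1.0, 3.0, 2.0, 0.1], INDEX[0])
    print(masked)  # [0.5, 1.0, -inf, -inf, -inf]
```

The expensive part (walking every token through the grammar) happens in `compile_masks`, which runs once per grammar rather than once per generated token; that is the trade-off the comment above refers to.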

Outlines integration has been added, so the general structure is now in place. We welcome PRs that adapt it to the lm-format-enforcer backend as well.

Please let me know once this PR is updated, or a new PR!

> Of course, I can send multiple separate requests, but those are handled sequentially and do not benefit from speed improvements.

This is not correct. vLLM automatically batches in-flight requests....

This is illustrated further here; I hope the explanation is helpful: https://github.com/vllm-project/vllm/issues/1636#issuecomment-1816831493
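As a rough illustration of why separate in-flight requests are not processed one after another, here is a toy model of iteration-level batching. It is a simplification, not vLLM's scheduler: it assumes all requests are already in flight at step 0, counts one decoding step per generated token, and ignores that a larger batch makes each step somewhat slower. Under those assumptions, concurrent requests finish in as many steps as the longest one rather than the sum of all of them.

```python
from dataclasses import dataclass

@dataclass
class Request:
    name: str
    tokens_to_generate: int
    generated: int = 0

def run_batched(requests):
    """Toy iteration-level batching: each engine step decodes one token
    for every request that is still in flight."""
    in_flight = list(requests)
    steps = 0
    finish_step = {}
    while in_flight:
        steps += 1
        for req in in_flight:
            req.generated += 1
        # Retire finished requests; iterate over a copy so removal is safe.
        for req in [r for r in in_flight if r.generated >= r.tokens_to_generate]:
            finish_step[req.name] = steps
            in_flight.remove(req)
    return steps, finish_step

def run_sequential(requests):
    """Baseline: requests served strictly one at a time, one token per step."""
    return sum(r.tokens_to_generate for r in requests)

if __name__ == "__main__":
    lengths = {"a": 5, "b": 3, "c": 8}
    steps, finished = run_batched([Request(n, k) for n, k in lengths.items()])
    print(steps, finished)  # 8 {'b': 3, 'a': 5, 'c': 8}
    print(run_sequential([Request(n, k) for n, k in lengths.items()]))  # 16
```

This only helps if the client actually keeps several requests open at the same time; a client that waits for each response before sending the next one will still see sequential behavior.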

Ah, one more thing: if you are observing sequential behavior, try the current main branch instead of the released version, or turn on the `--engine-use-ray` flag. In the released version, our AsyncLLMEngine is...

v0.2.2 was released last night. It should include the change. Please try it out and let us know!