Simon Mo
I need to fix this to get CI passing. Got it working now; running some tests and will ask for review.
I fixed the variable names, but am currently facing a weights naming mismatch (Phi renamed some weights). I will skip this test in CI for now and come back to it.
Superseded by #2428
Thanks! This feature is indeed needed, but we are actively evaluating Outlines, as it seems to offer higher performance for serving because it pre-compiles all the logit masks. I'll continue...
Outlines integration has been added, and the general structure is now in place. We welcome PRs that adapt the lm-format-enforcer backend as well.
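For intuition, here is a conceptual sketch (not vLLM's or Outlines' actual code) of why pre-compiled logit masks are fast: the constraint is compiled once into a per-state allowed-token table, so each decoding step is only a lookup. The table contents below are made up for illustration.

```python
# Conceptual sketch of Outlines-style pre-compiled logit masks.
from typing import Dict, List

# Hypothetical pre-compiled table: FSM state -> token IDs allowed in that state.
# In the Outlines approach this is built once from the regex/schema
# before serving starts, not per request.
ALLOWED_TOKENS: Dict[int, List[int]] = {
    0: [12, 45, 98],  # e.g. tokens that can open a JSON object
    1: [7, 13],       # e.g. tokens valid inside a string
}

def mask_logits(logits: List[float], state: int) -> List[float]:
    """Apply the pre-compiled mask: disallowed tokens get -inf."""
    allowed = set(ALLOWED_TOKENS[state])
    return [x if i in allowed else float("-inf") for i, x in enumerate(logits)]

# Per decoding step there is no regex matching or schema walking,
# just a dictionary lookup, which is what makes it fast enough for serving.
print(mask_logits([0.1] * 100, state=0)[:20])
```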
Please let me know once this PR is updated, or a new PR is opened!
> Of course, I can send multiple separate requests, but those are handled sequentially and do not benefit from speed improvements.

This is not correct. vLLM automatically batches in-flight requests....
This is further illustrated here; I hope the explanation is helpful: https://github.com/vllm-project/vllm/issues/1636#issuecomment-1816831493
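For a concrete picture, here is a minimal sketch of issuing requests concurrently against the OpenAI-compatible server so continuous batching can kick in. The endpoint, model name, and prompts are placeholders; this assumes a server started with `python -m vllm.entrypoints.openai.api_server`.

```python
import asyncio
import aiohttp

URL = "http://localhost:8000/v1/completions"  # assumed local server

async def complete(session: aiohttp.ClientSession, prompt: str) -> str:
    payload = {"model": "facebook/opt-125m", "prompt": prompt, "max_tokens": 64}
    async with session.post(URL, json=payload) as resp:
        data = await resp.json()
        return data["choices"][0]["text"]

async def main() -> None:
    prompts = [f"Question {i}: tell me a fact." for i in range(8)]
    async with aiohttp.ClientSession() as session:
        # All eight requests are in flight at once; the server batches them
        # together instead of running them one after another.
        results = await asyncio.gather(*(complete(session, p) for p in prompts))
    for r in results:
        print(r)

asyncio.run(main())
```

The key point is that the client must not await each request before sending the next; as long as the requests overlap in time, the engine batches them.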
Ah, one more thing: if you are observing sequential behavior, try the current main branch instead of the released version, or turn on the flag `--engine-use-ray`. In the released version, our AsyncLLMEngine is...
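If you are driving the engine from Python instead of the CLI, here is a minimal sketch of the equivalent setting, assuming the flag maps to `engine_use_ray` on `AsyncEngineArgs` (model name and prompt are placeholders):

```python
import asyncio

from vllm import SamplingParams
from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.engine.async_llm_engine import AsyncLLMEngine

async def main() -> None:
    # engine_use_ray=True runs the engine in a separate Ray worker,
    # mirroring the `--engine-use-ray` CLI flag.
    engine_args = AsyncEngineArgs(model="facebook/opt-125m", engine_use_ray=True)
    engine = AsyncLLMEngine.from_engine_args(engine_args)

    # generate() yields a stream of partial RequestOutputs;
    # the last item holds the finished completion.
    final = None
    async for output in engine.generate("Hello, my name is",
                                        SamplingParams(max_tokens=32),
                                        request_id="req-0"):
        final = output
    print(final.outputs[0].text)

asyncio.run(main())
```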
v0.2.2 was released last night. It should include the change. Please try it out and let us know!