Simon Mo comments

Results: 313 comments of Simon Mo

[Misc] Add In-Container restart capability through supervisord for openai server

Is `standard-supervisor` from SageMaker? This makes our container default depend on a third-party library for its entrypoint, which is a bit risky.

[Misc] Add In-Container restart capability through supervisord for openai server

Can this change be made in the orchestration system by overriding the entrypoint? AFAIK K8s supports this.
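For illustration, here is a minimal sketch of what overriding the entrypoint at the orchestration layer could look like. In a Kubernetes Pod spec, the container's `command` field replaces the image's ENTRYPOINT and `args` replaces its CMD. The Pod name, image name, and supervisord config path below are hypothetical placeholders, not taken from the PR; the snippet only builds the manifest as JSON, which `kubectl apply -f` also accepts.

```python
import json


def pod_with_entrypoint_override(image, command, args):
    """Build a Pod manifest whose container overrides the image's ENTRYPOINT/CMD."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "vllm-openai"},  # hypothetical Pod name
        "spec": {
            # Kubernetes restarts the container itself when it exits.
            "restartPolicy": "Always",
            "containers": [
                {
                    "name": "server",
                    "image": image,
                    "command": command,  # replaces the image ENTRYPOINT
                    "args": args,        # replaces the image CMD
                }
            ],
        },
    }


manifest = pod_with_entrypoint_override(
    image="example.com/vllm-openai:latest",       # hypothetical image
    command=["supervisord"],                      # run under supervisord instead of the default
    args=["-n", "-c", "/etc/supervisord.conf"],  # hypothetical config path
)
print(json.dumps(manifest, indent=2))
```

With this approach the image's default entrypoint stays untouched, and only deployments that want in-container restarts opt in.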

[Misc] Add In-Container restart capability through supervisord for openai server

Which non-K8s orchestration systems use Docker containers but do not offer restart? Fargate?

[KERNEL] Sampler. CUDA kernel for applying repetition penalty

can you compare this against wrapping `apply_penalties` with `@torch.compile`?

[KERNEL] Sampler. CUDA kernel for applying repetition penalty

I am mostly wondering whether this is a kernel that the torch compiler or Triton can generate directly; if so, that would reduce complexity.

[RFC]: Drop Support for OpenVINO

https://github.com/vllm-project/vllm-openvino has been created. We will move forward with the removal.

[Misc] Separate total and output tokens in benchmark_throughput.py

there's a merge conflict, plz fix and we can merge this in!

[V1] Support MP Executor for multi node distributed inference

Is this ready for @njhill to review?

[Frontend] Add -d/--detach option for vllm serve and process management

Hi @reidliu41, thank you for this PR and sorry for the late review. I have two high-level comments - I think we should consider detaching the process only after...

[Frontend] Add -d/--detach option for vllm serve and process management

One more perspective is that we typically don't see users running more than one vLLM instance on a single GPU/host. Therefore the number of processes under management will typically be small....