Simon Mo

Results 313 comments of Simon Mo

Is `standard-supervisor` from SageMaker? This makes our default container depend on a third-party library for its entrypoint, which is a bit risky.

Can this change be made in the orchestration system by overriding the entrypoint? AFAIK K8s supports this.

Which non-K8s orchestration systems use Docker containers and do not offer restarts? Fargate?

can you compare this against wrapping `apply_penalties` with `@torch.compile`?

I'm mostly wondering whether this is a kernel that the torch compiler or Triton can generate directly; if so, that reduces complexity.

https://github.com/vllm-project/vllm-openvino has been created. We will move forward with the removal.

there's a merge conflict, plz fix and we can merge this in!

Hi @reidliu41, thank you for this PR and sorry about the late review. I have two high-level comments - I think we should consider detaching the process only after...

One more perspective is that we typically don't see users running more than one vLLM instance on a single GPU/host. Therefore the number of processes under management will typically be small....