Simon Mo
Is `standard-supervisor` from SageMaker? This makes our default container depend on a third-party library for its entrypoint, which is a bit risky.
Can this change be made in the orchestration system to override entrypoint? AFAIK K8s supports this.
Which non-K8s orchestration systems use Docker containers and do not offer restart? Fargate?
Can you compare this against wrapping `apply_penalties` with `@torch.compile`?
I'm mostly wondering whether this is a kernel that the torch compiler or Triton can generate directly. If so, relying on that would reduce complexity.
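A rough sketch of what that comparison could look like. Note that `apply_penalties_reference` below is a simplified, hypothetical stand-in written for illustration, not vLLM's actual `apply_penalties`; the tensor shapes, the penalty semantics, and the helper names `benchmark` and `compare` are all assumptions.

```python
import time

import torch


def apply_penalties_reference(
    logits: torch.Tensor,              # [num_seqs, vocab_size]
    token_counts: torch.Tensor,        # [num_seqs, vocab_size], per-token output counts
    repetition_penalty: torch.Tensor,  # [num_seqs]
    frequency_penalty: torch.Tensor,   # [num_seqs]
    presence_penalty: torch.Tensor,    # [num_seqs]
) -> torch.Tensor:
    """Hypothetical simplified penalty logic (assumption, not the vLLM code)."""
    seen = token_counts > 0
    rep = repetition_penalty.unsqueeze(1)
    # Repetition penalty: shrink positive logits, push negative logits lower.
    penalized = torch.where(logits > 0, logits / rep, logits * rep)
    logits = torch.where(seen, penalized, logits)
    # Frequency penalty scales with the count, presence penalty is a flat offset.
    logits = logits - frequency_penalty.unsqueeze(1) * token_counts
    logits = logits - presence_penalty.unsqueeze(1) * seen.to(logits.dtype)
    return logits


def benchmark(fn, args, iters: int = 50) -> float:
    """Return mean seconds per call after a short warm-up."""
    for _ in range(3):  # warm-up; this is where a compiled fn gets compiled
        fn(*args)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters


def compare(num_seqs: int = 64, vocab_size: int = 32000):
    """Time the eager function against its torch.compile-wrapped version."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    logits = torch.randn(num_seqs, vocab_size, device=device)
    counts = torch.randint(0, 3, (num_seqs, vocab_size), device=device)
    rep = torch.full((num_seqs,), 1.2, device=device)
    freq = torch.full((num_seqs,), 0.1, device=device)
    pres = torch.full((num_seqs,), 0.1, device=device)
    args = (logits, counts, rep, freq, pres)

    compiled = torch.compile(apply_penalties_reference)
    # Check numerical agreement before trusting any timing numbers.
    torch.testing.assert_close(compiled(*args), apply_penalties_reference(*args))
    return benchmark(apply_penalties_reference, args), benchmark(compiled, args)


if __name__ == "__main__":
    eager_s, compiled_s = compare()
    print(f"eager: {eager_s * 1e6:.1f} us, compiled: {compiled_s * 1e6:.1f} us")
```

The custom kernel from the PR would be added as a third entry in `compare`, timed on the same inputs. If the compiled version lands close to it, the generated kernel may be enough.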
https://github.com/vllm-project/vllm-openvino has been created. We will move forward with the removal.
there's a merge conflict, plz fix and we can merge this in!
Is this ready for @njhill to review?
Hi @reidliu41, thank you for this PR and sorry about the late review. I have two high-level comments - I think we should consider detaching the process only after...
One more perspective is that we typically don't see users running more than one vLLM instance on a single GPU/host. Therefore the number of processes under management will typically be small....