
Does triton-inference-server only support slurm for multi-node deployment?

Open Shuai-Xie opened this issue 1 year ago • 3 comments

Dear Developers:

I'm deploying a GPT model with triton-inference-server and fastertransformer_backend, following this tutorial: https://github.com/triton-inference-server/fastertransformer_backend/blob/main/docs/gpt_guide.md#run-triton-server-on-multiple-nodes.

I have successfully implemented the single-node deployment and run the identity test. However, as I moved forward, I discovered that multi-node serving requires Slurm, presumably mirroring the multi-node training setup.
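For context, the guide's multi-node recipe essentially launches one coordinated MPI rank per node, with every rank running the same tritonserver binary against a model repository on shared storage. A rough sketch of that kind of launch (the node count, image tag, and paths are placeholders, not taken from this thread, and the `--container-*` flags assume Slurm's pyxis plugin):

```shell
# One tritonserver process per node, coordinated via MPI (placeholder values).
# Every node must see the same model repository, e.g. on a shared filesystem.
srun -N 2 -n 2 --mpi=pmix \
     --container-image nvcr.io/nvidia/tritonserver:22.07-py3 \
     --container-mounts /shared/triton-model-store:/models \
     tritonserver --model-repository=/models
```

The key point is that FasterTransformer's tensor/pipeline parallelism spans the MPI world, so the launcher (Slurm, plain mpirun, or a Kubernetes MPI operator) only needs to start N coordinated ranks that can reach each other.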

So, my question is: what is the right way to run triton-inference-server on a cluster?

  • Maybe KServe with triton on a k8s cluster? https://github.com/kserve/kserve/blob/master/docs/samples/v1beta1/triton/bert/README.md
  • or something else? (I'd sincerely like to discuss this with you.)

Thanks a lot!

Shuai-Xie avatar Apr 10 '23 03:04 Shuai-Xie

I don't know what the right way is for a cluster. You can ask in the tritonserver repo.

Every platform supported by tritonserver should also be supported by the FT backend, except that multi-node inference needs some way to launch multiple processes, which tritonserver may not cover directly.

byshiue avatar Apr 10 '23 04:04 byshiue

Thanks for your kind advice! I'll ask this question in the tritonserver repo.

By the way, in a Kubernetes cluster, a Pod (of containers) can only be scheduled onto a single node. So I guess some extra effort may be needed to ship a multi-node inference workload onto a cluster.

Shuai-Xie avatar Apr 12 '23 09:04 Shuai-Xie

> Thanks for your kind advice! I'll ask this question in the tritonserver repo.
>
> By the way, in a Kubernetes cluster, a Pod (of containers) can only be scheduled onto a single node. So I guess some extra effort may be needed to ship a multi-node inference workload onto a cluster.

I noticed that https://github.com/triton-inference-server/server/issues/5627 was opened in the tritonserver repo, and as @krishung5 suggested, any multi-node or FasterTransformer-specific questions should be asked here.

Is it possible to add examples of how to use the fastertransformer backend for multi-node inference when tritonserver has been deployed via a Helm chart in a Kubernetes cluster?

yeahdongcn avatar Apr 24 '23 08:04 yeahdongcn