
Does triton-inference-server only support slurm for multi-node deployment?

Open Shuai-Xie opened this issue 1 year ago • 3 comments

Dear Developers:

I'm deploying a GPT model with triton-inference-server and fastertransformer_backend, following this tutorial: https://github.com/triton-inference-server/fastertransformer_backend/blob/main/docs/gpt_guide.md#run-triton-server-on-multiple-nodes.

I have successfully implemented the single-node deployment and run the identity test. However, as I moved forward, I discovered that multi-node serving requires Slurm, presumably mirroring the multi-node training setup.
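For context, the guide's multi-node recipe essentially launches one coordinated MPI rank per node, with every rank running the same tritonserver binary against a model repository on shared storage. A rough sketch of that kind of launch (the node count, image tag, and paths are placeholders, not taken from this thread, and the `--container-*` flags assume Slurm's pyxis plugin):

```shell
# One tritonserver process per node, coordinated via MPI (placeholder values).
# Every node must see the same model repository, e.g. on a shared filesystem.
srun -N 2 -n 2 --mpi=pmix \
     --container-image nvcr.io/nvidia/tritonserver:22.07-py3 \
     --container-mounts /shared/triton-model-store:/models \
     tritonserver --model-repository=/models
```

The key point is that FasterTransformer's tensor/pipeline parallelism spans the MPI world, so the launcher (Slurm, plain mpirun, or a Kubernetes MPI operator) only needs to start N coordinated ranks that can reach each other.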

So, my question is: what is the right way to run triton-inference-server on a cluster?

  • Maybe KServe with triton on a k8s cluster? https://github.com/kserve/kserve/blob/master/docs/samples/v1beta1/triton/bert/README.md
  • or something else? (I'd sincerely like to discuss this with you.)

Thanks a lot!

Shuai-Xie avatar Apr 10 '23 03:04 Shuai-Xie

I don't know what the right way is for a cluster. You can ask in the tritonserver repo.

Every platform supported by tritonserver should also be supported by the FT backend, except that multi-node inference needs some way to launch multiple processes, which tritonserver may not cover directly.

byshiue avatar Apr 10 '23 04:04 byshiue

Thanks for your kind advice! I'll ask this question in the tritonserver repo.

By the way, in a Kubernetes cluster, a Pod (of containers) can only be scheduled onto a single node. So I guess some extra effort may be needed to ship a multi-node inference workload onto a cluster.

Shuai-Xie avatar Apr 12 '23 09:04 Shuai-Xie

> Thanks for your kind advice! I'll ask this question in the tritonserver repo.
>
> By the way, in a Kubernetes cluster, a Pod (of containers) can only be scheduled onto a single node. So I guess some extra effort may be needed to ship a multi-node inference workload onto a cluster.

I noticed that https://github.com/triton-inference-server/server/issues/5627 was opened in the tritonserver repo, and as @krishung5 suggested, any multi-node or FasterTransformer-specific questions should be asked here.

Is it possible to add examples of how to use the fastertransformer backend for multi-node inference when tritonserver has been deployed via a Helm chart in a Kubernetes cluster?

yeahdongcn avatar Apr 24 '23 08:04 yeahdongcn