fastertransformer_backend
Does triton-inference-server only support Slurm for multi-node deployment?
Dear Developers:
I'm deploying a GPT model with triton-inference-server and fastertransformer_backend, following this tutorial: https://github.com/triton-inference-server/fastertransformer_backend/blob/main/docs/gpt_guide.md#run-triton-server-on-multiple-nodes.
I have successfully implemented the single-node deployment and run the identity test. However, as I moved forward, I discovered that multi-node serving requires Slurm, much as multi-node training does.
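For context, my understanding of the multi-node flow in that guide is that all tritonserver processes are started together under one MPI launch inside a Slurm allocation. A rough sketch of the pattern (the node count and model-repository path are illustrative, not the guide's exact script):

```bash
#!/bin/bash
#SBATCH -N 2                   # two nodes
#SBATCH --ntasks-per-node=1    # one tritonserver rank per node

# FasterTransformer shards the model across nodes over MPI, so every
# rank of tritonserver has to come up together under one MPI launch.
mpirun -n 2 --allow-run-as-root \
    /opt/tritonserver/bin/tritonserver \
    --model-repository=/workspace/all_models/gpt   # illustrative path
```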
So, my question is: what is the right way to use triton-inference-server on a cluster?
- Maybe KServe with Triton on a k8s cluster (see the sketch after this list)? https://github.com/kserve/kserve/blob/master/docs/samples/v1beta1/triton/bert/README.md
- or something else? (I'm sincerely looking forward to discussing this with you.)
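For the first option, the linked KServe sample boils down to an InferenceService with a Triton predictor, along these lines (a minimal sketch; the name, storage URI, and GPU count are placeholders, and note this still runs Triton in a single Pod):

```bash
# Minimal sketch of a KServe v1beta1 InferenceService using the Triton
# predictor; name, storageUri, and resources are placeholders.
cat <<'EOF' | kubectl apply -f -
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: ft-gpt
spec:
  predictor:
    triton:
      storageUri: gs://your-bucket/model-repository
      resources:
        limits:
          nvidia.com/gpu: 1
EOF
```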
Thanks a lot!
I don't know what the right way is for a cluster. You can ask in the tritonserver repo.
All platforms supported by tritonserver should be supported by the FT backend, except that we need some way to run multiple processes for multi-node inference, which may not be covered by tritonserver directly.
Thanks for your kind advice! I'll ask this question in the tritonserver repo.
By the way, in a Kubernetes cluster, a Pod (of containers) can only be scheduled onto a single node, so I guess some extra effort is needed to ship a multi-node inference workload to a cluster.
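For illustration, one pattern I have seen for multi-process, multi-node jobs on Kubernetes is the Kubeflow MPI Operator, which coordinates a launcher Pod and several worker Pods. Something like this hypothetical sketch might apply to the FT backend (the image name, replica counts, and command are assumptions on my part, not an officially supported recipe):

```bash
# Hypothetical sketch: an MPIJob (Kubeflow MPI Operator) that starts one
# tritonserver rank per worker Pod. Image, replicas, and command are
# assumptions; this is not a documented fastertransformer_backend setup.
cat <<'EOF' | kubectl apply -f -
apiVersion: kubeflow.org/v2beta1
kind: MPIJob
metadata:
  name: ft-triton-multinode
spec:
  slotsPerWorker: 1
  mpiReplicaSpecs:
    Launcher:
      replicas: 1
      template:
        spec:
          containers:
          - name: launcher
            image: triton-with-ft:22.03   # hypothetical image
            command: ["mpirun", "-n", "2",
                      "tritonserver", "--model-repository=/models"]
    Worker:
      replicas: 2
      template:
        spec:
          containers:
          - name: worker
            image: triton-with-ft:22.03   # hypothetical image
            resources:
              limits:
                nvidia.com/gpu: 1
EOF
```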
I noticed that https://github.com/triton-inference-server/server/issues/5627 was opened in the tritonserver repo, and as @krishung5 suggested, any multi-node or fastertransformer-specific questions should be asked here.
Would it be possible to add examples of how to use the fastertransformer backend for multi-node inference when tritonserver has been deployed through a Helm chart in a Kubernetes cluster?