server
Triton Server for the model mixtral-8x7b
Is your feature request related to a problem? Please describe.
I am planning to create a backend for serving the model mixtral-8x7b with the Triton inference engine, loading it from multiple .safetensors files.
I want to know: does this feature already exist in one of the existing Triton backends, or should I create a new one?
Are you asking for safetensors support for a specific backend? It is not currently supported, but you could create a custom Python-based backend that leverages safetensors: https://github.com/triton-inference-server/backend/blob/main/docs/python_based_backends.md#
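To make the suggestion concrete, here is a minimal sketch of one piece such a custom Python-based backend would need: discovering which tensors live in which shard of a multi-file checkpoint. This is not Triton code from the thread; it assumes a Hugging Face-style `model.safetensors.index.json` whose `weight_map` maps tensor names to shard filenames, and the function names are illustrative. Inside a real backend, the grouping would be used in `initialize()` to load each shard once (e.g. with `safetensors.torch.load_file`).

```python
import json
from collections import defaultdict

def load_weight_map(index_path):
    """Read a Hugging Face-style sharded-checkpoint index
    (model.safetensors.index.json) and return its weight_map:
    a dict of tensor name -> shard filename."""
    with open(index_path) as f:
        return json.load(f)["weight_map"]

def group_tensors_by_shard(weight_map):
    """Invert the weight_map so each shard file lists the tensor
    names it stores. Loading shard-by-shard avoids opening the
    same .safetensors file once per tensor."""
    shards = defaultdict(list)
    for tensor_name, shard_file in weight_map.items():
        shards[shard_file].append(tensor_name)
    return dict(shards)

# Inside a Triton Python backend's model.py, initialize() could then
# (sketch only -- triton_python_backend_utils is available only when
# running under Triton):
#
#   for shard_file, names in group_tensors_by_shard(weight_map).items():
#       tensors = safetensors.torch.load_file(shard_file)
#       ...  # copy the named tensors into the model
```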
Hi @harievg, let's keep the discussion in this GitHub issue if possible. Sorry, I didn't fully understand what you're asking for.
I see two things mentioned in this GitHub issue:
- Is there a backend that can load models from safetensors files?
- How can a scalable solution be designed that hosts each model on a separate ip:port combination?
Did I capture the requirements correctly?
Closing due to inactivity.