
Triton Server for the model mixtral-8x7b

Open harievg opened this issue 1 year ago • 4 comments

Is your feature request related to a problem? Please describe. I am planning to create a backend for the Triton Inference Server to serve the model mixtral-8x7b from multiple .safetensors files.

I want to know whether this feature already exists in the existing Triton backends, or whether I should create a new one for this.

harievg avatar May 21 '24 09:05 harievg

Are you asking for safetensors support in a specific backend? It is not currently supported, but you could create a custom Python-based backend that leverages safetensors: https://github.com/triton-inference-server/backend/blob/main/docs/python_based_backends.md#
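As background for what loading from multiple .safetensors shards involves: the safetensors format is just an 8-byte little-endian header length, a JSON header describing each tensor, then the raw data. A minimal stdlib-only sketch (not Triton-specific; shard paths are whatever your checkpoint uses) that indexes which shard holds which tensor, which a custom Python-based backend could use in its `initialize()` step:

```python
# Sketch: enumerate tensors across multiple .safetensors shards without
# loading any weights. Assumes only the documented safetensors layout:
# 8-byte little-endian header length, then a JSON header, then raw data.
import json
import struct
from pathlib import Path

def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file (tensor name -> metadata)."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    header.pop("__metadata__", None)  # optional metadata block, not a tensor
    return header

def index_shards(shard_paths):
    """Map each tensor name to the shard file that contains it."""
    index = {}
    for path in shard_paths:
        for name in read_safetensors_header(path):
            index[name] = Path(path).name
    return index
```

Inside the backend, an index like this lets `initialize()` decide which shard to open for each weight instead of loading every file up front.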

Tabrizian avatar May 23 '24 14:05 Tabrizian

.

harievg avatar May 28 '24 12:05 harievg

Hi @harievg, let's keep the discussions in this GH issue if possible. Sorry, I didn't fully understand what you're asking for.

I see two things mentioned in this GitHub issue:

  1. Is there a backend that can load models from safetensors?
  2. How do you design a scalable solution that hosts each model on a separate ip:port combination?

Did I capture the requirements correctly?
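On the second point, one common approach is to run one `tritonserver` process per model, each with its own HTTP/gRPC/metrics ports (`--http-port`, `--grpc-port`, `--metrics-port`, `--model-repository` are standard tritonserver flags). A small sketch that builds those launch commands; model names, repository paths, and base ports below are hypothetical examples:

```python
# Sketch: one tritonserver process per model, each on its own port triple.
# Ports are offset by the model's index from hypothetical base ports.
def launch_commands(models, base_http=8000, base_grpc=8100, base_metrics=8200):
    """Build a tritonserver command line per model.

    models: dict mapping model name -> model repository path.
    Returns a list of argv lists, one per model.
    """
    cmds = []
    for i, (name, repo) in enumerate(sorted(models.items())):
        cmds.append([
            "tritonserver",
            f"--model-repository={repo}",
            f"--http-port={base_http + i}",
            f"--grpc-port={base_grpc + i}",
            f"--metrics-port={base_metrics + i}",
        ])
    return cmds
```

A reverse proxy or service registry in front of these processes would then map each model name to its ip:port; this is a design sketch, not something Triton provides out of the box.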

Tabrizian avatar May 28 '24 14:05 Tabrizian

Closing due to inactivity.

Tabrizian avatar Sep 06 '24 14:09 Tabrizian