gateway
Triton Support
This branch contains the changes for Triton support. With these changes, Portkey can now call a Triton host. The implementation is modeled on the Ollama support already in the codebase.
Description:
Following is a sample configuration to pass in order to call a Triton server running at http://triton-host/v2/models/<model_name>:
{
  "strategy": {
    "mode": "fallback"
  },
  "targets": [
    {
      "provider": "triton",
      "custom_host": "http://triton-host/v2/models/<model_name>",
      "api_key": "empty"
    },
    {
      "provider": "together-ai",
      "model": "codellama/CodeLlama-34b-Instruct-hf",
      "api_key": "together_key"
    }
  ]
}
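The Python sample below takes a `config` object; for completeness, here is the same fallback configuration expressed as a Python dict (placeholder values kept as in the JSON above):

```python
# The fallback config above as a Python dict, ready to hand to the Portkey
# client (replace <model_name> and together_key with real values).
config = {
    "strategy": {"mode": "fallback"},
    "targets": [
        {
            "provider": "triton",
            "custom_host": "http://triton-host/v2/models/<model_name>",
            "api_key": "empty",
        },
        {
            "provider": "together-ai",
            "model": "codellama/CodeLlama-34b-Instruct-hf",
            "api_key": "together_key",
        },
    ],
}
```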
Sample Python code:
from typing import Iterator, Union

from portkey_ai import Portkey

# config, prompt, and kwargs are defined by the caller; TextCompletion and
# TextCompletionChunk are the completion types from the portkey_ai SDK.
port_key = Portkey(
    api_key="empty", config=config, base_url="http://localhost:8787/v1"
)
completion: Union[
    TextCompletion, Iterator[TextCompletionChunk]
] = port_key.completions.create(prompt=prompt, **kwargs)
output = ""
if isinstance(completion, TextCompletion):
    output = completion.choices[0].text if completion.choices[0].text else ""
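The sample above only handles the non-streaming TextCompletion branch. A duck-typed sketch that also drains the streaming iterator could look like this (it assumes each streamed chunk exposes choices[0].text the same way TextCompletion does, which we have not verified against the SDK):

```python
def collect_text(completion):
    """Return the generated text whether the response is a single
    completion object or a stream of chunks.

    Duck-typed sketch: a non-streaming response carries .choices
    directly, while a stream is an iterator of chunk objects.
    Assumption: chunk objects expose choices[0].text like
    TextCompletion does.
    """
    if hasattr(completion, "choices"):
        return completion.choices[0].text or ""
    return "".join(
        chunk.choices[0].text
        for chunk in completion
        if chunk.choices and chunk.choices[0].text
    )
```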
Motivation:
- We wanted a fallback mechanism in place where the gateway first connects to our own LLM hosted in our environment, Triton in our case. On failover, it should contact TogetherAI as the second option. Since Triton support was not there and we saw the Ollama changes recently pushed into Portkey, we thought of leveraging the custom_host feature and made these changes.
- We have deployed TensorRT-LLM on a Triton server. With these changes we are able to achieve our requirements.

These changes do not include support for chatCompletions. We have tested our code with the Text Completion use case, where a model repository with the TensorRT-LLM backend is hosted on Triton.
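For reference, the kind of request the gateway would forward to a TensorRT-LLM model on Triton can be built by hand. This sketch targets Triton's generate extension (POST /v2/models/<model>/generate with text_input/max_tokens); the actual endpoint and field names the gateway uses may differ per model configuration, and build_generate_request is a hypothetical helper, not part of this PR:

```python
import json
from urllib import request


def build_generate_request(host, model, prompt, max_tokens=64):
    """Build an HTTP request against Triton's generate extension.

    Hypothetical helper: the endpoint shape follows Triton's generate
    API as served by the TensorRT-LLM backend; field names can vary
    with the deployed model's configuration.
    """
    url = f"{host}/v2/models/{model}/generate"
    payload = {"text_input": prompt, "max_tokens": max_tokens}
    return request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```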
Thanks!
Thank you so much for contributing, @tapansstardog! We are reviewing the PR now.
@vrushankportkey @narengogi There was a conflict in the branch. I have resolved it and just pushed the changes. Naren, I am not sure, but you may have to approve it again to merge it into main?
Continuing this in #549 to fast-track the merge. It just resolves conflicts on top of this PR. The commits from this original PR are intact; only one extra merge commit was added.