gateway
Triton Support
This branch contains the changes for Triton support. With these changes, Portkey can now call a Triton host. The implementation is modeled on the Ollama support already in the codebase.
Description:
Following is a sample configuration to pass in order to call a Triton server running at http://triton-host/v2/models/<model_name>:
{
  "strategy": {
    "mode": "fallback"
  },
  "targets": [
    {
      "provider": "triton",
      "custom_host": "http://triton-host/v2/models/<model_name>",
      "api_key": "empty"
    },
    {
      "provider": "together-ai",
      "model": "codellama/CodeLlama-34b-Instruct-hf",
      "api_key": "together_key"
    }
  ]
}
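The Python sample below takes a `config` object; for completeness, here is the same fallback configuration expressed as a Python dict (placeholder values kept as in the JSON above):

```python
# The fallback config above as a Python dict, ready to hand to the Portkey
# client (replace <model_name> and together_key with real values).
config = {
    "strategy": {"mode": "fallback"},
    "targets": [
        {
            "provider": "triton",
            "custom_host": "http://triton-host/v2/models/<model_name>",
            "api_key": "empty",
        },
        {
            "provider": "together-ai",
            "model": "codellama/CodeLlama-34b-Instruct-hf",
            "api_key": "together_key",
        },
    ],
}
```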
Sample Python code:
from typing import Iterator, Union

from portkey_ai import Portkey

# config, prompt, and kwargs are defined by the caller; TextCompletion and
# TextCompletionChunk are the completion types from the portkey_ai SDK.
port_key = Portkey(
    api_key="empty", config=config, base_url="http://localhost:8787/v1"
)
completion: Union[
    TextCompletion, Iterator[TextCompletionChunk]
] = port_key.completions.create(prompt=prompt, **kwargs)
output = ""
if isinstance(completion, TextCompletion):
    output = completion.choices[0].text if completion.choices[0].text else ""
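The sample above only handles the non-streaming TextCompletion branch. A duck-typed sketch that also drains the streaming iterator could look like this (it assumes each streamed chunk exposes choices[0].text the same way TextCompletion does, which we have not verified against the SDK):

```python
def collect_text(completion):
    """Return the generated text whether the response is a single
    completion object or a stream of chunks.

    Duck-typed sketch: a non-streaming response carries .choices
    directly, while a stream is an iterator of chunk objects.
    Assumption: chunk objects expose choices[0].text like
    TextCompletion does.
    """
    if hasattr(completion, "choices"):
        return completion.choices[0].text or ""
    return "".join(
        chunk.choices[0].text
        for chunk in completion
        if chunk.choices and chunk.choices[0].text
    )
```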
Motivation:
- We wanted a fallback mechanism in place where the gateway first connects to our own LLM hosted in our environment, Triton in our case. On failover, it should contact TogetherAI as the second option. Since Triton support was not there and we saw the Ollama changes recently pushed into Portkey, we thought of leveraging the custom_host feature and made these changes.
- We have deployed TensorRT-LLM on a Triton server. With these changes we are able to achieve our requirements.

These changes do not include support for chatCompletions. We have tested our code with the Text Completion use case, where a model repository with the TensorRT-LLM backend is hosted on Triton.
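For reference, the kind of request the gateway would forward to a TensorRT-LLM model on Triton can be built by hand. This sketch targets Triton's generate extension (POST /v2/models/<model>/generate with text_input/max_tokens); the actual endpoint and field names the gateway uses may differ per model configuration, and build_generate_request is a hypothetical helper, not part of this PR:

```python
import json
from urllib import request


def build_generate_request(host, model, prompt, max_tokens=64):
    """Build an HTTP request against Triton's generate extension.

    Hypothetical helper: the endpoint shape follows Triton's generate
    API as served by the TensorRT-LLM backend; field names can vary
    with the deployed model's configuration.
    """
    url = f"{host}/v2/models/{model}/generate"
    payload = {"text_input": prompt, "max_tokens": max_tokens}
    return request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```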
Thanks!
Thank you so much for contributing, @tapansstardog! We are reviewing the PR now.
@vrushankportkey @narengogi There was a conflict in the branch. I have resolved it and just pushed the changes. Naren, I am not sure, but you may have to approve it again to merge it into main?
Continuing this in #549 to fast-track the merge. It just resolves conflicts on top of this PR. The commits from this original PR are intact; only one extra merge commit was added.