
Bug Report: Self-Hosted Gateway's triton provider lacks support for embeddings and chat/completions

Open · sds-ajoshi opened this issue 4 months ago · 0 comments

Environment Details:

Portkey Version: portkeyai/gateway:1.10.0 (public Docker image)

Deployment: Self-hosted in a Kubernetes environment.

Backend Service: NVIDIA Triton Inference Server

Models:

BAAI/bge-large-en-v1.5 (Embedding)

BAAI/bge-reranker-large (Reranker)

Goal: To use the self-hosted Portkey gateway as a secure proxy and management layer for our custom embedding and reranker models hosted on a Triton Inference Server.

Problem Summary: When attempting to integrate our Triton server with the self-hosted gateway, we discovered that the built-in triton provider appears to be incomplete. It does not support requests to the standard /v1/embeddings or /v1/chat/completions endpoints, returning "... is not supported by triton" errors instead. This prevents us from using the gateway for its intended purpose with our models.

Troubleshooting Steps Performed: We have methodically tested several configurations based on the official documentation, all of which have failed:

Attempt 1: Using provider: openai with custom_host

Configuration: We configured our Triton models with provider: "openai" and used custom_host to point to our internal Triton service URL.

Result: This failed with an invalid_api_key error, indicating Portkey was trying to authenticate with our Triton server as if it were the public OpenAI API.
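For reference, the relevant portion of our configuration looked roughly like the sketch below. Field names and the internal Triton URL are illustrative approximations of what we used, not a verbatim copy of our file:

```yaml
# Attempt 1 sketch: route an OpenAI-shaped request to Triton via custom_host.
# (Key names assumed from the Portkey docs; URL is a placeholder.)
provider: "openai"
custom_host: "http://triton-inference-server.internal:8000"
# A POST to the gateway's /v1/embeddings with this target still went through
# OpenAI-style API-key validation, which is what produced invalid_api_key.
```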

Attempt 2: Using Custom Provider Names

Configuration: We defined custom providers (e.g., triton-embedding-provider) in the portkey-config.yaml and passed them in the x-portkey-provider header.

Result: This failed with a Not Found error. The gateway did not recognize the generic /infer path, indicating that it does not act as a simple transparent proxy.
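The shape of the custom-provider definition we tried is sketched below. The provider name is our own; the key names are assumptions based on our reading of the docs and may not match the gateway's actual schema:

```yaml
# Attempt 2 sketch: a custom provider entry in portkey-config.yaml.
# (Structure assumed; "triton-embedding-provider" is a name we chose.)
providers:
  - name: "triton-embedding-provider"
    base_url: "http://triton-inference-server.internal:8000"
# Requests carried the header:
#   x-portkey-provider: triton-embedding-provider
# but the Triton-style /infer path was rejected with Not Found.
```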

Attempt 3: Using the Official triton Provider (Final Attempt)

Configuration: We configured our models using provider: "triton" in the portkey-config.yaml and passed x-portkey-provider: triton in the request header, as suggested by the documentation.

Result: This produced the most explicit errors:

For a request to /v1/embeddings: {"status":"failure","message":"embed is not supported by triton"}

For a request to /v1/chat/completions: {"status":"failure","message":"chatComplete is not supported by triton"}
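A sketch of the Attempt 3 setup, for reproducibility. As above, key names and the internal URL are illustrative rather than an exact copy of our configuration:

```yaml
# Attempt 3 sketch: the official triton provider.
# (Field names assumed from the docs; URL is a placeholder.)
provider: "triton"
custom_host: "http://triton-inference-server.internal:8000"
# Requests carried the header:
#   x-portkey-provider: triton
# POST /v1/embeddings       -> "embed is not supported by triton"
# POST /v1/chat/completions -> "chatComplete is not supported by triton"
```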

Conclusion: The evidence strongly suggests that the triton provider in the portkeyai/gateway:1.10.0 image does not implement the embed and chatComplete handlers required to serve the standard embedding and chat completion APIs.

Suggested Action / Feature Request: Could you please implement the embed and chatComplete functions for the triton provider in the self-hosted gateway? Alternatively, could you provide a working, documented example of how to configure the self-hosted gateway to proxy requests to custom Triton embedding and reranker models?

Thank you for your help.

sds-ajoshi avatar Jun 27 '25 10:06 sds-ajoshi