Custom OpenAI-compatible endpoint
Hi, I have a custom LLM and embedding deployment using Triton server, plus an OpenAI-compatible wrapper around it. How can I use this in the .toml config file? I have tested it with the LiteLLM proxy server and it's working.
@riyajatar37003 - You could, for example, use the openai provider and then point OPENAI_BASE_URL to your custom deployment. Be sure that the model name aligns with the deployment, e.g. openai/your-custom-model.
Thanks, could you share a doc link? Where in the .toml do I need to set this?
myconfig.toml
[completion]
provider = "litellm"
concurrent_request_limit = 16
[completion.generation_config]
model = "openai/llama3.2" #add your model name here
temperature = 0.1
top_p = 1
max_tokens_to_sample = 1_024
stream = true
add_generation_kwargs = { }
Then you run:
r2r serve --docker --config-path=/home/riyajatar/myconfig.toml
Let's say I have an OpenAI-like endpoint served locally at "http://localhost:8004" and the model is called "custom-model". It follows the OpenAI v1 API. How do I connect it to R2R?
I believe the correct environment variable is OPENAI_API_BASE, not OPENAI_BASE_URL.
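To make that concrete for the localhost example above, here is a minimal sketch. It assumes the wrapper mounts the OpenAI v1 routes under /v1 and accepts any API key; adjust the path and key to your deployment:

export OPENAI_API_BASE="http://localhost:8004/v1"   # assumed path; drop /v1 if your wrapper serves the routes at the root
export OPENAI_API_KEY="anything"                     # many self-hosted endpoints accept any value

# In r2r.toml the model then keeps the openai/ prefix plus the served name:
#   model = "openai/custom-model"

# Quick sanity check that the endpoint really answers the OpenAI v1 chat API:
curl -s "$OPENAI_API_BASE/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{"model": "custom-model", "messages": [{"role": "user", "content": "ping"}]}'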
Also, if you are using LiteLLM Proxy with R2R: since R2R internally uses the LiteLLM SDK, the model name in the r2r.toml config file should be openai/ plus the name under which the model is registered in LiteLLM Proxy. So if, for instance, the Proxy has a model named openai/llama3.3, then in r2r.toml the model name would be openai/openai/llama3.3. @riyajatar37003
Assuming the name of your model in LiteLLM Proxy is openai/llama3.3, and you wish to use provider = "litellm", then r2r.toml would look like this:
[completion]
provider = "litellm"
concurrent_request_limit = 64
[completion.generation_config]
model = "openai/openai/llama3.3"
Assuming your LiteLLM Proxy config looks like this:
proxy_config:
  litellm_settings:
    drop_params: True
  model_list:
    # At least one model must exist for the proxy to start.
    - model_name: "openai/llama3.3"
      litellm_params:
        model: "openai/llama3.3"
        api_key: fake-key
        api_base: "http://ollama.mywebsite.com:11434"
This assumes you have an Ollama instance running on port 11434, that it is accessed via its OpenAI-compatible API, and that the llama3.3 model is pulled into Ollama (you can verify with ollama list).
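If you run the proxy yourself rather than through a larger deployment, a minimal sketch for starting it and checking what it registered could look like this. It assumes the litellm_settings/model_list above are saved in a file named proxy_config.yaml in a layout the litellm CLI accepts, and that the CLI is installed via pip install 'litellm[proxy]':

# Start the proxy (port 4000 assumed here):
litellm --config proxy_config.yaml --port 4000

# In another shell: the proxy is OpenAI-compatible, so you can list the registered models.
curl -s http://localhost:4000/v1/models -H "Authorization: Bearer fake-key"

R2R would then point at the proxy by setting OPENAI_API_BASE=http://localhost:4000.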
It might be confusing, but in r2r.toml, provider = "litellm" refers to the LiteLLM SDK, not the Proxy.
These are two separate things. By default, the LiteLLM SDK always uses the native base URL of the model's provider.
The LiteLLM SDK in R2R will override the provider's base URL when you set OPENAI_API_BASE explicitly for R2R.
The openai/ prefix tells the LiteLLM SDK which provider to use, and everything after the prefix is the actual model name that will be requested.
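To illustrate that prefix rule with hypothetical values (proxy reachable at http://localhost:4000, key fake-key): with model = "openai/openai/llama3.3" in r2r.toml, the SDK treats the first openai/ as the provider and sends the remainder as the model name, so the request it issues is roughly equivalent to:

export OPENAI_API_BASE="http://localhost:4000"   # assumed proxy address
export OPENAI_API_KEY="fake-key"

# The first "openai/" is consumed by the SDK; "openai/llama3.3" is what reaches the proxy:
curl -s "$OPENAI_API_BASE/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{"model": "openai/llama3.3", "messages": [{"role": "user", "content": "hello"}]}'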