
Custom OpenAI-compatible endpoint

Open riyajatar37003 opened this issue 1 year ago • 6 comments

Hi, I have a custom LLM and embedding deployment using Triton server, plus an OpenAI-compatible wrapper around it. How can I use this in the .toml config file? I have tested it with the LiteLLM proxy server and it's working.

riyajatar37003 avatar Oct 01 '24 05:10 riyajatar37003

@riyajatar37003 - You could, for example, use the openai provider and point OPENAI_BASE_URL at your custom deployment. Be sure that the model name aligns with the deployment, e.g. openai/your-custom-model.
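
A sketch of that suggestion (not tested; the URL and key are placeholders for your Triton wrapper, and a later reply in this thread suggests OPENAI_API_BASE may be the variable that actually takes effect):

export OPENAI_BASE_URL="http://your-wrapper-host:8000/v1"   # placeholder: your OpenAI-compatible wrapper
export OPENAI_API_KEY="dummy-key"                           # placeholder: any non-empty string if the wrapper ignores keys

With that in place, model = "openai/your-custom-model" in the config should route requests through the OpenAI-style client to your endpoint.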

emrgnt-cmplxty avatar Oct 01 '24 05:10 emrgnt-cmplxty

Thanks, could you share a doc link? Where in the .toml do I need to set this?

riyajatar37003 avatar Oct 01 '24 05:10 riyajatar37003

myconfig.toml

[completion]
provider = "litellm"
concurrent_request_limit = 16

  [completion.generation_config]
  model = "openai/llama3.2" #add your model name here
  temperature = 0.1
  top_p = 1
  max_tokens_to_sample = 1_024
  stream = true
  add_generation_kwargs = { }

Then you run r2r serve --docker --config-path=/home/riyajatar/myconfig.toml
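
If llama3.2 is actually served by a local OpenAI-compatible endpoint rather than by OpenAI, you would also point the SDK at it before serving; a sketch with placeholder values (how the variables reach the Docker container depends on your setup):

export OPENAI_API_BASE="http://localhost:8000/v1"   # placeholder: your OpenAI-compatible endpoint or LiteLLM Proxy
export OPENAI_API_KEY="dummy-key"                   # placeholder: many local endpoints do not check it
r2r serve --docker --config-path=/home/riyajatar/myconfig.toml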

underlines avatar Oct 04 '24 12:10 underlines

Let's say I have an OpenAI-like endpoint served locally at "http://localhost:8004" and it's called "custom-model". It follows the OpenAI V1 API. How do I connect it to r2r?

ArturTanona avatar Nov 28 '24 09:11 ArturTanona

I believe the correct environment variable is OPENAI_API_BASE, not OPENAI_BASE_URL.
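
For the localhost:8004 "custom-model" case asked about above, a minimal sketch using that variable (assuming the endpoint does not validate the API key; whether the /v1 suffix is needed depends on how the server mounts its routes):

export OPENAI_API_BASE="http://localhost:8004/v1"
export OPENAI_API_KEY="dummy-key"

and in r2r.toml:

[completion]
provider = "litellm"

  [completion.generation_config]
  model = "openai/custom-model"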

qdrddr avatar Dec 27 '24 16:12 qdrddr

Also, if you are using LiteLLM Proxy with R2R: since R2R internally uses the LiteLLM SDK, the model name in the r2r.toml config file should be openai/ plus the name the model has in LiteLLM Proxy.

So if, for instance, the Proxy has a model named openai/llama3.3, then in r2r.toml the model name would be openai/openai/llama3.3. @riyajatar37003

Assuming the name of your model in LiteLLM Proxy is openai/llama3.3 and you wish to use provider = "litellm", then r2r.toml would look like this:

[completion]
provider = "litellm"
concurrent_request_limit = 64

  [completion.generation_config]
  model = "openai/openai/llama3.3"

Assuming your LiteLLM Proxy config looks like this:

proxy_config:
  litellm_settings:
      drop_params: True
  model_list:
    # At least one model must exist for the proxy to start.
    - model_name: "openai/llama3.3"
      litellm_params:
        model: "openai/llama3.3"
        api_key: fake-key
        api_base: "http://ollama.mywebsite.com:11434"

This assumes you have an Ollama instance running on port 11434, that you access it via its OpenAI-compatible API, and that you have pulled the llama3.3 model so it shows up in ollama list.
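
As a quick sanity check that the Ollama endpoint really speaks the OpenAI dialect, a request like this (host and model name taken from the proxy config above) should return a chat completion:

curl http://ollama.mywebsite.com:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.3", "messages": [{"role": "user", "content": "ping"}]}'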

It might be confusing, but in r2r.toml, provider = "litellm" means the LiteLLM SDK, not the Proxy; these are two separate things. By default, the LiteLLM SDK always uses the native base URL of the model's provider, and it only overrides that base URL when you explicitly set OPENAI_API_BASE for r2r.

The openai/ prefix tells the LiteLLM SDK which provider to use, and everything after the prefix is the actual model name that will be requested.
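
So, to route the openai/openai/llama3.3 model above through the Proxy rather than api.openai.com, you would set something like this before starting r2r (the address is a placeholder for wherever your LiteLLM Proxy listens, port 4000 by default):

export OPENAI_API_BASE="http://localhost:4000"   # placeholder: your LiteLLM Proxy address
export OPENAI_API_KEY="sk-anything"              # placeholder, or a Proxy virtual key if you configured one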

qdrddr avatar Dec 27 '24 16:12 qdrddr