feat: Custom OpenAI compatible server support with preserved inference parameters
Problem
Sometimes, people like me run an OpenAI-compatible server instead of running the model through Jan's Nitro, because it can be better tailored to the available hardware. OpenAI inference partially works when set up this way.
However, inference parameters such as top_k, top_p, etc. are not available, and some parameters such as temperature are not preserved. The model name is also incorrect, as shown in the screenshot.
Success Criteria
It would be better to have support for a custom OpenAI-compatible server. The list of available models can be queried through the v1/models endpoint (if available).
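For reference, a minimal sketch of querying such an endpoint; the base URL is an assumption and depends on how the server is launched:

import requests

# Hypothetical OpenAI-compatible server; adjust the base URL to your setup.
BASE_URL = "http://localhost:8000/v1"

resp = requests.get(f"{BASE_URL}/models", timeout=10)
resp.raise_for_status()

# The OpenAI models endpoint returns {"object": "list", "data": [...]}.
for model in resp.json().get("data", []):
    print(model["id"])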
Hi @qnixsynapse, you can achieve the same result by modifying the model.json values. Take OpenAI GPT 4 Turbo for example:
{
  "sources": [
    {
      "url": "https://openai.com"
    }
  ],
  "id": "gpt-4-turbo",
  "object": "model",
  "name": "OpenAI GPT 4 Turbo",
  "version": "1.2",
  "description": "OpenAI GPT 4 Turbo model is extremely good",
  "format": "api",
  "settings": {},
  "parameters": {
    "max_tokens": 4096,
    "temperature": 0.7,
    "top_p": 0.95,
    "stream": true,
    "stop": [],
    "frequency_penalty": 0,
    "presence_penalty": 0
  },
  "metadata": {
    "author": "OpenAI",
    "tags": [
      "General"
    ]
  },
  "engine": "openai"
}
The values from settings and parameters will be visible in the UI and will be applied to the request.
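As a rough illustration (not Jan's actual internals), the parameters above translate into a chat completion request along these lines; the API key and message are placeholders:

import requests

# Sketch of the kind of request the "openai" engine would send;
# endpoint, key, and prompt are placeholders, not Jan's real code.
payload = {
    "model": "gpt-4-turbo",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 4096,
    "temperature": 0.7,
    "top_p": 0.95,
    "stream": False,  # the app streams; False keeps this sketch simple
    "frequency_penalty": 0,
    "presence_penalty": 0,
}
resp = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json=payload,
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])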
Related article: https://jan.ai/docs/remote-models/generic-openai
@Van-QA Thank you! This helped. BTW, is it possible to provide a custom OpenAI-compatible endpoint for embeddings, which is needed in "Knowledge retrieval"?
Hi @qnixsynapse, this page will guide you through modifying the chat completion endpoint: https://jan.ai/docs/remote-models/openai#how-to-integrate-openai-api-with-jan
That is for the large language model only. I use a smaller sentence transformer for the embeddings, which is significantly faster than using embeddings from the main large model. So if I want to add an embeddings endpoint to the Jan app on some other port, will that be possible? Alternatively, sentence-transformer support in the app via Nitro would also be a viable option.
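For example, a custom server exposing the OpenAI embeddings shape could be queried like this; the port and model name here are assumptions:

import requests

# Hypothetical local server speaking the OpenAI /v1/embeddings format;
# the URL and the sentence-transformer model name are assumptions.
resp = requests.post(
    "http://localhost:8080/v1/embeddings",
    json={"model": "all-MiniLM-L6-v2", "input": ["Hello, world!"]},
    timeout=30,
)
vector = resp.json()["data"][0]["embedding"]
print(len(vector))  # e.g. 384 dimensions for all-MiniLM-L6-v2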
Closing as dupe of ^