
Use local API as LLM

vale46n1 opened this issue 1 year ago • 6 comments

Can we add a way to use a local API as the LLM? The Python code would be something like:

from openai import OpenAI

client = OpenAI(
    api_key="",
    # Change the API base URL to the local inference API
    base_url="http://localhost:1337/v1"
)

It would be similar to what is already provided with Ollama.
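
As a minimal sketch of what I mean (the model name here is a placeholder, since whatever the local server has loaded would be used), the same client could then be called exactly like the OpenAI API:

from openai import OpenAI

# Point the standard OpenAI client at a local OpenAI-compatible server.
client = OpenAI(
    api_key="not-needed",                 # local servers usually ignore the key
    base_url="http://localhost:1337/v1",  # local inference API as above
)

# "local-model" is a hypothetical name; use whatever the local server serves.
response = client.chat.completions.create(
    model="local-model",
    messages=[{"role": "user", "content": "Say hello"}],
)
print(response.choices[0].message.content)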

vale46n1 · Apr 04 '24 20:04

I'd like to hear more about your use case. If you want to mess around locally, you'd just change this line. That's still going to pass gpt-3.5-turbo etc. as the model name; to make this work generically we would need a uniform way to get a list of which models are available. This is essentially what I'm doing with the Ollama integration.

I've been thinking about adding support for tools like Replicate or Together.ai, which would make using open source models much simpler / faster. Are you just running a llama.cpp model independently of Ollama?
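
A rough sketch of that "uniform way", assuming the backend is OpenAI-compatible and exposes the standard models listing (port and key here are placeholders):

from openai import OpenAI

# Ask an OpenAI-compatible backend which models it actually serves,
# instead of hardcoding gpt-3.5-turbo etc.
client = OpenAI(api_key="not-needed", base_url="http://localhost:1337/v1")
available = [m.id for m in client.models.list()]
print(available)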

vanpelt · Apr 04 '24 20:04

I'm using LM Studio for the same kind of test. openrouter.ai is another good and cheap alternative to use (and, I would say, worth integrating).

vale46n1 · Apr 06 '24 18:04

to make this work generically we would need a uniform way to get a list of what models are available.

The LM Studio server only works when a model is already loaded, so it's not like the Ollama server, which can run without a model. We just need a connection, and the user can change the model from LM Studio, Ooba, etc. I changed the base_url value to match LM Studio, but it still connects to Ollama.
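
For reference, LM Studio's local server listens on port 1234 by default (see the curl example further down the thread), so a rough sketch of pointing the client at it, assuming nothing else in the app pins the host, would be:

from openai import OpenAI

# LM Studio's local server defaults to http://localhost:1234/v1.
client = OpenAI(api_key="lm-studio", base_url="http://localhost:1234/v1")

# Print the ids of whatever models LM Studio currently has loaded;
# if Ollama models show up instead, the base_url is being overridden elsewhere.
print([m.id for m in client.models.list()])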

MMoneer · Apr 07 '24 08:04

Which is the best local LLM to run OpenUI?

grigio · Apr 12 '24 23:04

Which is the best local LLM to run OpenUI?

@vanpelt mentioned LLaVA, so try one of the v1.6 7B, 13B, or 34B variants.

MMoneer · Apr 13 '24 05:04

I'd like to hear more about your use case. If you want to mess around locally, you'd just change this line. That's still going to pass gpt-3.5-turbo etc. as the model name; to make this work generically we would need a uniform way to get a list of which models are available. This is essentially what I'm doing with the Ollama integration.

I've been thinking about adding support for tools like Replicate or Together.ai, which would make using open source models much simpler / faster. Are you just running a llama.cpp model independently of Ollama?

According to the official LM Studio docs (https://lmstudio.ai/docs/local-server), you can check which models are currently loaded:

curl http://localhost:1234/v1/models

Response (following OpenAI's format)

{
  "data": [
    {
      "id": "TheBloke/phi-2-GGUF/phi-2.Q4_K_S.gguf",
      "object": "model",
      "owned_by": "organization-owner",
      "permission": [
        {}
      ]
    },
    {
      "id": "lmstudio-ai/gemma-2b-it-GGUF/gemma-2b-it-q4_k_m.gguf",
      "object": "model",
      "owned_by": "organization-owner",
      "permission": [
        {}
      ]
    }
  ],
  "object": "list"
}

In this case, both TheBloke/phi-2-GGUF and lmstudio-ai/gemma-2b-it-GGUF are loaded.
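
As a small follow-up sketch, the same endpoint can be read from Python to pull out the loaded model ids (using the requests library purely for illustration; it isn't something OpenUI necessarily depends on):

import requests

# Fetch the /v1/models listing from LM Studio and extract the model ids,
# mirroring the curl output above.
resp = requests.get("http://localhost:1234/v1/models", timeout=5)
model_ids = [m["id"] for m in resp.json()["data"]]
print(model_ids)
# ['TheBloke/phi-2-GGUF/phi-2.Q4_K_S.gguf', 'lmstudio-ai/gemma-2b-it-GGUF/gemma-2b-it-q4_k_m.gguf']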

Highlyhotgames · May 08 '24 15:05