
Custom local LLMs

Open zboinek opened this issue 2 years ago • 37 comments

What about custom/private LLMs? Will there be an option to use some of LangChain's local features, like llama.cpp?

zboinek avatar Sep 13 '23 21:09 zboinek

Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! :hugs:
If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template as it helps other community members to contribute more effectively. You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! :wave:
Welcome to the Jupyter community! :tada:

welcome[bot] avatar Sep 13 '23 21:09 welcome[bot]

I quite like the idea of GPT4All, but unfortunately it seems to be mostly a CPU-bound model (2 minutes for a single response using 36 cores!), and a GPU-capable model is far away.

One fantastic idea I've seen bouncing around is to use an existing local LLM webserver that is compliant with the OpenAI API. The text-generation-webui project has actually implemented an openai-extension for a lot of their models.

I've tested it and it seems to work (5-second responses on 12 GB of VRAM using their 'stable-vicuna-13B-GPTQ' model!), but commands like /generate and /learn are naturally not implemented.

Getting it to work

text-generation-webui

First Time Install

  micromamba create -n textgen python=3.10.9
  micromamba activate textgen
  ## Nvidia gpu stuff
  pip3 install torch torchvision torchaudio
  ## WebUI
  git clone https://github.com/oobabooga/text-generation-webui
  cd text-generation-webui
  pip install -r requirements.txt
  ## OpenAI extension
  cd extensions/openai
  pip install -r requirements.txt
  cd ../../
  python server.py --extensions openai --listen
  • Go to localhost:7860 → Models Tab
  • Put https://huggingface.co/TheBloke/stable-vicuna-13B-GPTQ into the download text box
  • Wait for it to download, then kill the server.

Normal Run

  micromamba activate textgen
  cd text-generation-webui
  ## Start the server, load the model, enable the OpenAI extension
  python server.py --model TheBloke_stable-vicuna-13B-GPTQ --extensions openai --listen
  • (you should see info about the OPENAI_BASE printed here)

(optional) Test that it's reachable

micromamba activate jupyterai  ## (optional, just ensure you have all the jupyter-ai libraries)
  • In Python (note: this targets the pre-1.0 openai package; the env vars must be set before importing openai, since the 0.x package reads them at import time):

      import os
      ## Dummy key and local base URL for the text-generation-webui server
      os.environ['OPENAI_API_KEY'] = "sk-111111111111111111111111111111111111111111111111"
      os.environ['OPENAI_API_BASE'] = "http://0.0.0.0:5001/v1"
      import openai

      response = openai.ChatCompletion.create(
          model="TheBloke_stable-vicuna-13B-GPTQ",
          messages=[
              {'role': 'system', 'content': "Answer in a consistent style."},
              {'role': 'user', 'content': "Teach me about patience."},
              {'role': 'assistant', 'content': "The river that carves the deepest valley flows from a modest spring; the grandest symphony originates from a single note; the most intricate tapestry begins with a solitary thread."},
              {'role': 'user', 'content': "Teach me about the ocean."},
          ]
      )
      text = response['choices'][0]['message']['content']
      print(text)

Jupyter AI

Run Jupyter

micromamba activate jupyterai  ## (optional, just ensure you have all the jupyter-ai libraries)
jupyter-lab
  • Click on the AI tab → Settings Wheel:

(Screenshot: the Jupyter AI settings panel in JupyterLab)

  • (where API key is the sk-111111111111111111111111111111111111111111111111 from before)

After that, save and it should just work!

Jupyter AI with Stable-Vicuna

(left: NVTOP showing real-time GPU usage; right: JupyterLab)

https://github.com/jupyterlab/jupyter-ai/assets/20641402/267ebedc-e5f0-448c-96cf-a5e57db05f97


Limitations

  • No /generate, /learn
  • Untested %%ai magic, since I use R and R does not seem to load the %% stuff.
  • There is no way to specify the model in the OpenAI settings.

Would it be possible to create a new dropdown item in Language Model called OpenAI :: Custom that would enable model selection, similar to the Python example above?
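
For what it's worth, the LangChain layer that jupyter-ai builds on already accepts an arbitrary model name plus a custom endpoint, so such a dropdown would mostly need to expose those two fields. A minimal sketch of the underlying wiring (the model name, key, and URL are just the ones from my setup above, and the exact kwargs assume the 2023-era langchain package):

  from langchain.chat_models import ChatOpenAI

  ## Point LangChain's OpenAI chat wrapper at the local
  ## text-generation-webui server instead of api.openai.com
  chat = ChatOpenAI(
      model_name="TheBloke_stable-vicuna-13B-GPTQ",
      openai_api_key="sk-111111111111111111111111111111111111111111111111",
      openai_api_base="http://0.0.0.0:5001/v1",
  )
  print(chat.predict("Teach me about the ocean."))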

As always, big thanks to the Jupyter team!

mtekman avatar Sep 18 '23 15:09 mtekman

@mtekman as per https://github.com/jupyterlab/jupyter-ai/issues/190#issuecomment-1677807605 I wonder if the proxy option could help in your use case.

krassowski avatar Sep 18 '23 18:09 krassowski

@krassowski Hi, I've been reading through the comments in a few of those threads and I guess I'm still a little bit lost on what the proxy option does, compared to the base API url?

mtekman avatar Sep 19 '23 08:09 mtekman

Hi @mtekman @zboinek @krassowski I believe we can help with this issue. I’m the maintainer of LiteLLM https://github.com/BerriAI/litellm

TL;DR: We allow you to use any LLM as a drop-in replacement for gpt-3.5-turbo. You can use our proxy server or spin up your own proxy server using LiteLLM.

Usage

This calls the provider API directly

from litellm import completion
import os
## set ENV variables
os.environ["OPENAI_API_KEY"] = "your-key"
messages = [{ "content": "Hello, how are you?","role": "user"}]

# openai call
response = completion(model="gpt-3.5-turbo", messages=messages)

# falcon call
response = completion(model="falcon-40b", messages=messages)

# ollama call
response = completion(model="ollama/llama2", messages=messages)
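
You can also point these calls at a locally hosted OpenAI-compatible server. A sketch (the model name and URL are assumptions taken from the text-generation-webui setup earlier in this thread):

# local OpenAI-compatible server call
response = completion(
    model="openai/TheBloke_stable-vicuna-13B-GPTQ",  # "openai/" prefix = use the OpenAI request format
    messages=messages,
    api_base="http://0.0.0.0:5001/v1",               # local endpoint instead of api.openai.com
)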

ishaan-jaff avatar Sep 22 '23 17:09 ishaan-jaff

@ishaan-jaff If I was to use ollama, would this then natively support /generate, /learn, /ask directives with responses that JupyterAI could understand?

Edit: I just tested ollama (though not with litellm, which appears to be a paid cloud-based service similar to OpenAI? Happy to remove this statement if I'm wrong), and it doesn't seem to work with Jupyter AI:

git clone git@github.com:jmorganca/ollama.git
cd ollama/ollama
./ollama serve & ./ollama run llama2
## Downloads a 3 GB model and runs it at http://localhost:11434/api

The problem is that the API offered there (which has a /generate endpoint) does not seem to be compliant with OpenAI's API, so I'm getting no responses from Jupyter.
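
For illustration, Ollama's native endpoint takes a single prompt string and returns its own JSON shape, rather than the chat/completions format jupyter-ai sends. A sketch (note: "stream": False may not exist on older Ollama builds, which always stream JSON lines):

import requests

## Ollama's own /api/generate format -- not OpenAI-compatible
r = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama2", "prompt": "Teach me about the ocean.", "stream": False},
)
print(r.json()["response"])   ## Ollama's reply field, not choices[0].message.content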

mtekman avatar Sep 25 '23 08:09 mtekman

Ollama makes it very easy to run a variety of models locally on macOS, Windows (via WSL, and eventually natively), and Linux. It has automatic GPU support for Apple Silicon and NVIDIA (it's using llama.cpp under the covers). It provides its own API and is supported by LangChain.

It would be great to have support in jupyter-ai without having to set up an API proxy like litellm -- no judgement on that project, it's just that it seems like this could be supported using the existing langchain dependency.

easp avatar Oct 25 '23 15:10 easp

Untested %%ai magic, since I use R and R does not seem to load the %% stuff.

Try below to connect to a locally hosted model (I used textgen-web-ui):

%%ai chatgpt -m {"api_base":"http://127.0.0.1:5000/v1"}
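
For example, a full cell against a locally hosted endpoint might look like the sketch below (the -f/--format flag is jupyter-ai's; the prompt is just an illustration):

%%ai chatgpt -m {"api_base":"http://127.0.0.1:5000/v1"} -f code
Write a function that returns the nth Fibonacci number.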

jamesjun avatar Dec 22 '23 09:12 jamesjun

With regard to @mtekman's comment: many other tools have a generic provider called OpenAI API or equivalent. It uses the same "openai" Python package, with the difference that it's possible to specify the endpoint and other parameters.

For example, this is how https://continue.dev exposes these providers:

(Screenshot: the provider list exposed by continue.dev)

I am one of the Collaborators of FastChat, and we have it deployed in many places. This would be an invaluable addition to Jupyter-AI.

surak avatar Jan 19 '24 11:01 surak

os.environ['OPENAI_API_BASE']="http://0.0.0.0:5001/v1"

The default base URL is http://0.0.0.0:5000; how can I change it?

adaaaaaa avatar Feb 03 '24 05:02 adaaaaaa

os.environ['OPENAI_API_BASE']="http://0.0.0.0:5001/v1"

The default base URL is http://0.0.0.0:5000; how can I change it?

That seems to be the subject of this bug report and of #190. You can't.

surak avatar Feb 03 '24 15:02 surak

Ollama support being tracked in #482, LangChain SelfHostedHuggingFaceLLM in #343.

astrojuanlu avatar Feb 05 '24 14:02 astrojuanlu

      os.environ['OPENAI_API_KEY']="sk-111111111111111111111111111111111111111111111111"

Is this setting necessary?

adaaaaaa avatar Feb 29 '24 13:02 adaaaaaa

@adaaaaaa It seems like it https://github.com/oobabooga/text-generation-webui/wiki/12-%E2%80%90-OpenAI-API#third-party-application-setup

mtekman avatar Feb 29 '24 16:02 mtekman

Hi @mtekman, I cannot even make the AI tab settings page show as mentioned above when I run JupyterLab in an offline environment. Is there any way to solve this?

When I click the AI chat tab, it shows a warning icon and says "There seems to be a problem with the chat backend, please look at the JupyterLab server logs or contact your administrator to correct this problem."

JupyterLab version: 4.1.2
Jupyter AI version: 2.18.1

imClumsyPanda avatar Jul 02 '24 11:07 imClumsyPanda

@imClumsyPanda

Weird, it works fine for me -- though I'm using a newer Jupyter Lab.

## conda or mamba or micromamba, all the same
micromamba create -y -c conda-forge -n jupyterlabai \
  jupyterlab=4.2.3 jupyter-ai=2.18.1

## Activate the environment and run it
micromamba activate jupyterlabai
jupyter-lab

The chat window tab should appear in the interface

mtekman avatar Jul 02 '24 11:07 mtekman

@mtekman I'm not sure if it's because I'm in an offline environment.

And I installed notebook and Jupyter-ai through pip.

imClumsyPanda avatar Jul 02 '24 11:07 imClumsyPanda

@imClumsyPanda

I'm not sure if it's because I'm in an offline environment.

JupyterLab by default runs on localhost (is that what you mean by offline?).

And I installed notebook and Jupyter-ai through pip.

pip might be fighting your system Python libraries, depending on how your PATH is defined.

To get around this, either try creating a mamba environment as defined in my last comment, OR, create a virtualenv using just python:

 ## create a new env
virtualenv jupyteraivenv 

## source "activate" it
source jupyteraivenv/bin/activate 

## Install the right versions
pip install jupyterlab==4.2.3 jupyter-ai==2.18.1

## Run it
jupyter-lab

Double check which jupyter-lab is being called, because maybe your system has one installed globally.

whereis jupyter-lab
## should give you a path like:
## /home/blah/blah/jupyteraivenv/bin/jupyter-lab

mtekman avatar Jul 02 '24 11:07 mtekman

@mtekman I mean I'm running JupyterLab and Jupyter AI in an environment without an internet connection.

I'll try to check again tomorrow, thanks for the reply!

imClumsyPanda avatar Jul 02 '24 16:07 imClumsyPanda

@mtekman I've tried creating a new conda env, pip-installed jupyterlab 4.2.3 and jupyter-ai 2.18.1, and made sure the jupyter-lab command points to the file in the newly created env.

But I still get the same error message, with an error icon that says "There seems to be a problem with the chat backend, please look at the JupyterLab server logs or contact your administrator to correct this problem".

And this time I've noticed that there are error messages in the cmd window, such as [W 2024-07-03 10:10:10 ServerApp] 404 GET /api/ai/chats referer=None and [W 2024-07-03 10:10:10 ServerApp] 404 GET /api/ai/chats?token=[secret] referer=None.

I'll check whether I can change the AI chat settings through the source code to solve this.

imClumsyPanda avatar Jul 03 '24 05:07 imClumsyPanda

Check your firewall (e.g. ufw disable); it could be that some internal connections are blocked.

mtekman avatar Jul 03 '24 05:07 mtekman

@imClumsyPanda most likely the jupyter-ai server extension fails to load for some reason specific to your environment (e.g. a conflicting version of a dependency). You would need to look at the initial portion of the log (maybe with the --debug option). Also, checking the output of pip check and jupyter server extension list can be helpful, as shown below.
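
For convenience, the checks above look like this on the command line:

pip check                        ## report broken or conflicting dependencies
jupyter server extension list    ## confirm the jupyter_ai extension is listed and enabled
jupyter lab --debug              ## verbose startup log to find where loading fails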

krassowski avatar Jul 03 '24 11:07 krassowski

Hi all! I'm lost among the multiple issues tracking this problem. Is there, as of today, a way to point jupyter-ai at a custom OpenAI-compatible API URL and specify an arbitrary model to use? (I'm running Llama 3 via h2ogpt / vLLM.)

dimm0 avatar Jul 09 '24 17:07 dimm0

@dimm0 https://github.com/jupyterlab/jupyter-ai/issues/389#issuecomment-1723739865

mtekman avatar Jul 09 '24 17:07 mtekman

I saw it...

  • No /generate, /learn
  • Untested %%ai magic, since I use R and R does not seem to load the %% stuff.
  • There is no way to specify the model in the OpenAI settings.

Would it be possible to create a new dropdown item in Language Model called OpenAI :: Custom that would enable model selection, similar to the Python example above?

(from that post)

It seems to still be not addressed

dimm0 avatar Jul 09 '24 17:07 dimm0

You run your custom model and then point to it via the "Base API URL", choosing from the "Language Model" selection an arbitrary model that your custom model should be API-compatible with.

mtekman avatar Jul 09 '24 17:07 mtekman

It keeps saying "invalid API key". I tried it with a model that has no API key and with one whose API key I know. But how does choosing the right model work? Will it query the list of available models from the endpoint?

dimm0 avatar Jul 09 '24 18:07 dimm0

If you're using text-generation-webui, the API key seems to be hardcoded: https://github.com/jupyterlab/jupyter-ai/issues/389#issuecomment-1971206217

But how does choosing the right model work?

My understanding is that you choose from the dropdown the OpenAI model that supports all the endpoints you want. A little trial and error is needed here, and nothing will work 100%.

Will it query the list of available models from the endpoint?

No, you literally offer a specific model at some address, and in the "Language Model" section you pick the closest OpenAI model that you think will be compatible with the endpoints for your model.

It will not consult the OpenAI servers, since you've overridden this with the "Base API URL" setting.

mtekman avatar Jul 09 '24 18:07 mtekman

I'm not using text-generation-webui; I'm using https://github.com/h2oai/h2ogpt, which runs Llama 3 via vLLM for me. It exposes the standard OpenAI-compatible interface on an HTTPS port, and I can connect to it from multiple OpenAI-compatible tools. The model is meta-llama/Meta-Llama-3-70B-Instruct. I can enable an API key or use it without any key. How can I add one to jupyter-ai?

dimm0 avatar Jul 09 '24 18:07 dimm0

Hmm, tricky. Maybe Jupyter is expecting a specifically formatted API key? Perhaps try setting the API key in your custom model to that ridiculous sk-11111* one.
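
If it helps, here's a quick way to check whether the endpoint itself accepts that key, outside of jupyter-ai. A sketch using the openai>=1.0 client; the host URL is a hypothetical placeholder for your h2ogpt/vLLM address, and the model name is the one you mentioned:

from openai import OpenAI

## Hypothetical host URL -- substitute your h2ogpt/vLLM endpoint
client = OpenAI(
    base_url="https://your-h2ogpt-host/v1",
    api_key="sk-111111111111111111111111111111111111111111111111",
)
resp = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-70B-Instruct",
    messages=[{"role": "user", "content": "ping"}],
)
print(resp.choices[0].message.content)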

mtekman avatar Jul 09 '24 18:07 mtekman