Custom local LLMs
What about custom/private LLMs? Will there be an option to use some of LangChain's local features like llama.cpp?
Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! :hugs:
If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template as it helps other community members to contribute more effectively.
You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! :wave:
Welcome to the Jupyter community! :tada:
I quite like the idea of GPT4All, but unfortunately it seems to be a mostly CPU-bound model (2 minutes for a single response using 36 cores!), and a GPU version is far away.
One fantastic idea I've seen bouncing around is to use an existing local LLM web server that is compliant with the OpenAI API. The text-generation-webui project has actually implemented an openai extension for a lot of their models.
I've tested it and it seems to work (5-second responses on a 12 GB VRAM GPU using their 'stable-vicuna-13B-GPTQ' model!), but commands like /generate and /learn naturally are not really implemented.
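For context, jupyter-ai's OpenAI chat provider appears to go through LangChain's ChatOpenAI under the hood, which is why an OpenAI-compliant local server can slot in. A minimal sketch of the underlying call (assuming the classic langchain 0.0.x API and the endpoint/key used below; the dummy key is just whatever the local server will accept):

from langchain.chat_models import ChatOpenAI

# Point LangChain's OpenAI wrapper at the local text-generation-webui endpoint
llm = ChatOpenAI(
    model_name="TheBloke_stable-vicuna-13B-GPTQ",
    openai_api_key="sk-111111111111111111111111111111111111111111111111",  # dummy key
    openai_api_base="http://0.0.0.0:5001/v1",
)
print(llm.predict("Say hello in one sentence."))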
Getting it to work
text-generation-webui
First Time Install
micromamba create -n textgen python=3.10.9
micromamba activate textgen
## Nvidia gpu stuff
pip3 install torch torchvision torchaudio
## WebUI
git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui
pip install -r requirements.txt
## OpenAI extension
cd extensions/openai
pip install -r requirements.txt
cd ../../
python server.py --extensions openai --listen
- Go to localhost:7860 → Models Tab
- Put https://huggingface.co/TheBloke/stable-vicuna-13B-GPTQ into the download text box
- Wait for it to download, then kill the server.
Normal Run
micromamba activate textgen
cd text-generation-webui
## Start the server, load the model, enable the OpenAI extension
python server.py --model TheBloke_stable-vicuna-13B-GPTQ --extensions openai --listen
- (you should see info about the OPENAI_BASE printed here)
(optional) Test that it's reachable
micromamba activate jupyterai ## (optional, just ensure you have all the jupyter-ai libraries)
- In Python:
import os
os.environ['OPENAI_API_KEY']="sk-111111111111111111111111111111111111111111111111"
os.environ['OPENAI_API_BASE']="http://0.0.0.0:5001/v1"
import openai
response = openai.ChatCompletion.create(
model="TheBloke_stable-vicuna-13B-GPTQ",
messages = [{ 'role': 'system', 'content': "Answer in a consistent style." },
{'role': 'user', 'content': "Teach me about patience."},
{'role': 'assistant', 'content': "The river that carves the deepest valley flows from a modest spring; the grandest symphony originates from a single note; the most intricate tapestry begins with a solitary thread."},
{'role': 'user', 'content': "Teach me about the ocean."},
]
)
text = response['choices'][0]['message']['content']
print(text)
Jupyter AI
Run Jupyter
micromamba activate jupyterai ## (optional, just ensure you have all the jupyter-ai libraries)
jupyter-lab
- Click on the AI tab → Settings Wheel
- (where the API key is the sk-111111111111111111111111111111111111111111111111 from before)
After that, save and it should just work!
Jupyter AI with Stable-Vicuna
(left: NVTOP showing real-time GPU usage; right: JupyterLab)
https://github.com/jupyterlab/jupyter-ai/assets/20641402/267ebedc-e5f0-448c-96cf-a5e57db05f97
Limitations
- No /generate, /learn
- Untested %%ai magic, since I use R and R does not seem to load the %% stuff.
- There is no way to specify the model in the OpenAI settings.
Would it be possible to create a new dropdown item in Language Model called OpenAI :: Custom that would enable model selection, similar to the Python example above?
As always, big thanks to the Jupyter team!
@mtekman as per https://github.com/jupyterlab/jupyter-ai/issues/190#issuecomment-1677807605 I wonder if the proxy option could help in your use case.
@krassowski Hi, I've been reading through the comments in a few of those threads and I guess I'm still a little bit lost on what the proxy option does, compared to the base API url?
Hi @mtekman @zboinek @krassowski I believe we can help with this issue. I’m the maintainer of LiteLLM https://github.com/BerriAI/litellm
TLDR:
We allow you to use any LLM as a drop-in replacement for gpt-3.5-turbo.
You can use our proxy server or spin up your own proxy server using LiteLLM.
Usage
This calls the provider API directly
from litellm import completion
import os
## set ENV variables
os.environ["OPENAI_API_KEY"] = "your-key" #
messages = [{ "content": "Hello, how are you?","role": "user"}]
# openai call
response = completion(model="gpt-3.5-turbo", messages=messages)
# falcon call
response = completion(model="falcon-40b", messages=messages)
# ollama call
response = completion(model="ollama/llama2", messages=messages)
@ishaan-jaff If I was to use ollama, would this then natively support /generate, /learn, /ask directives with responses that JupyterAI could understand?
Edit: I just tested Ollama (though not with LiteLLM, which appears to be a paid cloud-based service similar to OpenAI? Happy to remove this statement if I'm wrong), and it doesn't seem to work with Jupyter AI.
git clone git@github.com:jmorganca/ollama.git
cd ollama/ollama
./ollama serve & ./ollama run llama2
## Downloads 3 GB model and runs it at http://localhost:11434/api
The problem is that the API offered there (which has a /generate endpoint) does not seem to be compliant with OpenAI's API, so I'm getting no responses from Jupyter.
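For reference, this is roughly what Ollama's native endpoint expects, which is a different shape from OpenAI's /v1/chat/completions (a sketch, assuming a build recent enough to honour "stream": false; older builds always stream JSON lines):

import requests

# Ollama's native generate endpoint takes {"model", "prompt"} rather than OpenAI-style "messages"
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama2", "prompt": "Why is the sky blue?", "stream": False},
)
print(resp.json()["response"])  # the generated text lives under "response", not "choices"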
Ollama makes it very easy to run a variety of models locally on macOS, Windows (via WSL, and eventually natively) and Linux. It has automatic GPU support for Apple Silicon and NVIDIA (it's using llama.cpp under the covers). It provides its own API and is supported by LangChain.
It would be great to have support in jupyter-ai without having to set up an API proxy like LiteLLM -- no judgement on that project, it's just that it seems like this would be supported using the existing LangChain dependency.
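For what it's worth, something along these lines is what the LangChain integration looks like; a minimal sketch, assuming a langchain 0.0.x release that already ships the Ollama wrapper and the local server started above:

from langchain.llms import Ollama

# Talk to the local Ollama server directly through LangChain (no OpenAI-compat proxy needed)
llm = Ollama(base_url="http://localhost:11434", model="llama2")
print(llm("Why is the sky blue?"))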
Untested %%ai magic, since I use R and R does not seem to load the %% stuff.
Try below to connect to a locally hosted model (I used textgen-web-ui):
%%ai chatgpt -m {"api_base":"http://127.0.0.1:5000/v1"}
With regard to @mtekman's comment, many other providers have a common provider called "OpenAI API" or equivalent. It uses the same openai Python package, with the difference that it's possible to specify the endpoint and other parameters.
For example, this is how https://continue.dev exposes these providers:
I am one of the Collaborators of FastChat, and we have it deployed in many places. This would be an invaluable addition to Jupyter-AI.
os.environ['OPENAI_API_BASE']="http://0.0.0.0:5001/v1"
The default base URL is http://0.0.0.0:5000; how can I change it?
That seems to be exactly the issue in this bug report and in #190. You can't.
Ollama support is being tracked in #482, and LangChain SelfHostedHuggingFaceLLM in #343.
os.environ['OPENAI_API_KEY']="sk-111111111111111111111111111111111111111111111111"
Is this setting necessary?
@adaaaaaa It seems like it is: https://github.com/oobabooga/text-generation-webui/wiki/12-%E2%80%90-OpenAI-API#third-party-application-setup
Hi @mtekman, I cannot even make the AI tab settings page show as mentioned above when I run JupyterLab in an offline environment. Is there any way to solve this?
When I click the AI chat tab, it shows a warning icon and says "There seems to be a problem with the chat backend, please look at the JupyterLab server logs or contact your administrator to correct this problem."
JupyterLab version: 4.1.2, Jupyter AI version: 2.18.1
@imClumsyPanda
Weird, it works fine for me -- though I'm using a newer Jupyter Lab.
## conda or mamba or micromamba, all the same
micromamba create -y -c conda-forge -n jupyterlabai \
jupyterlab=4.2.3 jupyter-ai=2.18.1
## Activate the environment and run it
micromamba activate jupyterlabai
jupyter-lab
The chat window tab should appear in the interface
@mtekman I'm not sure if it's because I'm in an offline environment.
And I installed notebook and Jupyter-ai through pip.
@imClumsyPanda
I'm not sure if it's because I'm in an offline environment.
JupyterLab is run on localhost by default (is that what you mean by offline?)
And I installed notebook and Jupyter-ai through pip.
pip might be conflicting with your system Python libraries, depending on how your PATH is defined.
To get around this, either try creating a mamba environment as described in my last comment, or create a virtualenv using just Python:
## create a new env
virtualenv jupyteraivenv
## source "activate" it
source jupyteraivenv/bin/activate
## Install the right versions
pip install jupyterlab==4.2.3 jupyter-ai==2.18.1
## Run it
jupyter-lab
Double check which jupyter-lab is being called, because maybe your system has one installed globally.
whereis jupyter-lab
## should give you a path like:
## /home/blah/blah/jupyteraivenv/bin/jupyter-lab
@mtekman I mean I'm running JupyterLab and Jupyter AI in an environment without an internet connection.
I'll try to check again tomorrow, thanks for the reply!
@mtekman I've tried creating a new conda env, pip installed jupyterlab 4.2.3 and jupyter-ai 1.18.1, and made sure the jupyter-lab command points to the one in the newly created env.
But I still get the same error message, with an error icon that says "There seems to be a problem with the chat backend, please look at the JupyterLab server logs or contact your administrator to correct this problem".
And this time I've noticed that there are warning messages in the cmd window, which say [W 2024-07-03 10:10:10 ServerApp] 404 GET /api/ai/chats referer=None or [W 2024-07-03 10:10:10 ServerApp] 404 GET /api/ai/chats?token=[secret] referer=None.
I'll check if I can change the AI chat settings through the source code to solve this.
Check your firewall (e.g. ufw disable); it could be that some internal connections are blocked.
@imClumsyPanda most likely the server extension of jupyter-ai fails to load for some reason specific to your environment (e.g. a conflicting version of a dependency). You would need to look at the initial portion of the log (maybe with the --debug option). Also, checking the output of pip check and jupyter server extension list can be helpful.
Hi all! I'm lost in the multiple issues tracking this problem. Is there today a way to point Jupyter AI at a custom OpenAI-compatible API URL and specify an arbitrary model to use? (I'm running Llama 3 via h2oGPT / vLLM)
@dimm0 https://github.com/jupyterlab/jupyter-ai/issues/389#issuecomment-1723739865
I saw it...
No /generate, /learn
Untested %%ai magic, since I use R and R does not seem to load the %% stuff.
There is no way to specify the model in the OpenAI settings.
Would it be possible to create a new dropdown item in Language Model called OpenAI :: Custom that would enable model selection, similar to the Python example above?
(from that post)
It seems to still be not addressed
You run your custom model, and then point to it via the "Base API URL", choosing from the "Language Model" selection an arbitrary model that your custom model should be API-compatible with.
It keeps saying "invalid api key". I tried it with a model that has no API key and with one whose API key I know. But how does choosing the right model work? Will it query the list of available models from the endpoint?
If you're using text-generation-ui, the API key seems to be hardcoded: https://github.com/jupyterlab/jupyter-ai/issues/389#issuecomment-1971206217
But how does choosing the right model work?
My understanding of it is that you choose the OpenAI model from the dropdown that has all the endpoints you want. A little bit of trial and error is needed here, and nothing will work 100%.
Will it query the list of available models from the endpoint?
No, you literally offer a specific model at some address, and in the "Language Model" section you pick the closest OpenAI model that you think will be compatible with the endpoints for your model.
It will not consult the OpenAI servers, since you've overridden this with the "Base API URL" setting.
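If you want to see what your server actually exposes, you can ask it yourself; most OpenAI-compatible backends implement the /v1/models listing (a sketch; substitute your own base URL and key):

import requests

BASE = "http://0.0.0.0:5001/v1"  # your server's OpenAI-compatible base URL
KEY = "sk-111111111111111111111111111111111111111111111111"  # dummy or real key as appropriate

resp = requests.get(f"{BASE}/models", headers={"Authorization": f"Bearer {KEY}"})
for m in resp.json().get("data", []):
    print(m["id"])  # model names the server will accept in requests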
I'm not using text-generation-webui; I'm using https://github.com/h2oai/h2ogpt, which runs Llama 3 via vLLM for me. It exposes the standard OpenAI-compatible interface on an https port, and I can connect to it from multiple OpenAI-compatible tools. The model is meta-llama/Meta-Llama-3-70B-Instruct. I can enable an API key or use it without any key. How can I add one to jupyter-ai?
Hmm, tricky. Maybe Jupyter is expecting a specifically formatted API key? Perhaps try setting the API key in your custom model to that ridiculous sk-11111* one.
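One way to narrow it down is to hit the endpoint directly with the openai package, outside of Jupyter AI, and see whether the server itself rejects the key. A sketch in the same style as the test earlier in the thread (openai < 1.0 API; the base URL is a placeholder for wherever h2oGPT/vLLM is serving):

import os
os.environ["OPENAI_API_KEY"] = "sk-111111111111111111111111111111111111111111111111"  # or your real key
os.environ["OPENAI_API_BASE"] = "https://your-h2ogpt-host/v1"  # placeholder base URL

import openai
response = openai.ChatCompletion.create(
    model="meta-llama/Meta-Llama-3-70B-Instruct",
    messages=[{"role": "user", "content": "Say hello."}],
)
print(response["choices"][0]["message"]["content"])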