
Added local LLM functionality by incorporating text-generation-webui

Open sirajperson opened this issue 1 year ago • 30 comments

In this PR I have integrated text-generation-webui as a means of managing locally hosted LLMs. The changes are as follows:

  1. Created a setting for OPENAI_API_BASE_URL: this allows one to set the URL that the openai library is pointed to.
  2. Created a Docker image of Text Generation Web UI that includes multi-GPU offloading of GGML models.
  3. Configured SuperAGI to use the TGWUI Docker image by default.

With this PR one can run the docker-compose up --build command, and then navigate to localhost:7860 to download models for use with SuperAGI from huggingface.co.
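To illustrate what the base-URL setting does, here is a minimal sketch (not the PR's actual code) of pointing the 0.x openai Python client at the TGWUI OpenAI-compatible endpoint. The environment-variable handling and model name are illustrative assumptions; the URL matches the OPENAI_API_BASE value discussed later in this thread.

```python
# Hypothetical sketch: point the openai client at a local TGWUI endpoint.
import os
import openai

# The default URL below is the docker-network address used later in this thread;
# the env-var fallback is an assumption, not SuperAGI's exact config plumbing.
openai.api_base = os.getenv("OPENAI_API_BASE", "http://super__tgwui:5001/v1")
openai.api_key = os.getenv("OPENAI_API_KEY", "not-needed-locally")  # a local backend ignores the key

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",  # passed through; the locally loaded model actually answers
    messages=[{"role": "user", "content": "Say hello"}],
)
print(response["choices"][0]["message"]["content"])
```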

sirajperson avatar Jun 09 '23 15:06 sirajperson

I'm all about this!! I think we all wanted AutoGPT to run locally to push the limits without breaking the bank! Stoked to see what we can run locally. And yeah, optimization of GPU presets, cblast, llama, etc. will be next to auto-optimize local LLMs. Great job though! 👏

Renegadesoffun avatar Jun 09 '23 15:06 Renegadesoffun

@sirajperson. This is awesome.

Tried running docker-compose on the new changes (on a MacBook with 16 GB RAM). Getting the following error.

#0 32.90 Ignoring bitsandbytes: markers 'platform_system == "Windows"' don't match your environment
#0 32.90 Ignoring llama-cpp-python: markers 'platform_system == "Windows"' don't match your environment
#0 32.90 Ignoring auto-gptq: markers 'platform_system == "Windows"' don't match your environment
#0 32.90 ERROR: auto_gptq-0.2.0+cu117-cp310-cp310-linux_x86_64.whl is not a supported wheel on this platform.

failed to solve: executor failed running [/bin/sh -c pip3 install -r /app/requirements.txt]: exit code: 1

TransformerOptimus avatar Jun 09 '23 18:06 TransformerOptimus

@sirajperson. This is awesome.

Tried running docker-compose on the new changes (on a MacBook with 16 GB RAM). Getting the following error.

#0 32.90 Ignoring bitsandbytes: markers 'platform_system == "Windows"' don't match your environment
#0 32.90 Ignoring llama-cpp-python: markers 'platform_system == "Windows"' don't match your environment
#0 32.90 Ignoring auto-gptq: markers 'platform_system == "Windows"' don't match your environment
#0 32.90 ERROR: auto_gptq-0.2.0+cu117-cp310-cp310-linux_x86_64.whl is not a supported wheel on this platform.
failed to solve: executor failed running [/bin/sh -c pip3 install -r /app/requirements.txt]: exit code: 1

Thanks for the reply. As for the build, are you able to build the docker image on the main branch?

sirajperson avatar Jun 09 '23 20:06 sirajperson

Okay, it looks like the error you are getting happens while installing the requirements.txt file from Text Generation Web UI. Try commenting out the last line (line 25) of tgwui_requirements.txt. To make this work on your local machine there are a couple of installation steps you may have to take on a Mac. I'm not sure what kind of video card you have, or whether you are using a laptop, but you should be able to remove that last line from the requirements file to get it installed.

Also, I have removed some items from the launch arguments so that TGWUI doesn't automatically target devices with NVIDIA GPUs.

For now, configuration of docker-compose.yaml needs to be done manually. I will create a build.sh script tonight that generates the docker-compose.yaml file with build options based on the target installation environment. Until then I have commented out GPU offloading and GPU configuration. This will make the model's API responses much slower, but it will greatly increase the number of devices the containers can run on without having to modify docker-compose.yaml.

Also, llama.cpp GPU offloading doesn't presently remove offloaded layers from system RAM. Instead, it makes a copy of them in VRAM and executes the layers there. From what I understand, this is currently being addressed.

Please re-clone the repository and try running from scratch. You may need to remove the containers you already tried to build; refer to the Docker container management docs for information on how to remove them. If this project is the only thing you are using containers for on your system, you can run 'docker system prune' to remove containers that aren't running and wipe the previous build cache. Don't run prune if you have other Docker images you would like to keep, or it will delete them.

sirajperson avatar Jun 09 '23 20:06 sirajperson

@TransformerOptimus Another hiccup I've run into working with local LLMs is the difference in token limits between llama (and its derivatives) and the OpenAI models. The token limit of llama is 2048, while the token limits for gpt-3.5 and gpt-4 are 4096 and 8192 (for the API version) respectively. I was thinking it might be a good idea to take token limits into account when forming the session. I'll work on something like that later tonight, but any ideas would be greatly appreciated.

sirajperson avatar Jun 09 '23 21:06 sirajperson

@TransformerOptimus Another hiccup I've run into working with local LLMs is the difference in token limits between llama (and its derivatives) and the OpenAI models. The token limit of llama is 2048, while the token limits for gpt-3.5 and gpt-4 are 4096 and 8192 (for the API version) respectively. I was thinking it might be a good idea to take token limits into account when forming the session. I'll work on something like that later tonight, but any ideas would be greatly appreciated.

There are multiple components in a prompt.

  1. Base prompt - includes goals, constraints, tools etc.
  2. Short term memory
  3. Long term memory(WIP)
  4. Knowledge base - pre-seeded knowledge for agents (it is WIP).

We can give a certain percentage of weight to each component. The base prompt's weight can't be changed, but we can come up with variations. STM, LTM, and Knowledge can each have a weight of 1/3 of the remaining tokens available, or the split can be kept configurable.
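A rough sketch of that weighting idea, assuming the 1/3 split mentioned above; the function name, keys, and numbers are illustrative, not SuperAGI's actual config:

```python
# Hypothetical sketch: split the context window left over after the base prompt
# between short-term memory, long-term memory, and the knowledge base.
def allocate_token_budget(max_model_tokens: int,
                          base_prompt_tokens: int,
                          weights=(1 / 3, 1 / 3, 1 / 3)):
    remaining = max_model_tokens - base_prompt_tokens
    if remaining <= 0:
        raise ValueError("Base prompt alone exceeds the model's context window")
    stm_w, ltm_w, knowledge_w = weights
    return {
        "short_term_memory": int(remaining * stm_w),
        "long_term_memory": int(remaining * ltm_w),
        "knowledge": int(remaining * knowledge_w),
    }

# A 2048-token llama model leaves far less room than gpt-3.5's 4096:
print(allocate_token_budget(2048, 600))
print(allocate_token_budget(4096, 600))
```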

TransformerOptimus avatar Jun 10 '23 01:06 TransformerOptimus

@sirajperson Hey, can you let me know what to do after localhost:7860? I was able to set it up locally, but how do I choose and test a different model? Can you explain in detail what the next steps are after docker-compose up --build?

luciferlinx101 avatar Jun 10 '23 15:06 luciferlinx101

Can we keep the new docker-compose for local LLMs separate from the current docker-compose file (something like docker-compose.local_llm.yaml)? We don't know how many devs want to run the local model directly by default. We can add a section to the README for the local model.

TransformerOptimus avatar Jun 10 '23 16:06 TransformerOptimus

@TransformerOptimus Sure, it would be nice to be able to specify the use of local LLMs as a build arg. If we hand docker-compose something like --build-args use_local_llm=true, then compose executes the TGWUI build.

sirajperson avatar Jun 10 '23 17:06 sirajperson

In my last commit everything seems to be basically working. I've had 4 successful runs in the past 24 hours. I'll go ahead and separate the docker-compose files. Let me know if the last commit is working well on Mac. I'm on Linux.

sirajperson avatar Jun 10 '23 17:06 sirajperson

@luciferlinx101 As of the last commit (Jun 10), to use local LLMs follow these steps:

  1. Clone my development branch.
  2. Copy config_template.yaml to config.yaml.
  3. Edit the config.yaml file and make the following changes:
     - Comment out line 7: OPENAI_API_BASE: https://api.openai.com/v1
     - Uncomment line 8: #OPENAI_API_BASE: "http://super__tgwui:5001/v1"
  4. Modify the following lines to match the model you plan on using:
     - MAX_TOOL_TOKEN_LIMIT: 800
     - MAX_MODEL_TOKEN_LIMIT: 2048 # set to 2048 for llama or 4032 for GPT-3.5
     (For llama-based models I have successfully been using 500 and 2048 respectively.)
  5. Run docker compose: docker-compose up --build

Note that if you want to use more advanced features like loading models onto GPUs, you will need to do additional configuration in docker-compose.yaml. I have tried to leave comments in the current file as basic instructions. For more information on specific text-generation-webui builds, I recommend reviewing the instructions in the text-generation-webui-docker GitHub repo: https://github.com/Atinoda/text-generation-webui-docker

After you have successfully built the containers, point your browser to 127.0.0.1:7860 and click on the models tab. In the field "Download custom model or LoRA", enter the Hugging Face model identifier you would like to use, such as:

TheBloke/Vicuna-13B-CoT-GGML

Then click download. In the model selection drop-down menu, select the model you just downloaded and wait for it to load.

Finally, point your browser to 127.0.0.1:3000 to begin using the agent.

Cheers!

Please be aware that my fork is a development branch undergoing a PR and will be changing more soon. In other words, it isn't stable.

sirajperson avatar Jun 10 '23 17:06 sirajperson

In my last commit everything seems to be basically working. I've had 4 successful runs in the past 24 hours. I'll go ahead and separate the docker-compose files. Let me know if the last commit is working well on Mac. I'm on Linux.

On Mac it is still failing. Getting this error. Seems to run fine on Ubuntu.

#0 17.80
#0 17.80 note: This error originates from a subprocess, and is likely not a problem with pip.
#0 17.80 ERROR: Failed building wheel for llama-cpp-python
#0 17.80 Failed to build llama-cpp-python
#0 17.80 ERROR: Could not build wheels for llama-cpp-python, which is required to install pyproject.toml-based projects

failed to solve: executor failed running [/bin/sh -c pip uninstall -y llama-cpp-python && CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python]: exit code: 1

TransformerOptimus avatar Jun 10 '23 17:06 TransformerOptimus

In my last commit everything seems to be basically working. I've had 4 successful runs in the past 24 hours. I'll go ahead and separate the docker-compose files. Let me know if the last commit is working well on Mac. I'm on Linux.

On Mac it is still failing. Getting this error. Seems to run fine on Ubuntu.

#0 17.80
#0 17.80 note: This error originates from a subprocess, and is likely not a problem with pip.
#0 17.80 ERROR: Failed building wheel for llama-cpp-python
#0 17.80 Failed to build llama-cpp-python
#0 17.80 ERROR: Could not build wheels for llama-cpp-python, which is required to install pyproject.toml-based projects
failed to solve: executor failed running [/bin/sh -c pip uninstall -y llama-cpp-python && CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python]: exit code: 1

Ran the code change on https://lmsys.org/about/ and was able to get

@TransformerOptimus Sure, it would be nice to be able to specify the use of local LLMs as a build arg. If we hand docker-compose something like --build-args use_local_llm=true, then compose executes the TGWUI build.

Sounds good

TransformerOptimus avatar Jun 10 '23 17:06 TransformerOptimus

@TransformerOptimus I'm modifying the default build to remove llama-cuda.

sirajperson avatar Jun 10 '23 18:06 sirajperson

Okay, I'll do a fresh clone and see if it works. I might just pull the MacBook off the shelf and spin it up to debug, although my Mac is an x86 from '18, so I'm not sure if I can reproduce it. Let me know if commenting out the build line resolved the build issue.

sirajperson avatar Jun 10 '23 18:06 sirajperson

@sirajperson Getting error: failed to solve: failed to compute cache key: failed to calculate checksum of ref moby::ofh3vz56pr1wh8d19h9bzjm6o: "/tgwui/scripts/docker-entrypoint.sh": not found

luciferlinx101 avatar Jun 10 '23 18:06 luciferlinx101

Okay, I'll do a fresh clone and see if it works. I might just pull the MacBook off the shelf and spin it up to debug, although my Mac is an x86 from '18, so I'm not sure if I can reproduce it. Let me know if commenting out the build line resolved the build issue.

Commenting out the build line resolved the earlier error (the auto_gptq-0.2.0+cu117-cp310-cp310-linux_x86_64.whl issue). The new error is related to llama-cpp-python.

TransformerOptimus avatar Jun 10 '23 18:06 TransformerOptimus

@sirajperson Getting error: failed to solve: failed to compute cache key: failed to calculate checksum of ref moby::ofh3vz56pr1wh8d19h9bzjm6o: "/tgwui/scripts/docker-entrypoint.sh": not found

Cloning a fresh copy of the repo somehow resolved it.

luciferlinx101 avatar Jun 10 '23 18:06 luciferlinx101

@luciferlinx101 Sorry about that, I'm in the process of doing commits.

sirajperson avatar Jun 10 '23 18:06 sirajperson

@sirajperson Got error:

Traceback (most recent call last):
  File "/app/server.py", line 180, in download_model_wrapper
    model, branch = downloader.sanitize_model_and_branch_names(model, branch)
  File "/app/download-model.py", line 82, in sanitize_model_and_branch_names
    if model[-1] == '/':
IndexError: string index out of range

I was able to open http://localhost:7860/ and added TheBloke/Vicuna-13B-CoT-GGML in the LoRA(s) field, keeping all other fields as they were.

Sharing a screenshot.

[Screenshot of the text-generation-webui interface]

Let me know if I am doing anything wrong.

luciferlinx101 avatar Jun 10 '23 19:06 luciferlinx101

@luciferlinx101 From the picture it looks like you have the huggingface identifier in the wrong text field. Please try cutting the text from the 'LoRA(s)' field and pasting it into the 'Download custom models or LoRA' field at the bottom of the UI.

sirajperson avatar Jun 10 '23 19:06 sirajperson

@TransformerOptimus Okay, I was able to merge the recent commits from the main branch. I was able to launch and run in CPU mode (default) and in GPU mode (with distributed GGML offloading, [advanced]). Please let me know if you're running smoothly on Mac. Also, give me a list of tasks you'd like completed before merging with main. Thanks :-D

sirajperson avatar Jun 10 '23 23:06 sirajperson

@luciferlinx101 If you don't mind could you please rebuild the PR in your testing environment and let me know if everything is working correctly?

sirajperson avatar Jun 10 '23 23:06 sirajperson

@TransformerOptimus After looking back through the logs I am finding the following error. I'm looking into it, but I'm not sure if the problem exists in the MB at the moment or not. Any insight is appreciated.

celery_1 | [2023-06-10 23:38:44,948: ERROR/ForkPoolWorker-7] Task execute_agent[6015bb37-cd89-4f4f-9d75-b96ae0e4bffe] raised unexpected: DataError("Invalid input of type: 'list'. Convert to a bytes, string, int or float first.")
celery_1 | Traceback (most recent call last):
celery_1 |   File "/usr/local/lib/python3.9/site-packages/celery/app/trace.py", line 451, in trace_task
celery_1 |     R = retval = fun(*args, **kwargs)
celery_1 |   File "/usr/local/lib/python3.9/site-packages/celery/app/trace.py", line 734, in __protected_call__
celery_1 |     return self.run(*args, **kwargs)
celery_1 |   File "/usr/local/lib/python3.9/site-packages/celery/app/autoretry.py", line 54, in run
celery_1 |     ret = task.retry(exc=exc, **retry_kwargs)
celery_1 |   File "/usr/local/lib/python3.9/site-packages/celery/app/task.py", line 717, in retry
celery_1 |     raise_with_context(exc)
celery_1 |   File "/usr/local/lib/python3.9/site-packages/celery/app/autoretry.py", line 34, in run
celery_1 |     return task._orig_run(*args, **kwargs)
celery_1 |   File "/app/superagi/worker.py", line 18, in execute_agent
celery_1 |     AgentExecutor().execute_next_action(agent_execution_id=agent_execution_id)
celery_1 |   File "/app/superagi/jobs/agent_executor.py", line 149, in execute_next_action
celery_1 |     response = spawned_agent.execute(agent_template_step)
celery_1 |   File "/app/superagi/agent/super_agi.py", line 184, in execute
celery_1 |     task_queue.add_task(task)
celery_1 |   File "/app/superagi/agent/task_queue.py", line 16, in add_task
celery_1 |     self.db.lpush(self.queue_name, task)
celery_1 |   File "/usr/local/lib/python3.9/site-packages/redis/commands/core.py", line 2706, in lpush
celery_1 |     return self.execute_command("LPUSH", name, *values)
celery_1 |   File "/usr/local/lib/python3.9/site-packages/redis/client.py", line 1269, in execute_command
celery_1 |     return conn.retry.call_with_retry(
celery_1 |   File "/usr/local/lib/python3.9/site-packages/redis/retry.py", line 46, in call_with_retry
celery_1 |     return do()
celery_1 |   File "/usr/local/lib/python3.9/site-packages/redis/client.py", line 1270, in <lambda>
celery_1 |     lambda: self._send_command_parse_response(
celery_1 |   File "/usr/local/lib/python3.9/site-packages/redis/client.py", line 1245, in _send_command_parse_response
celery_1 |     conn.send_command(*args)
celery_1 |   File "/usr/local/lib/python3.9/site-packages/redis/connection.py", line 848, in send_command
celery_1 |     self._command_packer.pack(*args),
celery_1 |   File "/usr/local/lib/python3.9/site-packages/redis/connection.py", line 558, in pack
celery_1 |     for arg in map(self.encode, args):
celery_1 |   File "/usr/local/lib/python3.9/site-packages/redis/connection.py", line 115, in encode
celery_1 |     raise DataError(
celery_1 | redis.exceptions.DataError: Invalid input of type: 'list'. Convert to a bytes, string, int or float first.
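For reference, a minimal sketch of one way add_task could be made Redis-safe, assuming the agent sometimes hands it a list of tasks. This is hypothetical, not the actual fix in the repo; the Redis hostname is also an assumption.

```python
# Hypothetical sketch of a Redis-safe add_task: redis-py's lpush only accepts
# bytes/str/int/float, so a list must be unpacked (or serialized) before pushing.
import json
import redis

class TaskQueue:
    def __init__(self, queue_name: str, redis_url: str = "redis://super__redis:6379"):
        # redis_url hostname is an assumption based on the docker-compose service naming.
        self.queue_name = queue_name
        self.db = redis.from_url(redis_url)

    def add_task(self, task):
        # The agent may hand us a single task string or a list of tasks.
        tasks = task if isinstance(task, list) else [task]
        for t in tasks:
            # Serialize anything that isn't already a plain string.
            self.db.lpush(self.queue_name, t if isinstance(t, str) else json.dumps(t))
```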

sirajperson avatar Jun 10 '23 23:06 sirajperson

@TransformerOptimus With GPU mode I'm getting 6 ms per token with a 7B CoT model fully offloaded to the GPU. I'd call that pretty usable! :-D

sirajperson avatar Jun 10 '23 23:06 sirajperson

@sirajperson I was able to download and set that model using http://localhost:7860/ and selected TheBloke/Vicuna-13B-CoT-GGML.

Now how should I use it? Going to localhost:3000 doesn't seem to work. I removed my OpenAI key and wanted to test this model. The agent gets stuck in thinking, but the downloaded model doesn't seem to be used. Anything wrong at my end?

luciferlinx101 avatar Jun 11 '23 03:06 luciferlinx101

--build-args

Able to get the containers running, but getting "text-generation-webui | OSError: cannot load library 'libsndfile.so': libsndfile.so: cannot open shared object file: No such file or directory" after running on Mac.

Let's separate the docker-compose files and update the README about local LLMs, then we are good to go for main.

TransformerOptimus avatar Jun 11 '23 04:06 TransformerOptimus

@TransformerOptimus In the last commit I've separated the docker-compose files with the following build schema:

Default build, no local LLMs or OPENAI_API_BASE redirect. docker-compose up --build

Build with local LLM support (CPU). This mode uses memory caching by default. It isn't very fast, but it works on a much greater number of host machines than GPU mode. docker-compose -f local-llm up --build

And finally, the more advanced GPU build. This build may require additional packages to be installed on the host machine. I would suggest that anyone trying to build a GPU install of the container read the Text Generation Web UI doc files; they are very informative and well written (TGWUI Docs). docker-compose -f local-llm-gpu up --build

Please note that Text Generation Web UI is a large project with lots of functionality. It's worth taking time to get to know it in order to use local LLMs efficiently.

sirajperson avatar Jun 11 '23 20:06 sirajperson

@luciferlinx101 I spent quite a bit of time looking for what was causing the hang in agent execution. Please look over the README.md file in the OpenAI plugin root folder of TGWUI. The openai plugin is currently under development and is not yet fully implemented. This PR is for integrating TGWUI as a model management tool. The issue might be more easily resolved by filing an issue on that project's issue tracker about not being able to sequence API calls to the OpenAI API plugin. The readme can be found here.

sirajperson avatar Jun 11 '23 21:06 sirajperson

@sirajperson Sure I will test and let you know if it works properly.

luciferlinx101 avatar Jun 12 '23 08:06 luciferlinx101