SuperAGI
Added local llm functionality by incorporating text-generation-webui
In this PR I have integrated text-generation-webui as a means of managing locally hosted LLMs. The changes are as follows:
- Created a setting for OPENAI_API_BASE_URL: this allows one to set the URL that the openai library points to.
- Created a docker image of Text Generation Web UI that includes multi-GPU offloading of GGMLs.
- Configured SuperAGI to use the TGWUI docker image by default.
With this PR one can run the docker-compose up --build command, and then navigate to localhost:7860 to download models from huggingface.co for use with SuperAGI.
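For reference, the redirect itself amounts to overriding the openai client's base URL before any completion call is made. A minimal sketch of the idea, assuming the pre-1.0 openai Python client and the super__tgwui service name from this PR's compose file:

```python
import os
import openai

# Point the openai client at the TGWUI OpenAI-compatible endpoint instead of
# api.openai.com. "super__tgwui:5001" is the in-network address used by this
# PR's compose file; swap in "127.0.0.1:5001" if calling from the host.
openai.api_base = os.getenv("OPENAI_API_BASE", "http://super__tgwui:5001/v1")
openai.api_key = os.getenv("OPENAI_API_KEY", "sk-dummy")  # the extension ignores the key

# Any call made through the client now goes to the locally hosted model;
# the model name is passed through and TGWUI answers with whatever is loaded.
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Say hello in five words."}],
)
print(response["choices"][0]["message"]["content"])
```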
I'm all about this!! I think we all wanted AutoGPT to run locally to push the limits without breaking the bank! Stoked to see what we can run locally. And yeah, then optimization of GPU presets, CLBlast, llama.cpp, etc. will be next to auto-optimize local LLMs. Great job tho! 👏
@sirajperson. This is awesome.
Tried running docker-compose on the new changes (on a MacBook with 16GB RAM). Getting the following error.
#0 32.90 Ignoring bitsandbytes: markers 'platform_system == "Windows"' don't match your environment
#0 32.90 Ignoring llama-cpp-python: markers 'platform_system == "Windows"' don't match your environment
#0 32.90 Ignoring auto-gptq: markers 'platform_system == "Windows"' don't match your environment
#0 32.90 ERROR: auto_gptq-0.2.0+cu117-cp310-cp310-linux_x86_64.whl is not a supported wheel on this platform.
failed to solve: executor failed running [/bin/sh -c pip3 install -r /app/requirements.txt]: exit code: 1
Thanks for the reply. As for the build, are you able to build the docker image on the main branch?
Okay, it looks like the error you are getting occurs while installing the requirements.txt file from Text Generation Web UI. Try commenting out the last line (line 25) of tgwui_requirements.txt. To make this work on your local machine there are a couple of installation steps you may have to take on a Mac. I'm not sure what kind of video card you have, or whether you are using a laptop, but removing that last line from the requirements file should let the install finish.
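For what it's worth, the "Ignoring bitsandbytes / llama-cpp-python / auto-gptq" lines are just pip skipping requirements whose environment markers only match Windows; the hard failure is the pinned linux_x86_64 wheel on the last line, whose platform tag can't match the build environment. A rough illustration using the packaging library (the marker string mirrors the log; the comments are commentary, not the actual contents of tgwui_requirements.txt):

```python
from packaging.markers import Marker  # pip vendors this library; otherwise `pip install packaging`

# Requirement lines of the form "pkg @ <url> ; platform_system == 'Windows'" are
# skipped on any non-Windows build environment -- that is the "Ignoring ..." output.
windows_only = Marker('platform_system == "Windows"')
print(windows_only.evaluate())  # False inside the Linux build container, so the line is ignored

# The pinned auto_gptq-0.2.0+cu117-cp310-cp310-linux_x86_64.whl has no such escape
# hatch: its cp310/linux_x86_64 tags must match the interpreter and CPU architecture
# of the build container, which is why the install fails outright on hosts (for
# example, Apple-silicon Macs building an aarch64 image) where they don't.
```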
Also, I have removed the following items from the launch arguments so that TGWUI doesn't automatically target devices with nvidia GPUs.
For now, configuration of the docker-compose.yaml needs to be done manually. I will create a build.sh script tonight that generates the docker-compose.yaml file with build options based on the target installation environment. Until then I have commented out GPU offloading and GPU configuration. This will make the model's API responses much slower, but will greatly increase the number of devices the containers can run on without having to modify the docker-compose.yaml file.
Also, llama.cpp GPU offloading doesn't presently support removing offloaded layers from system RAM. Instead, it makes a copy of them in VRAM and executes the layers there. From what I understand, this is currently being addressed.
Please re-clone the repository and try running from scratch. You may need to remove the containers that you already tried to build; refer to the Docker container management docs for information on how to remove containers. If these are the only containers you are using on your system, you can call 'docker system prune' to remove containers that aren't running and to wipe the previous build cache. Don't run prune if you have other Docker images you would like to keep installed, or it will delete them.
@TransformerOptimus Another hiccup I've run into working with local LLMs is the difference in token limits between llama (and its derivatives) and the OpenAI models. The token limit of llama is 2048, while the token limits for gpt-3.5 and gpt-4 are 4096 and 8192 (for the API version) respectively. I was thinking it might be a good idea to take token limits into account during session formation. I'll work on something like that later tonight, but any ideas would be greatly appreciated.
There are multiple components in a prompt.
- Base prompt - includes goals, constraints, tools etc.
- Short term memory
- Long term memory (WIP)
- Knowledge base - preseeded knowledge for agents (it is WIP).
We can give a certain percentage of weight to each of the components. The base prompt weight can't be changed, but we can come up with variations. STM, LTM, and Knowledge can each have a weight of 1/3 of the remaining tokens available, or the weights can be kept configurable.
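A rough sketch of that weighting idea (names and numbers here are illustrative, not the actual SuperAGI implementation):

```python
# Illustrative only: split the context window between the fixed base prompt, the
# reserved completion tokens, and the three memory components described above.
# Token counting itself would use tiktoken for OpenAI models or the model's own
# tokenizer for llama-family models.

def budget_components(max_model_tokens, base_prompt_tokens, max_output_tokens,
                      weights=(1/3, 1/3, 1/3)):
    """Return token budgets for (short_term_memory, long_term_memory, knowledge)."""
    remaining = max_model_tokens - base_prompt_tokens - max_output_tokens
    if remaining <= 0:
        raise ValueError("Base prompt plus reserved output exceeds the context window")
    return tuple(int(remaining * w) for w in weights)

# llama-family model (2048-token context) vs. gpt-3.5 (4096-token context):
print(budget_components(2048, base_prompt_tokens=700, max_output_tokens=512))  # (278, 278, 278)
print(budget_components(4096, base_prompt_tokens=700, max_output_tokens=512))  # (961, 961, 961)
```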
@sirajperson Hey, can you let me know what to do after localhost:7860? I was able to set it up locally, but how do I choose and test a different model?
Can you explain in detail what the next steps are after docker compose up --build?
Can we keep the new docker-compose with the local LLM separate from the current docker-compose file (something like docker-compose.local_llm.yaml)? We don't know how many devs want to run the local model directly by default. We can add a section in the README for the local model.
@TransformerOptimus Sure, it would be nice to be able to specify the use of local LLMs as a build arg. If we hand docker-compose something like --build-arg use_local_llm=true then the compose executes the TGWUI build.
In my last commit everything seems to be basically working. I've had 4 successful runs in the past 24 hours. I'll go ahead and separate the docker-compose files. Let me know if the last commit is working well on Mac. I'm on Linux.
@luciferlinx101 As of the last commit (Jun 10), to use local LLMs follow these steps:

1. Clone my development branch.
2. Copy config_template.yaml to config.yaml.
3. Edit the config.yaml file and make the following changes:
   - Comment out line 7: OPENAI_API_BASE: https://api.openai.com/v1
   - Uncomment line 8: #OPENAI_API_BASE: "http://super__tgwui:5001/v1"
4. Modify the following lines to match the model you plan on using:
   - MAX_TOOL_TOKEN_LIMIT: 800
   - MAX_MODEL_TOKEN_LIMIT: 2048 # set to 2048 for llama or 4032 for GPT-3.5
   For llama-based models I have successfully been using 500 and 2048, respectively.
5. Run docker compose: docker-compose up --build
Note that if you want to use more advanced features, like loading models onto GPUs, you will need to do additional configuration in the docker-compose.yaml file. I have tried to leave comments in the current file as basic instructions. For more information on specific text-generation-webui builds I recommend reviewing the instructions in the text-generation-webui-docker GitHub repo: https://github.com/Atinoda/text-generation-webui-docker
After you have successfully built the containers, point your browser to 127.0.0.1:7860 and click on the Models tab. In the field "Download custom model or LoRA", enter the Hugging Face model identifier you would like to use, such as:
TheBloke/Vicuna-13B-CoT-GGML
Then click Download. In the model selection drop-down menu, select the model you just downloaded and wait for it to load.
Finally, point your browser to 127.0.0.1:3000 to begin using the agent.
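To sanity-check the endpoint before launching an agent, you can hit the TGWUI openai extension directly. A quick hedged smoke test (this assumes port 5001 is also published to the host; otherwise run it from inside the backend container against super__tgwui:5001):

```python
import requests

# The extension mimics the OpenAI completions API, so a plain POST is enough to
# confirm that the model selected in the UI is loaded and answering.
url = "http://127.0.0.1:5001/v1/completions"
payload = {"prompt": "List three colors:", "max_tokens": 32}

resp = requests.post(url, json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])
```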
Cheers!
Please be aware that my fork is a development branch undergoing a PR and will be changing more soon. In other words, it isn't stable.
On Mac it is still failing. Getting this error. It seems to run fine on Ubuntu.
#0 17.80
#0 17.80 note: This error originates from a subprocess, and is likely not a problem with pip.
#0 17.80 ERROR: Failed building wheel for llama-cpp-python
#0 17.80 Failed to build llama-cpp-python
#0 17.80 ERROR: Could not build wheels for llama-cpp-python, which is required to install pyproject.toml-based projects
failed to solve: executor failed running [/bin/sh -c pip uninstall -y llama-cpp-python && CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python]: exit code: 1
Ran the code change on https://lmsys.org/about/ and was able to get
@TransformerOptimus Sure, it would be nice to be able to specify the use of local LLMs as a build arg. If we hand docker-compose something like --build-arg use_local_llm=true then the compose executes the TGWUI build.
Sounds good
@TransformerOptimus I'm modifying the default build to remove llama-cuda.
Okay, I'll do a fresh clone and see if it works. I might just pull the MacBook off the shelf and spin it up to debug, although my Mac is an x86 from '18, so I'm not sure if I can reproduce. Let me know if commenting out the build line resolved the build issue.
@sirajperson Getting error: failed to solve: failed to compute cache key: failed to calculate checksum of ref moby::ofh3vz56pr1wh8d19h9bzjm6o: "/tgwui/scripts/docker-entrypoint.sh": not found
Okay, I'll do a fresh clone and see if it works. I might just pull the MacBook off the shelf and spin it up to debug, although my Mac is an x86 from '18, so I'm not sure if I can reproduce. Let me know if commenting out the build line resolved the build issue.
Commenting out the build line resolved the earlier error (the auto_gptq-0.2.0+cu117-cp310-cp310-linux_x86_64.whl issue). The new error is related to llama-cpp-python.
@sirajperson Getting error: failed to solve: failed to compute cache key: failed to calculate checksum of ref moby::ofh3vz56pr1wh8d19h9bzjm6o: "/tgwui/scripts/docker-entrypoint.sh": not found
Cloning new repo somehow resolved it.
@luciferlinx101 Sorry about that, I'm in the middle of making commits.
@sirajperson Got error:

Traceback (most recent call last):
  File "/app/server.py", line 180, in download_model_wrapper
    model, branch = downloader.sanitize_model_and_branch_names(model, branch)
  File "/app/download-model.py", line 82, in sanitize_model_and_branch_names
    if model[-1] == '/':
IndexError: string index out of range
I was able to open http://localhost:7860/ and added TheBloke/Vicuna-13B-CoT-GGML in the LoRA(s) field, keeping all other fields the same as they were.
Sharing a screenshot.
Let me know if I am doing anything wrong.
@luciferlinx101 From the picture it looks like you have the huggingface identifier in the wrong text field. Please try cutting the text from the 'LoRA(s)' field and pasting it into the 'Download custom models or LoRA' field at the bottom of the UI.
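For reference, that traceback is exactly what an empty model field produces: `model[-1]` on an empty string raises IndexError before the download even starts. A defensive version of the check (illustrative only, not the actual download-model.py code) would look like:

```python
def sanitize_model_and_branch_names(model, branch):
    # Guard against an empty "Download custom model or LoRA" field: indexing
    # model[-1] on "" is what raises "IndexError: string index out of range".
    model = (model or "").strip()
    if not model:
        raise ValueError("Enter a Hugging Face model id, e.g. TheBloke/Vicuna-13B-CoT-GGML")
    if model.endswith("/"):        # the original trailing-slash check, now safe
        model = model[:-1]
    return model, branch or "main"

print(sanitize_model_and_branch_names("TheBloke/Vicuna-13B-CoT-GGML/", None))
# ('TheBloke/Vicuna-13B-CoT-GGML', 'main')
```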
@TransformerOptimus Okay, I was able to merge the recent commits from the main branch. I can launch and run in CPU mode (default) and in GPU mode (with distributed GGML offloading, advanced). Please let me know if you're running smoothly on Mac. Also, give me a list of tasks you'd like completed before merging with main. Thanks :-D
@luciferlinx101 If you don't mind could you please rebuild the PR in your testing environment and let me know if everything is working correctly?
@TransformerOptimus After looking back through the logs I am finding the following error. I'm looking into it, but I'm not sure if the problem exists in the MB at the moment or not. Any insight is appreciated.
celery_1 | [2023-06-10 23:38:44,948: ERROR/ForkPoolWorker-7] Task execute_agent[6015bb37-cd89-4f4f-9d75-b96ae0e4bffe] raised unexpected: DataError("Invalid input of type: 'list'. Convert to a bytes, string, int or float first.")
celery_1 | Traceback (most recent call last):
celery_1 | File "/usr/local/lib/python3.9/site-packages/celery/app/trace.py", line 451, in trace_task
celery_1 | R = retval = fun(*args, **kwargs)
celery_1 | File "/usr/local/lib/python3.9/site-packages/celery/app/trace.py", line 734, in __protected_call__
celery_1 | return self.run(*args, **kwargs)
celery_1 | File "/usr/local/lib/python3.9/site-packages/celery/app/autoretry.py", line 54, in run
celery_1 | ret = task.retry(exc=exc, **retry_kwargs)
celery_1 | File "/usr/local/lib/python3.9/site-packages/celery/app/task.py", line 717, in retry
celery_1 | raise_with_context(exc)
celery_1 | File "/usr/local/lib/python3.9/site-packages/celery/app/autoretry.py", line 34, in run
celery_1 | return task._orig_run(*args, **kwargs)
celery_1 | File "/app/superagi/worker.py", line 18, in execute_agent
celery_1 | AgentExecutor().execute_next_action(agent_execution_id=agent_execution_id)
celery_1 | File "/app/superagi/jobs/agent_executor.py", line 149, in execute_next_action
celery_1 | response = spawned_agent.execute(agent_template_step)
celery_1 | File "/app/superagi/agent/super_agi.py", line 184, in execute
celery_1 | task_queue.add_task(task)
celery_1 | File "/app/superagi/agent/task_queue.py", line 16, in add_task
celery_1 | self.db.lpush(self.queue_name, task)
celery_1 | File "/usr/local/lib/python3.9/site-packages/redis/commands/core.py", line 2706, in lpush
celery_1 | return self.execute_command("LPUSH", name, *values)
celery_1 | File "/usr/local/lib/python3.9/site-packages/redis/client.py", line 1269, in execute_command
celery_1 | return conn.retry.call_with_retry(
celery_1 | File "/usr/local/lib/python3.9/site-packages/redis/retry.py", line 46, in call_with_retry
celery_1 | return do()
celery_1 | File "/usr/local/lib/python3.9/site-packages/redis/client.py", line 1270, in
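For anyone chasing this: the DataError comes from redis-py itself, because LPUSH values must be bytes, str, int, or float, and here a Python list is being pushed as a single task. A hedged illustration of the failure mode and the usual fix of serializing before pushing (the service name and queue key are examples, not necessarily how the final fix landed in SuperAGI):

```python
import json
import redis

# "super__redis" is an assumed compose service name; adjust to your setup.
r = redis.Redis(host="super__redis", port=6379)

task = ["Browse the web for X", "Summarize the results"]  # a list, not a scalar

# r.lpush("tasks", task)  # -> redis.exceptions.DataError: Invalid input of type:
#                         #    'list'. Convert to a bytes, string, int or float first.

# Serialize each entry (or the whole list) before pushing, and decode on pop.
for t in task:
    r.lpush("tasks", json.dumps(t))

next_task = json.loads(r.rpop("tasks"))
print(next_task)  # "Browse the web for X"
```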
@TransformerOptimus With GPU mode I'm getting 6ms per token with a 7B CoT model fully offloaded to the GPU. I'd call that pretty usable! :-D
@sirajperson I was able to download and set that model using http://localhost:7860/ and selected TheBloke/Vicuna-13B-CoT-GGML.
Now how should I use it? Going to localhost:3000 doesn't seem to work. I removed my OpenAI key and wanted to test this model. The agent gets stuck in thinking, but the downloaded model doesn't seem to be used. Anything wrong at my end?
With --build-args I'm able to get the containers running, but getting "text-generation-webui | OSError: cannot load library 'libsndfile.so': libsndfile.so: cannot open shared object file: No such file or directory" after running on Mac.
Let's separate the docker-compose files and update the README about local LLMs, and we are good to go for main.
@TransformerOptimus In the last commit I've separated the docker-compose files with the following build schema:
- Default build, no local LLMs or OPENAI_API_BASE redirect: docker-compose up --build
- Build with local LLM support. This mode uses memory caching by default; it isn't very fast, but it works on a much greater number of host machines than the GPU mode: docker-compose -f local-llm up --build
- Finally, the more advanced GPU build. This build may require additional packages to be installed on the host machine. I would suggest that anyone trying to build a GPU install of the container read the Text Generation Web UI docs (TGWUI Docs); they are very informative and well written: docker-compose -f local-llm-gpu up --build
Please note that Text Generation Web UI is a large project with lots of functionality. It's worth taking the time to get to know it in order to use local LLMs efficiently.
@luciferlinx101 I spent quite a bit of time looking for what was causing a hang with the agent execution. Please look over the README.md file in the OpenAI plugin root folder of TGWUI. The openai plugin is currently under development and is not yet fully implemented. This PR is for the integration of TGWUI as a model management tool. The issue might be more easily resolved by creating an issue on the project's issue tracker about not being able to sequence API calls to the OpenAI API plugin. The readme can be found here.
@sirajperson Sure I will test and let you know if it works properly.