AutoGPT
Add settings for custom base url and embedding dimension
Makes the OpenAI base URL and embedding dimension configurable. These settings are useful for integrating AutoGPT with other models, such as LLaMA.
Background
This makes AutoGPT capable of connecting to custom OpenAI-like APIs such as [keldenl/gpt-llama.cpp](https://github.com/keldenl/gpt-llama.cpp), and of using other models, like LLaMA and its derivatives.
See also #25, #567, #2158.
Changes
Added OPENAI_API_BASE_URL and EMBED_DIM to .env.template and loaded them in config.py, making sure OPENAI_API_BASE_URL is ignored if USE_AZURE is True.
Also modified the files in autogpt/memory to use the value of EMBED_DIM instead of the hardcoded 1536 (which is still the default).
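Roughly, the config.py side of the change looks like this (a simplified sketch rather than the exact diff; attribute names are illustrative):

```python
# Simplified sketch of the new settings in autogpt/config/config.py (illustrative only).
import os

import openai


class Config:
    def __init__(self) -> None:
        self.use_azure = os.getenv("USE_AZURE") == "True"

        # New: custom endpoint for OpenAI-compatible APIs (ignored when USE_AZURE is True)
        self.openai_api_base_url = os.getenv("OPENAI_API_BASE_URL")
        if self.openai_api_base_url and not self.use_azure:
            openai.api_base = self.openai_api_base_url

        # New: embedding dimension, defaulting to OpenAI's 1536
        self.embed_dim = int(os.getenv("EMBED_DIM", "1536"))
```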
Documentation
I added an explanation of what these new settings do in the .env.template file, following the style of the comments on the other settings.
Test Plan
Tested by running gpt-llama.cpp on my machine and setting OPENAI_API_BASE_URL to its API URL in my .env file. I used Vicuna 13B, so I also set EMBED_DIM to 5120. For this test I also set OPENAI_API_KEY to the model's path (a "hack" used by gpt-llama.cpp to obtain the model's path).
PR Quality Checklist
- [x] My pull request is atomic and focuses on a single change.
- [x] I have thoroughly tested my changes with multiple different prompts.
- [x] I have considered potential risks and mitigations for my changes.
- [x] I have documented my changes clearly and comprehensively.
- [x] I have not snuck in any "extra" small tweaks changes
LGTM
A name like LLM_API_BASE_URL instead of OPENAI_API_BASE_URL might be more fitting, since it means we're not necessarily using OpenAI's API.
It's still using the OpenAI API, just not their endpoint, even if the model behind it isn't an OpenAI model.
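In other words, the regular openai client is still used; only the endpoint changes. A minimal illustration (the URL, key, and model name here are just example values):

```python
# Minimal illustration: standard openai client, custom OpenAI-compatible endpoint.
import openai

openai.api_base = "http://localhost:443/v1"   # example local gpt-llama.cpp endpoint
openai.api_key = "/path/to/model.bin"         # gpt-llama.cpp "hack": key carries the model path

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",                    # the local server maps this to its own model
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```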
LGTM 👍
Codecov Report
Patch coverage: 60.00% and project coverage change: -8.24 :warning:
Comparison is base (f8dfedf) 49.65% compared to head (96e7650) 41.41%.
:exclamation: Current head 96e7650 differs from pull request most recent head b7defd2. Consider uploading reports for the commit b7defd2 to get more accurate results
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##           master    #2594      +/-   ##
==========================================
- Coverage   49.65%   41.41%    -8.24%
==========================================
  Files          64       63        -1
  Lines        3021     3011       -10
  Branches      505      495       -10
==========================================
- Hits         1500     1247      -253
- Misses       1401     1698      +297
+ Partials      120       66       -54
```
| Impacted Files | Coverage Δ | |
|---|---|---|
| autogpt/memory/milvus.py | 3.38% <ø> (ø) | |
| autogpt/memory/pinecone.py | 28.57% <0.00%> (ø) | |
| autogpt/config/config.py | 74.02% <33.33%> (-2.14%) | :arrow_down: |
| autogpt/memory/redismem.py | 31.34% <66.66%> (+2.11%) | :arrow_up: |
| autogpt/memory/local.py | 96.07% <100.00%> (+0.16%) | :arrow_up: |
... and 17 files with indirect coverage changes
:umbrella: View full report in Codecov by Sentry.
This pull request has conflicts with the base branch, please resolve those so we can evaluate the pull request.
Conflicts have been resolved! 🎉 A maintainer will review the pull request shortly.
Thanks so much for building this @DGdev91 and delivering the required documentation. Really awesome job!
For those struggling to get this working: it took me a while to find the right model for the job. Eventually I got it to work with ggml-vicuna-13b-1.1-q4_2.bin (from Hugging Face).
My .env:
- OPENAI_API_BASE_URL=http://localhost:443/v1
- EMBED_DIM=5120
- OPENAI_API_KEY=M:\AI\llama.cpp\models\ggml-vicuna-13b-1.1-q4_2.bin
I do have to say, it's incredibly slow on my machine. While I have a decent processor and 32 GB of RAM (and a GeForce RTX 3070 Ti) and am running the model from a fast SSD, it will not utilize my full machine. It will actually time out (600 seconds) on every request unless I put TIMEOUT_SECS = 6000 in the api_requestor.py file of AutoGPT. The 7B models were a bit faster, but weren't able to respond in the way that allows AutoGPT to actually work. I'm thinking of trying to get it to work with my video card, since it is the most high-end part of my PC, but am not quite sure yet where to start. Will let you know if I make it :)
The latest updates on your projects. Learn more about Vercel for Git ↗︎
1 Ignored Deployment
| Name | Status | Preview | Comments | Updated (UTC) |
|---|---|---|---|---|
| docs | ⬜️ Ignored (Inspect) | Visit Preview | | Jun 10, 2023 11:49am |
Codecov Report
Patch coverage: 34.14% and project coverage change: -1.00 :warning:
Comparison is base (3081f56) 69.81% compared to head (0d3060e) 68.81%.
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##           master    #2594      +/-   ##
==========================================
- Coverage   69.81%   68.81%    -1.00%
==========================================
  Files          72       72
  Lines        3571     3585       +14
  Branches      568      574        +6
==========================================
- Hits         2493     2467       -26
- Misses        890      927       +37
- Partials      188      191        +3
```
| Impacted Files | Coverage Δ | |
|---|---|---|
| autogpt/agent/agent.py | 59.88% <ø> (ø) | |
| autogpt/speech/eleven_labs.py | 28.57% <0.00%> (ø) | |
| autogpt/commands/audio_text.py | 31.03% <11.11%> (-5.33%) | :arrow_down: |
| autogpt/speech/say.py | 36.66% <14.28%> (ø) | |
| autogpt/config/config.py | 70.58% <44.44%> (-4.07%) | :arrow_down: |
| autogpt/commands/google_search.py | 95.74% <100.00%> (ø) | |
| autogpt/prompts/prompt.py | 46.80% <100.00%> (ø) | |
| autogpt/speech/stream_elements_speech.py | 44.44% <100.00%> (ø) | |
:umbrella: View full report in Codecov by Sentry.
Sorry for such a long back and forth. I want to make sure this is abstracted just enough that we don't have to redo it and break it all.
I haven't tested it AT ALL, but there's some context for what I'm referring to with the requested changes in the branch base-url-and-embeddings. Obviously non-working, but it should get the point across.
> I'm thinking of trying to get it to work with my video card, since it is the most high-end part of my PC, but am not quite sure yet where to start. Will let you know if I make it :)

Compile your local API provider with cuBLAS, e.g. for llama-cpp-python:

`LLAMA_CUBLAS=1 pip install llama-cpp-python[server]`
> > I'm thinking of trying to get it to work with my video card, since it is the most high-end part of my PC, but am not quite sure yet where to start. Will let you know if I make it :)
>
> Compile your local API provider with cuBLAS, e.g. for llama-cpp-python:
> `LLAMA_CUBLAS=1 pip install llama-cpp-python[server]`

I guess he's using keldenl's gpt-llama.cpp, but it can be applied there too: it uses ggerganov's llama.cpp.

```
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
make LLAMA_CUBLAS=1
```

Of course, you need to have the CUDA SDK installed to do that.
Thanks both! I'm indeed using keldenl's gpt-llama.cpp currently, but I'll try your suggestion! I hope I can just point OPENAI_API_BASE_URL at llama-cpp-python[server]. (PS: today AutoGPT actually reached my 6000-second request timeout as well, so I need to find a better solution xD)
> Thanks both! I'm indeed using keldenl's gpt-llama.cpp currently, but I'll try your suggestion! I hope I can just point OPENAI_API_BASE_URL at llama-cpp-python[server]. (PS: today AutoGPT actually reached my 6000-second request timeout as well, so I need to find a better solution xD)

Don't get confused: keldenl's project uses the standard llama.cpp binary, which is written in C++. llama-cpp-python is a different project (Python bindings for llama.cpp).
I suggest you run llama.cpp on its own to verify it's compiled correctly and is actually using the GPU. If it's using cuBLAS, you should see "BLAS = 1" after it loads the model. If you are using the same checkout you were using the first time, you most likely need to run "make clean" before rebuilding it with cuBLAS support.
This PR conflicts with #3222 and is not atomic. Please fix that so we can review it.
> This PR conflicts with #3222 and is not atomic. Please fix that so we can review it.

Why are you saying that? The hardcoded embedding dimension used in the memory-related classes and the settings he's adding are different things; there are no conflicts. We also modified different files, only .env.template and config.py are in common.
This pull request has conflicts with the base branch, please resolve those so we can evaluate the pull request.
Conflicts have been resolved! 🎉 A maintainer will review the pull request shortly.
> > This PR conflicts with #3222 and is not atomic. Please fix that so we can review it.
>
> Why are you saying that? The hardcoded embedding dimension used in the memory-related classes and the settings he's adding are different things; there are no conflicts. We also modified different files, only .env.template and config.py are in common.

Sorry, I could have been more clear, see the comment above. Unrelated changes should not be submitted together, since that makes it harder to review and pick the PRs that we want to process.
Can I get a test to cover this?
> Sorry, I could have been more clear, see the comment above. Unrelated changes should not be submitted together, since that makes it harder to review and pick the PRs that we want to process.

Those changes are all about new configuration options which aim to make it possible to use different LLMs, as long as they expose an OpenAI-compliant API, so it made sense to me to put them together.
But if you prefer, I can keep this PR only for EMBED_DIM and put OPENAI_API_BASE_URL in another one.
But without the ability to modify openai.api_base (that's what OPENAI_API_BASE_URL does), we cannot test whether different EMBED_DIM values work (on OpenAI's models that value is always 1536).
> Can I get a test to cover this?

If you are referring to the automated tests that are making codecov/patch fail: most of the uncovered lines are from the Pinecone and Redis integrations, which don't have tests at all; they never had any even before my changes.
The Milvus code shows up as uncovered because I forced the type of the cfg argument in init_collection. I made a commit which should fix that (at least, I hope so).
There's also an uncovered line in config.py because (of course) we never set openai.api_base unless OPENAI_API_BASE_URL is set. Should I add a test in test_config.py just for that?
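If it helps, here is a rough sketch of what such a test could look like (untested; the monkeypatch usage and the module reload are just my assumption about how to re-run the env handling):

```python
# Hypothetical test sketch for the uncovered openai.api_base branch in config.py.
import importlib

import openai

import autogpt.config.config as config_module


def test_openai_api_base_url_sets_api_base(monkeypatch):
    monkeypatch.setenv("OPENAI_API_BASE_URL", "http://localhost:8000/v1")
    monkeypatch.setenv("USE_AZURE", "False")

    importlib.reload(config_module)  # re-evaluate the config with the patched environment
    config_module.Config()

    assert openai.api_base == "http://localhost:8000/v1"
```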
My last attempt at fixing the milvus_memory_test.py test didn't really have the desired result, and Codecov still marks it as uncovered (the test itself still runs fine). But I'm sure it actually is covered: that code is in the init method, and the class is initialized in both milvus_memory_test.py files. I guess it's because of the MockConfig object in tests/milvus_memory_test.py. Isn't it better to just initialize a new Config() like the test under the integration folder already does?
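For clarity, this is roughly what I mean (untested sketch; the Milvus address attribute and the test structure are assumptions based on the existing tests):

```python
# Rough sketch: build MilvusMemory from a real Config() (like the integration test does)
# instead of a hand-rolled MockConfig, so the __init__ code is counted as covered.
import os
import unittest

from autogpt.config import Config
from autogpt.memory.milvus import MilvusMemory


class TestMilvusMemory(unittest.TestCase):
    def setUp(self) -> None:
        os.environ["EMBED_DIM"] = "1536"     # exercise the new setting explicitly
        cfg = Config()                       # real Config instead of MockConfig
        cfg.milvus_addr = "localhost:19530"  # assumed attribute name for the Milvus server
        self.memory = MilvusMemory(cfg)
        # ...the existing test methods would stay unchanged...
```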
This is a mass message from the AutoGPT core team. Our apologies for the ongoing delay in processing PRs. This is because we are re-architecting the AutoGPT core!
For more details (and for info on joining our Discord), please refer to: https://github.com/Significant-Gravitas/Auto-GPT/wiki/Architecting
Please merge this PR to master before re-integration. CC @Significant-Gravitas, @Torantulino, @p-i-, @Pwuts
Lots of work has gone into it, it's working great in a fork, and it is a very significant upgrade to the base Auto-GPT, providing functionality which is important to the "core" of Auto-GPT going forward.
I don't think you quite understand why they aren't merging. The reason is that the re-arch is going to invalidate all current PRs, because it will introduce massive breaking changes to how AutoGPT works. Also not a good idea to beg for merge IMO.
> I don't think you quite understand why they aren't merging. The reason is that the re-arch is going to invalidate all current PRs, because it will introduce massive breaking changes to how AutoGPT works. Also not a good idea to beg for merge IMO.

Well, the wiki also says that it can be a good idea to merge before the re-integration: https://github.com/Significant-Gravitas/Auto-GPT/wiki/Architecting#2-push-for-your-pr-to-be-merged-into-master-before-re-integration
But I understand that there are many changes which are far more complex and critical than mine, and I'm perfectly fine with waiting, and with rewriting things if the maintainers require it.
Also... @ntindle asked for a test to cover the new code. I don't really know what a good way to write a unit test for this would be, since it's meant to connect to any external OpenAI-compliant API. It still uses all the core functions used for interacting with GPT-3.5 and GPT-4; is it really needed/useful?
> Also not a good idea to beg for merge IMO.

I understand it's strange. But the linked wiki article basically says to do exactly that: "2. push for your PR to be merged into master before re-integration."
It's not my PR, but it does satisfy the issues I've been advocating for since the early days of Auto-GPT. So I'm advocating for it to be merged before re-integration, per the linked wiki instructions.