AutoGPT
Add settings for custom base url and embedding dimension
Makes the OpenAI base URL and embedding dimension configurable. These settings are useful for integrating AutoGPT with other models, such as LLaMA.
Background
This makes AutoGPT capable of connecting to custom OpenAI-like APIs such as [keldenl/gpt-llama.cpp](https://github.com/keldenl/gpt-llama.cpp), and of using other models, like LLaMA and its derivatives.
See also #25, #567, #2158.
Changes
Added OPENAI_API_BASE_URL and EMBED_DIM to .env.template and loaded them in config.py, making sure OPENAI_API_BASE_URL is ignored if USE_AZURE is True.
Also modified the files in autogpt/memory to use the value of EMBED_DIM instead of the hardcoded 1536 (which is still the default).
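Roughly, the config.py side of the change looks like this (a simplified sketch rather than the exact diff; attribute names are illustrative):

```python
# Simplified sketch of the new settings in autogpt/config/config.py (illustrative only).
import os

import openai


class Config:
    def __init__(self) -> None:
        self.use_azure = os.getenv("USE_AZURE") == "True"

        # New: custom endpoint for OpenAI-compatible APIs (ignored when USE_AZURE is True)
        self.openai_api_base_url = os.getenv("OPENAI_API_BASE_URL")
        if self.openai_api_base_url and not self.use_azure:
            openai.api_base = self.openai_api_base_url

        # New: embedding dimension, defaulting to OpenAI's 1536
        self.embed_dim = int(os.getenv("EMBED_DIM", "1536"))
```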
Documentation
I added an explanation of what these new settings do in the .env.template file, following the style of the comments on the other settings.
Test Plan
Tested by running gpt-llama.cpp on my machine and setting OPENAI_API_BASE_URL to its API URL in my .env file. I used Vicuna 13B, so I also set EMBED_DIM to 5120. For this test I also set OPENAI_API_KEY to the model's path (a "hack" used by gpt-llama.cpp to obtain the model's path).
PR Quality Checklist
- [x] My pull request is atomic and focuses on a single change.
- [x] I have thoroughly tested my changes with multiple different prompts.
- [x] I have considered potential risks and mitigations for my changes.
- [x] I have documented my changes clearly and comprehensively.
- [x] I have not snuck in any "extra" small tweaks changes
LGTM
A name like LLM_API_BASE_URL instead of OPENAI_API_BASE_URL might be more fitting, since it means we're not necessarily using OpenAI's API.
It's still using the OpenAI API, just not their endpoint, even if the model behind it isn't an OpenAI model.
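In other words, the regular openai client is still used; only the endpoint changes. A minimal illustration (the URL, key, and model name here are just example values):

```python
# Minimal illustration: standard openai client, custom OpenAI-compatible endpoint.
import openai

openai.api_base = "http://localhost:443/v1"   # example local gpt-llama.cpp endpoint
openai.api_key = "/path/to/model.bin"         # gpt-llama.cpp "hack": key carries the model path

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",                    # the local server maps this to its own model
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```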
LGTM 👍
Codecov Report
Patch coverage: 60.00% and project coverage change: -8.24 :warning:
Comparison is base (f8dfedf) 49.65% compared to head (96e7650) 41.41%.
:exclamation: Current head 96e7650 differs from pull request most recent head b7defd2. Consider uploading reports for the commit b7defd2 to get more accurate results
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##           master    #2594      +/-   ##
==========================================
- Coverage   49.65%   41.41%    -8.24%
==========================================
  Files          64       63        -1
  Lines        3021     3011       -10
  Branches      505      495       -10
==========================================
- Hits         1500     1247      -253
- Misses       1401     1698      +297
+ Partials      120       66       -54
```
| Impacted Files | Coverage Δ | |
|---|---|---|
| autogpt/memory/milvus.py | 3.38% <ø> (ø) | |
| autogpt/memory/pinecone.py | 28.57% <0.00%> (ø) | |
| autogpt/config/config.py | 74.02% <33.33%> (-2.14%) | :arrow_down: |
| autogpt/memory/redismem.py | 31.34% <66.66%> (+2.11%) | :arrow_up: |
| autogpt/memory/local.py | 96.07% <100.00%> (+0.16%) | :arrow_up: |
... and 17 files with indirect coverage changes
:umbrella: View full report in Codecov by Sentry.
This pull request has conflicts with the base branch, please resolve those so we can evaluate the pull request.
Conflicts have been resolved! 🎉 A maintainer will review the pull request shortly.
Thanks so much for building this @DGdev91 and delivering the required documentation. Really awesome job!
For those struggling to get this working: it took me a while to find the right model for the job. Eventually I got it to work with ggml-vicuna-13b-1.1-q4_2.bin (from Hugging Face).
My .env:
- OPENAI_API_BASE_URL=http://localhost:443/v1
- EMBED_DIM=5120
- OPENAI_API_KEY=M:\AI\llama.cpp\models\ggml-vicuna-13b-1.1-q4_2.bin
I do have to say, it's incredibly slow on my machine. While I have a decent processor and 32 GB of RAM (and a GeForce RTX 3070 Ti) and am running the model from a fast SSD, it will not utilize my full machine. It will actually time out (600 seconds) on every request unless I put TIMEOUT_SECS = 6000 in the api_requestor.py file of AutoGPT. The 7B models were a bit faster, but weren't able to respond in the way that allows AutoGPT to actually work. I'm thinking of trying to get it to work with my video card, since it is the most high-end part of my PC, but am not quite sure yet where to start. Will let you know if I make it :)
The latest updates on your projects. Learn more about Vercel for Git ↗︎
1 Ignored Deployment
| Name | Status | Preview | Comments | Updated (UTC) |
|---|---|---|---|---|
| docs | ⬜️ Ignored (Inspect) | Visit Preview | | Jun 10, 2023 11:49am |
Codecov Report
Patch coverage: 34.14% and project coverage change: -1.00 :warning:
Comparison is base (3081f56) 69.81% compared to head (0d3060e) 68.81%.
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##           master    #2594      +/-   ##
==========================================
- Coverage   69.81%   68.81%    -1.00%
==========================================
  Files          72       72
  Lines        3571     3585       +14
  Branches      568      574        +6
==========================================
- Hits         2493     2467       -26
- Misses        890      927       +37
- Partials      188      191        +3
```
| Impacted Files | Coverage Δ | |
|---|---|---|
| autogpt/agent/agent.py | 59.88% <ø> (ø) | |
| autogpt/speech/eleven_labs.py | 28.57% <0.00%> (ø) | |
| autogpt/commands/audio_text.py | 31.03% <11.11%> (-5.33%) | :arrow_down: |
| autogpt/speech/say.py | 36.66% <14.28%> (ø) | |
| autogpt/config/config.py | 70.58% <44.44%> (-4.07%) | :arrow_down: |
| autogpt/commands/google_search.py | 95.74% <100.00%> (ø) | |
| autogpt/prompts/prompt.py | 46.80% <100.00%> (ø) | |
| autogpt/speech/stream_elements_speech.py | 44.44% <100.00%> (ø) | |
:umbrella: View full report in Codecov by Sentry.
Sorry for such a long back and forth. I want to make sure this is abstracted just enough that we don't have to redo it and break it all.
I haven't tested it AT ALL, but there's some context for what I'm referring to with the requested changes in the branch base-url-and-embeddings. Obviously non-working, but it should get the point across.
> I'm thinking of trying to get it to work with my video card, since it is the most high-end part of my PC, but am not quite sure yet where to start. Will let you know if I make it :)

Compile your local API provider with cuBLAS, e.g. for llama-cpp-python:

`LLAMA_CUBLAS=1 pip install llama-cpp-python[server]`
> > I'm thinking of trying to get it to work with my video card, since it is the most high-end part of my PC, but am not quite sure yet where to start. Will let you know if I make it :)
>
> Compile your local API provider with cuBLAS, e.g. for llama-cpp-python:
> `LLAMA_CUBLAS=1 pip install llama-cpp-python[server]`

I guess he's using keldenl's gpt-llama.cpp, but it can be applied there too: it uses ggerganov's llama.cpp.

```
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
make LLAMA_CUBLAS=1
```

Of course, you need to have the CUDA SDK installed to do that.
Thanks both! I'm indeed using keldenl's gpt-llama.cpp currently, but I'll try your suggestion! I hope I can just point OPENAI_API_BASE_URL at llama-cpp-python[server]. (PS: today AutoGPT actually reached my 6000-second request timeout as well, so I need to find a better solution xD)
> Thanks both! I'm indeed using keldenl's gpt-llama.cpp currently, but I'll try your suggestion! I hope I can just point OPENAI_API_BASE_URL at llama-cpp-python[server]. (PS: today AutoGPT actually reached my 6000-second request timeout as well, so I need to find a better solution xD)

Don't get confused: keldenl's project uses the standard llama.cpp binary, which is written in C++. llama-cpp-python is a different project (Python bindings for llama.cpp).
I suggest you run llama.cpp on its own to verify it's compiled correctly and is actually using the GPU. If it's using cuBLAS, you should see "BLAS = 1" after it loads the model. If you are using the same checkout you were using the first time, you most likely need to run "make clean" before rebuilding it with cuBLAS support.
This PR conflicts with #3222 and is not atomic. Please fix that so we can review it.
> This PR conflicts with #3222 and is not atomic. Please fix that so we can review it.

Why are you saying that? The hardcoded embedding dimension used in the memory-related classes and the settings he's adding are different things; there are no conflicts. We also modified different files, only .env.template and config.py are in common.
This pull request has conflicts with the base branch, please resolve those so we can evaluate the pull request.
Conflicts have been resolved! 🎉 A maintainer will review the pull request shortly.
> > This PR conflicts with #3222 and is not atomic. Please fix that so we can review it.
>
> Why are you saying that? The hardcoded embedding dimension used in the memory-related classes and the settings he's adding are different things; there are no conflicts. We also modified different files, only .env.template and config.py are in common.

Sorry, I could have been more clear, see the comment above. Unrelated changes should not be submitted together, since that makes it harder to review and pick the PRs that we want to process.
Can I get a test to cover this?
> Sorry, I could have been more clear, see the comment above. Unrelated changes should not be submitted together, since that makes it harder to review and pick the PRs that we want to process.

Those changes are all about new configuration options which aim to make it possible to use different LLMs, as long as they expose an OpenAI-compliant API, so it made sense to me to put them together.
But if you prefer, I can keep this PR only for EMBED_DIM and put OPENAI_API_BASE_URL in another one.
But without the ability to modify openai.api_base (that's what OPENAI_API_BASE_URL does), we cannot test whether different EMBED_DIM values work (on OpenAI's models that value is always 1536).
> Can I get a test to cover this?

If you are referring to the automated tests that are making codecov/patch fail: most of the uncovered lines are from the Pinecone and Redis integrations, which don't have tests at all; they never had any even before my changes.
The Milvus code shows up as uncovered because I forced the type of the cfg argument in init_collection. I made a commit which should fix that (at least, I hope so).
There's also an uncovered line in config.py because (of course) we never set openai.api_base unless OPENAI_API_BASE_URL is set. Should I add a test in test_config.py just for that?
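If it helps, here is a rough sketch of what such a test could look like (untested; the monkeypatch usage and the module reload are just my assumption about how to re-run the env handling):

```python
# Hypothetical test sketch for the uncovered openai.api_base branch in config.py.
import importlib

import openai

import autogpt.config.config as config_module


def test_openai_api_base_url_sets_api_base(monkeypatch):
    monkeypatch.setenv("OPENAI_API_BASE_URL", "http://localhost:8000/v1")
    monkeypatch.setenv("USE_AZURE", "False")

    importlib.reload(config_module)  # re-evaluate the config with the patched environment
    config_module.Config()

    assert openai.api_base == "http://localhost:8000/v1"
```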
My last attempt at fixing the milvus_memory_test.py test didn't really have the desired result, and Codecov still marks it as uncovered (the test itself still runs fine). But I'm sure it actually is covered: that code is in the init method, and the class is initialized in both milvus_memory_test.py files. I guess it's because of the MockConfig object in tests/milvus_memory_test.py. Isn't it better to just initialize a new Config() like the test under the integration folder already does?
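For clarity, this is roughly what I mean (untested sketch; the Milvus address attribute and the test structure are assumptions based on the existing tests):

```python
# Rough sketch: build MilvusMemory from a real Config() (like the integration test does)
# instead of a hand-rolled MockConfig, so the __init__ code is counted as covered.
import os
import unittest

from autogpt.config import Config
from autogpt.memory.milvus import MilvusMemory


class TestMilvusMemory(unittest.TestCase):
    def setUp(self) -> None:
        os.environ["EMBED_DIM"] = "1536"     # exercise the new setting explicitly
        cfg = Config()                       # real Config instead of MockConfig
        cfg.milvus_addr = "localhost:19530"  # assumed attribute name for the Milvus server
        self.memory = MilvusMemory(cfg)
        # ...the existing test methods would stay unchanged...
```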
This is a mass message from the AutoGPT core team. Our apologies for the ongoing delay in processing PRs. This is because we are re-architecting the AutoGPT core!
For more details (and for info on joining our Discord), please refer to: https://github.com/Significant-Gravitas/Auto-GPT/wiki/Architecting
Please merge this PR to master before re-integration. CC @Significant-Gravitas, @Torantulino, @p-i-, @Pwuts
Lots of work has gone into it, it's working great in a fork, and it is a very significant upgrade to the base Auto-GPT, providing functionality which is important to the "core" of Auto-GPT going forward.
I don't think you quite understand why they aren't merging. The reason is that the re-arch is going to invalidate all current PRs, because it will introduce massive breaking changes to how AutoGPT works. Also not a good idea to beg for merge IMO.
> I don't think you quite understand why they aren't merging. The reason is that the re-arch is going to invalidate all current PRs, because it will introduce massive breaking changes to how AutoGPT works. Also not a good idea to beg for merge IMO.

Well, the wiki also says that it can be a good idea to merge before the re-integration: https://github.com/Significant-Gravitas/Auto-GPT/wiki/Architecting#2-push-for-your-pr-to-be-merged-into-master-before-re-integration
But I understand that there are many changes which are far more complex and critical than mine, and I'm perfectly fine with waiting, and with rewriting things if the maintainers require it.
Also... @ntindle asked for a test to cover the new code. I don't really know what a good way to write a unit test for this would be, since it's meant to connect to any external OpenAI-compliant API. It still uses all the core functions used for interacting with GPT-3.5 and GPT-4; is it really needed/useful?
> Also not a good idea to beg for merge IMO.

I understand it's strange. But the linked wiki article basically says to do exactly that: "2. push for your PR to be merged into master before re-integration."
It's not my PR, but it does satisfy the issues I've been advocating for since the early days of Auto-GPT. So I'm advocating for it to be merged before re-integration, per the linked wiki instructions.