feat(forge/llm): Add `LlamafileProvider`

Open · k8si opened this pull request 10 months ago • 29 comments

Background

This draft PR is a step toward enabling the use of local models in AutoGPT by adding llamafile as an LLM provider.

Implementation notes are included in forge/forge/llm/providers/llamafile/README.md

Related issues:

  • https://github.com/Significant-Gravitas/AutoGPT/issues/6336
  • https://github.com/Significant-Gravitas/AutoGPT/issues/6947

Depends on:

  • #7178
  • #7183

Changes 🏗️

  • Add minimal implementation of LlamafileProvider, a new ChatModelProvider for llamafiles. It extends BaseOpenAIProvider and only overrides the methods necessary to get the system working at a basic level; a rough sketch of the provider shape follows this list.

  • Add support for mistral-7b-instruct-v0.2. This is currently the only model supported by LlamafileProvider, as it is the only one I have tested with.

  • ~~Misc changes to app configuration to enable switching between openai/llamafile providers. In particular, added config field LLM_PROVIDER that, when set to 'llamafile', will use LlamafileProvider in agents rather than OpenAIProvider.~~

  • Add instructions to use AutoGPT with llamafile in the docs at autogpt/setup/index.md
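
For illustration, here is a minimal sketch of the provider shape. This is not the exact code in the PR; the import path and method names are assumptions based on forge conventions:

```python
# Hypothetical sketch, not the PR's actual code: the import path and
# method names are assumptions based on forge conventions.
from forge.llm.providers.openai import BaseOpenAIProvider


class LlamafileProvider(BaseOpenAIProvider):
    """ChatModelProvider for a local llamafile server.

    llamafile exposes an OpenAI-compatible HTTP API, so most behavior is
    inherited; only llamafile-specific quirks need overriding.
    """

    def _adapt_messages(self, messages: list[dict]) -> list[dict]:
        # mistral-7b-instruct-v0.2 has no system role, so fold any system
        # messages into the first user message before sending the request.
        system = "\n".join(m["content"] for m in messages if m["role"] == "system")
        rest = [m for m in messages if m["role"] != "system"]
        if system and rest and rest[0]["role"] == "user":
            rest[0] = {"role": "user", "content": f"{system}\n\n{rest[0]['content']}"}
        return rest
```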

Limitations:

  • Only tested with (quantized) Mistral-7B-Instruct-v0.2
  • Only tested with a single AutoGPT 'task' ("Tell me about Roman dodecahedrons")
  • Did not attempt extensive refactoring of existing components; I just added special cases as necessary
  • Haven't added any tests for new classes/methods

PR Quality Scorecard ✨

  • [x] Have you used the PR description template?   +2 pts
  • [x] Is your pull request atomic, focusing on a single change?   +5 pts
  • [x] Have you linked the GitHub issue(s) that this PR addresses?   +5 pts
  • [x] Have you documented your changes clearly and comprehensively?   +5 pts
  • [x] Have you changed or added a feature?   -4 pts
    • [ ] Have you added/updated corresponding documentation?   +4 pts
    • [ ] Have you added/updated corresponding integration tests?   +5 pts
  • [ ] Have you changed the behavior of AutoGPT?   -5 pts
    • [ ] Have you also run agbenchmark to verify that these changes do not regress performance?   +10 pts

k8si avatar Apr 19 '24 16:04 k8si

Deploy Preview for auto-gpt-docs canceled.

| Name | Link |
|------|------|
| Latest commit | bac353e737b9d98e3a82e7e75d4d3f4130929a2e |
| Latest deploy log | https://app.netlify.com/sites/auto-gpt-docs/deploys/6697cbb90b0c8a00086c84e1 |

netlify[bot] avatar Apr 19 '24 16:04 netlify[bot]

This pull request has conflicts with the base branch, please resolve those so we can evaluate the pull request.

github-actions[bot] avatar Apr 22 '24 15:04 github-actions[bot]

@CodiumAI-Agent /review

Swiftyos avatar Apr 23 '24 14:04 Swiftyos

PR Review

⏱️ Estimated effort to review [1-5]

4, due to the complexity and breadth of the changes introduced, including new model provider integrations, extensive modifications to configuration and provider logic, and the addition of new scripts and documentation. The PR touches multiple core components and introduces a new LLM provider, which requires careful review to ensure compatibility and correctness.

🧪 Relevant tests

No

🔍 Possible issues

Possible Bug: The method check_model_llamafile in configurator.py uses api_credentials.api_base.get_secret_value() which might expose sensitive information in error messages. This could lead to security risks if the error messages are logged or displayed in an environment where unauthorized users can view them.

Possible Bug: In LlamafileProvider, the method _create_chat_completion hard-codes the seed for reproducibility, which might not be desirable in all use cases and could limit the functionality of the model in generating diverse responses.

🔒 Security concerns

Sensitive information exposure: The method check_model_llamafile potentially exposes sensitive API base URLs in exception messages, which could be a security risk if these messages are logged or improperly handled.

Code feedback:

Relevant file: autogpts/autogpt/autogpt/app/configurator.py
Suggestion: Consider removing or masking sensitive information such as api_base from error messages in check_model_llamafile to prevent potential leakage of sensitive data. [important]
Relevant line:

    raise ValueError(f"llamafile server at {api_credentials.api_base.get_secret_value()} does not have access to {model_name}. Please configure {model_type} to use one of {available_model_ids} or use a different llamafile.")

Relevant file: autogpts/autogpt/autogpt/core/resource/model_providers/llamafile.py
Suggestion: Remove the hard-coded seed in _create_chat_completion or make it configurable via method parameters or configuration settings to allow for more dynamic behavior. [important]
Relevant line:

    kwargs["seed"] = 0
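
Both suggestions could look something like this sketch (helper and parameter names are hypothetical, not code from the PR):

```python
from typing import Optional
from urllib.parse import urlparse


def _mask_base_url(url: str) -> str:
    # Hypothetical helper: reduce the URL to scheme + host (+ port) so the
    # error message can't leak userinfo, paths, or query strings.
    parts = urlparse(url)
    host = parts.hostname or ""
    return f"{parts.scheme}://{host}:{parts.port}" if parts.port else f"{parts.scheme}://{host}"


# Method body sketch, as it might appear inside LlamafileProvider:
def _create_chat_completion(self, messages, seed: Optional[int] = None, **kwargs):
    # Only pin the RNG when the caller explicitly asks for reproducibility,
    # instead of hard-coding kwargs["seed"] = 0 on every request.
    if seed is not None:
        kwargs["seed"] = seed
    ...
```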


✨ Review tool usage guide:

Overview: The review tool scans the PR code changes and generates a PR review with several types of feedback, such as possible PR issues, security threats, and relevant tests. More feedback types can be added by configuring the tool.

The tool can be triggered automatically every time a new PR is opened, or can be invoked manually by commenting on any PR.

  • When commenting, to edit configurations related to the review tool (pr_reviewer section), use the following template:

    /review --pr_reviewer.some_config1=... --pr_reviewer.some_config2=...

    With a configuration file, use the following template:

    [pr_reviewer]
    some_config1=...
    some_config2=...

See the review usage page for a comprehensive guide on using this tool.

CodiumAI-Agent avatar Apr 23 '24 14:04 CodiumAI-Agent

@k8si any chance you could enable maintainer write access on this PR?

Pwuts avatar May 24 '24 21:05 Pwuts

@Pwuts it doesn't look like I have the ability to do that. I added you as a maintainer to the forked project, is that sufficient or do others need write access?

Alternatively, you could branch off my branch and I can just accept the changes via PR?

k8si avatar May 29 '24 16:05 k8si

Conflicts have been resolved! 🎉 A maintainer will review the pull request shortly.

github-actions[bot] avatar May 31 '24 02:05 github-actions[bot]

Codecov Report

Attention: Patch coverage is 35.76159% with 97 lines in your changes missing coverage. Please review.

Project coverage is 53.81%. Comparing base (bffb92b) to head (bac353e). Report is 30 commits behind head on master.

| Files | Patch % | Lines |
|-------|---------|-------|
| forge/forge/llm/providers/llamafile/llamafile.py | 35.60% | 85 Missing ⚠️ |
| forge/forge/llm/providers/multi.py | 25.00% | 12 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7091      +/-   ##
==========================================
- Coverage   54.21%   53.81%   -0.41%     
==========================================
  Files         122      124       +2     
  Lines        6875     7021     +146     
  Branches      881      909      +28     
==========================================
+ Hits         3727     3778      +51     
- Misses       3015     3110      +95     
  Partials      133      133              
| Flag | Coverage Δ |
|------|------------|
| Linux | 53.56% <35.76%> (-0.40%) ⬇️ |
| Windows | 50.41% <35.76%> (-0.34%) ⬇️ |
| autogpt-agent | 33.99% <ø> (-0.03%) ⬇️ |
| forge | 58.04% <35.76%> (-0.60%) ⬇️ |
| macOS | 52.88% <35.76%> (-0.39%) ⬇️ |

Flags with carried forward coverage won't be shown.

☂️ View full report in Codecov by Sentry.

codecov[bot] avatar May 31 '24 02:05 codecov[bot]

This pull request has conflicts with the base branch, please resolve those so we can evaluate the pull request.

github-actions[bot] avatar Jun 02 '24 23:06 github-actions[bot]

Conflicts have been resolved! 🎉 A maintainer will review the pull request shortly.

github-actions[bot] avatar Jun 02 '24 23:06 github-actions[bot]

This pull request has conflicts with the base branch, please resolve those so we can evaluate the pull request.

github-actions[bot] avatar Jun 03 '24 13:06 github-actions[bot]

Conflicts have been resolved! 🎉 A maintainer will review the pull request shortly.

github-actions[bot] avatar Jun 03 '24 14:06 github-actions[bot]

Doesn't seem to work for me when using ./scripts/llamafile/serve.py

PS C:\Users\nicka\code\AutoGPTNew\autogpt> python3 .\scripts\llamafile\serve.py
Downloading mistral-7b-instruct-v0.2.Q5_K_M.llamafile.exe...
Downloading: [########################################] 100% - 5166.9/5166.9 MB
Traceback (most recent call last):
  File "C:\Users\nicka\code\AutoGPTNew\autogpt\scripts\llamafile\serve.py", line 56, in <module>
    download_llamafile()
  File "C:\Users\nicka\code\AutoGPTNew\autogpt\scripts\llamafile\serve.py", line 43, in download_llamafile
    subprocess.run([LLAMAFILE, "--version"], check=True)
  File "C:\Users\nicka\.pyenv\pyenv-win\versions\3.11.7\Lib\subprocess.py", line 548, in run
    with Popen(*popenargs, **kwargs) as process:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\nicka\.pyenv\pyenv-win\versions\3.11.7\Lib\subprocess.py", line 1026, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "C:\Users\nicka\.pyenv\pyenv-win\versions\3.11.7\Lib\subprocess.py", line 1538, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: [WinError 193] %1 is not a valid Win32 application

Important side note: that binary can't be executed directly, so of course Python fails.

ntindle avatar Jun 10 '24 17:06 ntindle

https://github.com/Mozilla-Ocho/llamafile/issues/257#issuecomment-1953146662

TL;DR: because of the size limit on Windows executables, you need to download llamafile.exe separately and run it with the model file passed as a parameter.
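
Roughly, the workaround looks like this (a sketch; the exact llamafile flags are an assumption and may vary by llamafile version):

```python
import subprocess

# Windows refuses to run executables larger than 4 GB, so instead of
# executing the multi-GB .llamafile directly, run the small llamafile.exe
# launcher and hand it the big file as the model argument.
subprocess.run(
    ["llamafile.exe", "-m", "mistral-7b-instruct-v0.2.Q5_K_M.llamafile"],
    check=True,
)
```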

ntindle avatar Jun 10 '24 20:06 ntindle

I also get this after using the workaround above

2024-06-10 15:40:50,164 ERROR  Please set your OpenAI API key in .env or as an environment variable.
2024-06-10 15:40:51,339 INFO  You can get your key from https://platform.openai.com/account/api-keys
Please enter your OpenAI API key if you have it:
################################################################################
### AutoGPT - GENERAL SETTINGS
################################################################################

## OPENAI_API_KEY - OpenAI API Key (Example: sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx)
# OPENAI_API_KEY=

## ANTHROPIC_API_KEY - Anthropic API Key (Example: sk-ant-api03-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx)
# ANTHROPIC_API_KEY=

## GROQ_API_KEY - Groq API Key (Example: gsk_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx)
# GROQ_API_KEY=


################################################################################
### LLM MODELS
################################################################################

## SMART_LLM - Smart language model (Default: gpt-4-turbo)
SMART_LLM=mistral-7b-instruct-v0.2

## FAST_LLM - Fast language model (Default: gpt-3.5-turbo)
FAST_LLM=mistral-7b-instruct-v0.2

## EMBEDDING_MODEL - Model to use for creating embeddings
# EMBEDDING_MODEL=text-embedding-3-small

ntindle avatar Jun 10 '24 20:06 ntindle

@ntindle would you mind trying again? I added logic to serve.py to download llamafile.exe ~~and extract the .gguf from the .llamafile~~ and run it like that.

Pwuts avatar Jun 14 '24 20:06 Pwuts

Can't get it to run without an OpenAI key set.

(agpt-py3.11) C:\Users\nicka\code\AutoGPTNew\autogpt>python -m autogpt
2024-06-15 19:18:02,550 WARNING  You don't have access to mistral-7b-instruct-v0.2. Setting fast_llm to OpenAIModelName.GPT3_ROLLING.
2024-06-15 19:18:02,552 WARNING  You don't have access to mistral-7b-instruct-v0.2. Setting smart_llm to OpenAIModelName.GPT3_ROLLING.
Traceback (most recent call last):
  File "C:\Users\nicka\code\AutoGPTNew\forge\forge\llm\providers\multi.py", line 142, in _get_provider
    settings.credentials = Credentials.from_env()
                           ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\nicka\code\AutoGPTNew\forge\forge\models\config.py", line 61, in from_env
    return _recursive_init_model(cls, infer_field_value)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\nicka\code\AutoGPTNew\forge\forge\models\config.py", line 184, in _recursive_init_model
    return model.parse_obj(user_config_fields)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pydantic\main.py", line 526, in pydantic.main.BaseModel.parse_obj
  File "pydantic\main.py", line 341, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for OpenAICredentials
api_key
  field required (type=value_error.missing)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Users\nicka\code\AutoGPTNew\autogpt\autogpt\__main__.py", line 5, in <module>
    autogpt.app.cli.cli()
  File "C:\Users\nicka\AppData\Local\pypoetry\Cache\virtualenvs\agpt-979fLl6E-py3.11\Lib\site-packages\click\core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\nicka\AppData\Local\pypoetry\Cache\virtualenvs\agpt-979fLl6E-py3.11\Lib\site-packages\click\core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "C:\Users\nicka\AppData\Local\pypoetry\Cache\virtualenvs\agpt-979fLl6E-py3.11\Lib\site-packages\click\core.py", line 1666, in invoke
    rv = super().invoke(ctx)
         ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\nicka\AppData\Local\pypoetry\Cache\virtualenvs\agpt-979fLl6E-py3.11\Lib\site-packages\click\core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\nicka\AppData\Local\pypoetry\Cache\virtualenvs\agpt-979fLl6E-py3.11\Lib\site-packages\click\core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\nicka\AppData\Local\pypoetry\Cache\virtualenvs\agpt-979fLl6E-py3.11\Lib\site-packages\click\decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\nicka\code\AutoGPTNew\autogpt\autogpt\app\cli.py", line 19, in cli
    ctx.invoke(run)
  File "C:\Users\nicka\AppData\Local\pypoetry\Cache\virtualenvs\agpt-979fLl6E-py3.11\Lib\site-packages\click\core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\nicka\code\AutoGPTNew\autogpt\autogpt\app\cli.py", line 159, in run
    run_auto_gpt(
  File "C:\Users\nicka\code\AutoGPTNew\autogpt\autogpt\app\utils.py", line 245, in wrapper
    return asyncio.run(f(*args, **kwargs))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\nicka\.pyenv\pyenv-win\versions\3.11.7\Lib\asyncio\runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "C:\Users\nicka\.pyenv\pyenv-win\versions\3.11.7\Lib\asyncio\runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\nicka\.pyenv\pyenv-win\versions\3.11.7\Lib\asyncio\base_events.py", line 653, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "C:\Users\nicka\code\AutoGPTNew\autogpt\autogpt\app\main.py", line 117, in run_auto_gpt
    llm_provider = _configure_llm_provider(config)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\nicka\code\AutoGPTNew\autogpt\autogpt\app\main.py", line 420, in _configure_llm_provider
    multi_provider.get_model_provider(model)
  File "C:\Users\nicka\code\AutoGPTNew\forge\forge\llm\providers\multi.py", line 121, in get_model_provider
    return self._get_provider(model_info.provider_name)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\nicka\code\AutoGPTNew\forge\forge\llm\providers\multi.py", line 144, in _get_provider
    raise ValueError(
ValueError: ModelProviderName.OPENAI is unavailable: can't load credentials
Sentry is attempting to send 2 pending events
Waiting up to 2 seconds
Press Ctrl-Break to quit

## SMART_LLM - Smart language model (Default: gpt-4-turbo)
SMART_LLM=mistral-7b-instruct-v0.2

## FAST_LLM - Fast language model (Default: gpt-3.5-turbo)
FAST_LLM=mistral-7b-instruct-v0.2

It can't seem to find support for the model for some reason.

This does resolve the download-and-run problem for the llamafile executable, though.

ntindle avatar Jun 16 '24 00:06 ntindle

Also, this should try matching better. For example, just mistral should work if it's the only one, or mistral-7b if there are two. Spelling out more parts of the name should only rarely be required.

ntindle avatar Jun 16 '24 00:06 ntindle

That crash is because it defaults to gpt-3.5-turbo upon not finding mistral-7b-instruct-v0.2. Fix the latter -> fix the former. Although maybe we should have a more descriptive error message instead of that huge stack trace.

Also, this should try matching better. For example, just mistral should work if it's the only one, or mistral-7b if there are two. Spelling out more parts of the name should only rarely be required.

I disagree. A value should have the same meaning, regardless of circumstances. Setting mistral may suddenly break if a second mistral model is installed. That's undesirable behavior in my opinion.

Pwuts avatar Jun 16 '24 00:06 Pwuts

My counter to that is that we don't require gpt-4-0613; we just require gpt-4 and match it as best we can to a rolling alias.

ntindle avatar Jun 16 '24 14:06 ntindle

While you're at it, could you check why AutoGPT sends empty assistant contents when using local LLM settings? I used LM Studio as the OpenAI base URL and get errors because AutoGPT keeps sending this after a failed response:

...
    {
      "role": "system",
      "content": "ERROR PARSING YOUR RESPONSE:\n\nValidationError: 3 validation errors for OneShotAgentActionProposal\nthoughts -> plan\n  field required (type=value_error.missing)\nthoughts -> speak\n  field required (type=value_error.missing)\nuse_tool\n  field required (type=value_error.missing)"
    },
    {
      "content": "",
      "role": "assistant"
    },
    {
      "role": "system",
      "content": "ERROR PARSING YOUR RESPONSE:\n\nInvalidAgentResponseError: Assistant response has no text content"
    }
  ],
  "model": "gpt-3.5-turbo"
}
[2024-06-16 17:28:16.570] [ERROR] [Server Error] {"title":"'messages' array must only contain objects with a 'content' field that is not empty"}
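
For reference, a minimal sketch of the kind of client-side guard that would avoid this server error (hypothetical, not the actual fix):

```python
def drop_empty_messages(messages: list[dict]) -> list[dict]:
    # Some OpenAI-compatible servers (e.g. LM Studio) reject any message
    # whose 'content' field is empty, so filter those out before the
    # request goes on the wire.
    return [m for m in messages if m.get("content")]
```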

Wladastic avatar Jun 16 '24 15:06 Wladastic

My counter to that is that we don't require gpt-4-0613; we just require gpt-4 and match it as best we can to a rolling alias.

That's not really how it works. We match it based on hard-coded relationships, which we can do because we know OpenAI's model range and their documentation says which models the rolling aliases point to.

Pwuts avatar Jun 17 '24 17:06 Pwuts

I think we shouldn't automatically match the name (or at least not without a warning); this may lead to unexpected behaviour. We can make the error message better and include did you mean "mistral-7b-instruct-v0.2"? as a hint for the user; a sketch of such a hint follows.
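
A hint like that is cheap to produce with the standard library (a sketch, not settled design):

```python
import difflib


def unknown_model_error(requested: str, available: list[str]) -> ValueError:
    # Suggest the closest known model name instead of silently
    # substituting a different model. The loose cutoff lets short
    # prefixes like "mistral" still match the full model ID.
    matches = difflib.get_close_matches(requested, available, n=1, cutoff=0.4)
    hint = f' Did you mean "{matches[0]}"?' if matches else ""
    return ValueError(f"Model '{requested}' is not available.{hint}")
```

For example, unknown_model_error("mistral", ["mistral-7b-instruct-v0.2"]) would surface the full name as a hint without ever auto-selecting it.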

kcze avatar Jun 18 '24 12:06 kcze

bump

elix1er avatar Jun 19 '24 13:06 elix1er

I am currently experiencing these issues: https://github.com/Mozilla-Ocho/llamafile/issues/356, https://github.com/Mozilla-Ocho/llamafile/issues/100.

May need to amend llamafile/serve.py further to fix this for WSL.

Update: this isn't scriptable and isn't our problem. I'll amend the docs with a note that llamafiles can't be run from WSL, but can still be used by running them on Windows and then connecting to them from WSL.

Pwuts avatar Jun 21 '24 06:06 Pwuts

this "works" but the model isn't great. Can we constrain the output schema to our models like is an option in the llamafile UI?

I think we could implement something like that by allowing a model to be passed as the completion_parser (rough sketch below). Sounds like a follow-up PR, though.
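
A rough sketch of that idea, using the pydantic v1 API that appears in this codebase; the schema here is a stand-in, not the real OneShotAgentActionProposal:

```python
from pydantic import BaseModel


class ActionProposal(BaseModel):
    # Stand-in schema; the real one would be OneShotAgentActionProposal.
    thoughts: dict
    use_tool: dict


def parse_completion(raw: str) -> ActionProposal:
    # Usable as a completion_parser: validate the model's JSON output and
    # raise on malformed responses so the agent loop can retry.
    return ActionProposal.parse_raw(raw)
```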

Pwuts avatar Jun 26 '24 19:06 Pwuts

This pull request has conflicts with the base branch, please resolve those so we can evaluate the pull request.

github-actions[bot] avatar Jul 02 '24 18:07 github-actions[bot]

Conflicts have been resolved! 🎉 A maintainer will review the pull request shortly.

github-actions[bot] avatar Jul 03 '24 21:07 github-actions[bot]