Option to only allow whitelisted sites in Auto-GPT

Open dboggs95 opened this issue 1 year ago • 29 comments

Duplicates

  • [X] I have searched the existing issues

Summary 💡

As a user, I want AutoGPT to have a customizable whitelist, so that AutoGPT is efficient and secure to use.

Create a setting that blacklists the entire internet by default and contains a customizable whitelist of selected sites necessary for programming or research.

Ideally AutoGPT would simply skip anything that isn't on the whitelist, but if it did try a blacklisted site (maybe due to a surprise redirect), it should be blocked and should gracefully move on to a whitelisted result.
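
One way to picture the "skip and move on" behavior is as a filter over candidate URLs. This is only a rough sketch: `filter_allowed` and the example hostnames are hypothetical and not part of AutoGPT, and it uses an exact hostname match for simplicity.

```python
from urllib.parse import urlparse


def filter_allowed(urls, whitelist):
    """Return only the URLs whose hostname is on the whitelist, in order."""
    allowed = []
    for url in urls:
        hostname = urlparse(url).hostname  # None for malformed or relative URLs
        if hostname is not None and hostname in whitelist:
            allowed.append(url)
        # Anything else is silently skipped so the caller can move on.
    return allowed


results = [
    "https://junk-seo-site.example/spring-boot-fix",
    "https://stackoverflow.com/questions/123",
    "https://en.wikipedia.org/wiki/Spring_Framework",
]
print(filter_allowed(results, {"stackoverflow.com", "en.wikipedia.org"}))
```

To cover the surprise-redirect case, the same check would presumably have to be repeated on the final URL after the redirect, before any content is read.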

Examples 🌈

Say I want to build a Spring application. I would want to create a whitelist containing Spring, GitHub, Maven, Stack Overflow, Reddit, Wikipedia, and anywhere else that has useful information.

Motivation 🔦

There are two motivations for this:

  1. I'm nervous about letting an AI decide to go over my network and access sites on the internet. I know ChatGPT is generally trained to follow the law, but if it's going over my network to grab things, I would much rather have a mechanism to keep it from going anywhere it doesn't really need to go, because the people responsible for this level of quality are not going to be liable for what happens on my machine if they break it. Additionally, I've seen concerns brought up on articles around the web that if a hacker could trick AutoGPT into downloading a virus it becomes a vulnerability. This approach is "secure by default," and that can be your answer when someone comes to you with these concerns.
  2. I'm noticing tons of junk AI generated websites that actually trick users into clicking on them because the site has keywords related to a difficult programming problem, but the contents of the site are randomly generated nonsense. If AutoGPT picks up on this junk it could end up going in circles and wasting API tokens. I think this will be a stronger motivation for users who trust ChatGPT's strong in-built aversion to anything offensive or illegal, since I've seen complaints that AutoGPT can be inefficient and go in circles. Regulating the quality of the input data would be useful to everyone.

dboggs95 avatar Sep 21 '23 17:09 dboggs95

@NeonN3mesis I want to work on this issue. But I'm not sure where to start. Is there any developer documentation that I can read?

733amir avatar Sep 22 '23 10:09 733amir

I would discuss it with @Pwuts

NeonN3mesis avatar Sep 22 '23 11:09 NeonN3mesis

Pre-selecting content that the AI can access is a good idea. For the implementation, I think it would be beneficial to also insert the whitelist in the prompt. That way the LLM doesn't have to "guess".
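
As a rough illustration of putting the whitelist in the prompt, the sketch below turns a set of domains into a single instruction line. The function name and the wording of the instruction are assumptions for illustration, not existing AutoGPT code.

```python
def allowlist_prompt_line(allowlist):
    """Build one instruction line telling the LLM which sites it may browse."""
    if not allowlist:
        return ""  # nothing to add when no allowlist is configured
    sites = ", ".join(sorted(allowlist))  # sorted for a stable prompt
    return f"You may only browse websites on these domains: {sites}."


print(allowlist_prompt_line({"stackoverflow.com", "github.com"}))
```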

For implementation @733amir:

  • The whitelist setting can be added to .env and autogpt/config/config.py
  • The view_webpage function is implemented in autogpt/commands/web_selenium.py
  • A message with the whitelist can be inserted in the prompt in autogpt/agents/agent.py:Agent.construct_base_prompt. Note: I am about to move the functionality in that method to a PromptStrategy object, so the construct_base_prompt method will be removed soon.
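
For the .env part, a minimal sketch of parsing a comma-separated setting could look like this. The variable name `WEB_ALLOWLIST` and the helper `parse_domain_list` are made up for illustration; the real config class may load settings differently.

```python
import os


def parse_domain_list(raw):
    """Split a comma-separated string into a set of lowercase hostnames."""
    return {item.strip().lower() for item in raw.split(",") if item.strip()}


# WEB_ALLOWLIST is a hypothetical variable name, e.g. in .env:
#   WEB_ALLOWLIST=github.com,stackoverflow.com
allowlist = parse_domain_list(os.getenv("WEB_ALLOWLIST", ""))
print(allowlist)
```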

Pwuts avatar Sep 22 '23 12:09 Pwuts

@Pwuts @733amir Would I be inserting the entire whitelist as text in the prompt or telling it where a file is? I imagined having a file full of domains on separate lines. There could even be a default whitelist containing the most common sites most developers would want AutoGPT to see.
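
To make the file idea concrete, here is a small sketch of a loader for a file with one domain per line. The `#` comment syntax and the name `load_domain_file` are assumptions, not an agreed format.

```python
import tempfile
from pathlib import Path


def load_domain_file(path):
    """Read one hostname per line, ignoring blank lines and '#' comments."""
    domains = set()
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        entry = line.split("#", 1)[0].strip().lower()
        if entry:
            domains.add(entry)
    return domains


# Demonstration with a throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "whitelist.txt"
    sample.write_text(
        "# developer sites\nstackoverflow.com\n\nGitHub.com  # code hosting\n",
        encoding="utf-8",
    )
    loaded = load_domain_file(sample)
print(sorted(loaded))
```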

dboggs95 avatar Sep 22 '23 13:09 dboggs95

@dboggs95 I am working on a better configuration system. For now, the easiest is to put them in .env though.

As a shortcut, you could make it purely a whitelisting function and handle the instruction/prompt part by putting it in prompt_settings.yml

Pwuts avatar Sep 22 '23 13:09 Pwuts

@Pwuts Is whitelist enough? I can add a blacklist as well.

Should I write it as a simple hostname match? Or ask the user for a regex to match against?

733amir avatar Sep 24 '23 06:09 733amir

  • Hostname is fine for now
  • Allowlist+denylist sounds good
    • What if both are set?

Pwuts avatar Sep 24 '23 11:09 Pwuts

@733amir @Pwuts

  • Hostname is fine for now

  • Allowlist+denylist sounds good

    • What if both are set?

Yeah. I was just going to say regex sounds like overkill. I guess there might be cases where that's useful, but a simple hostname match will handle 99%.

I would think that if the setting that enables the whitelist or blacklist is a single field, setting both would be impossible. Something like:

web-security-policy: allow  # Valid values = {allow, deny, disabled}
web-security-list:
  - wikipedia.org
  - stackoverflow.com
  - reddit.com
  - and so on

Maybe you can think of better field names. (And you can probably tell from that example I've never written a prompt for AutoGPT before; the lack of this feature combined with my risk aversion is the reason I haven't tried AutoGPT yet.)

I would be careful to make sure subdomains aren't blocked, for example: old.reddit.com. I would assume if I whitelist reddit.com, I am fine with any subdomains on the same host.
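
A sketch of what subdomain-aware matching might look like, so that whitelisting reddit.com also covers old.reddit.com. `host_matches` is a hypothetical helper. The dot in the suffix check matters: without it, a lookalike such as notreddit.com would also match.

```python
def host_matches(hostname, allowed_domain):
    """True if hostname equals allowed_domain or is a subdomain of it."""
    hostname = hostname.lower().rstrip(".")
    allowed_domain = allowed_domain.lower().rstrip(".")
    return hostname == allowed_domain or hostname.endswith("." + allowed_domain)


print(host_matches("old.reddit.com", "reddit.com"))
print(host_matches("notreddit.com", "reddit.com"))
```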

dboggs95 avatar Sep 24 '23 14:09 dboggs95

  • What if both are set?

@Pwuts

I think of them as empty lists to start with, then the user adds values to one or both of them. To apply the lists to view_webpage the logic would be like:

if (len(allowlist) > 0 and hostname not in allowlist) or hostname in denylist:
    return  # prevent the network call

Having both of them is possible and the logic wouldn't conflict. I can break it into two if statements and inform the user why view_webpage didn't work: was it because of the allowlist or because of the denylist?
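
Splitting the check into two separate tests could look roughly like the sketch below. `check_hostname` is a hypothetical helper that returns a reason instead of silently returning. It blocks the same hostnames as the single combined condition, and an empty allowlist still means "no allowlist restriction".

```python
def check_hostname(hostname, allowlist, denylist):
    """Return None if the hostname may be visited, else a reason string."""
    if hostname in denylist:
        return f"{hostname} is on the denylist"
    if allowlist and hostname not in allowlist:
        return f"{hostname} is not on the allowlist"
    return None


print(check_hostname("spam.example", {"github.com"}, {"spam.example"}))
print(check_hostname("example.com", {"github.com"}, set()))
print(check_hostname("github.com", {"github.com"}, set()))
```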

733amir avatar Sep 24 '23 16:09 733amir

To add to @733amir idea, I would think the following would work:

  1. The whitelist and blacklist are empty to begin with on the backend; the inputs for both need to be taken from the user or the .env file via the frontend.
  2. The condition block will have three main conditions:
     a. The website is allowed (normal AutoGPT behavior).
     b. The website is in the blacklist (return an "access is blocked" response, move on to the next logic block / iteration).
     c. The website is in neither the whitelist nor the blacklist (return an "access is not allowed" response, move on to the next logic block / iteration).

Users can input websites into both the whitelist and blacklist text areas. When running AutoGPT, the server will check both lists to determine whether a website should be allowed, blocked, or accessed with restrictions based on the user's configuration.

pallasite99 avatar Sep 24 '23 16:09 pallasite99

Hey! Can I contribute to this issue?

SyedAbuBakerAli avatar Sep 25 '23 12:09 SyedAbuBakerAli

To add to @733amir idea, I would think the following would work:

  1. The whitelist and blacklist are empty to begin with on the backend; the inputs for both need to be taken from the user or the .env file via the frontend.
  2. The condition block will have three main conditions:
     a. The website is allowed (normal AutoGPT behavior).
     b. The website is in the blacklist (return an "access is blocked" response, move on to the next logic block / iteration).
     c. The website is in neither the whitelist nor the blacklist (return an "access is not allowed" response, move on to the next logic block / iteration).

Users can input websites into both the whitelist and blacklist text areas. When running AutoGPT, the server will check both lists to determine whether a website should be allowed, blocked, or accessed with restrictions based on the user's configuration.

@733amir @Pwuts @pallasite99

I think 733amir is trying to say there is a concept of a greylist, meaning limited access. Unless we can define what that is, that's getting too far from the original ask. A whitelist tends to imply everything else is blacklisted by default and vice versa. If there is no option to say the default behavior is allow or deny, only the blacklist would have meaning, not the whitelist.

Whatever you do, just make sure at the end I am able to configure it to have 100% of the internet blocked by default, and then whitelist what I want AutoGPT to use.

dboggs95 avatar Sep 25 '23 14:09 dboggs95

@733amir
Why don't you add me and @SyedAbuBakerAli as collaborators to your fork? I'm a Java guy, not a Python guy, but I know enough that I can take a look and provide feedback directly on what you have so far.

dboggs95 avatar Sep 25 '23 16:09 dboggs95

@733amir @SyedAbuBakerAli If I don't get any further response, I'm going to attempt to write this change myself.

dboggs95 avatar Sep 27 '23 14:09 dboggs95

I haven't forgotten about this. Work has me tied up. I'm almost done with a big project. I'll have time to take care of this when I'm done with that.

dboggs95 avatar Oct 07 '23 01:10 dboggs95

Hi team, would it be possible for me to work on this issue?

maanavssaggu avatar Oct 27 '23 13:10 maanavssaggu

Hi team, I've implemented a URL whitelist and blacklist check within the read_webpage function. The logic currently checks if the provided URL's hostname is not present in the whitelist or if it's in the blacklist. If either of these conditions is met, the function raises a CommandExecutionError indicating the issue.

I am wondering:

  1. Is this the intended behavior for handling URLs not in the whitelist or those present in the blacklist?
  2. Should we provide a default behavior in cases where both the whitelist and blacklist are empty or not provided?

maanavssaggu avatar Oct 29 '23 11:10 maanavssaggu

Hey, @maanavssaggu! I want to start contributing to the project. I am looking for issues that are suited for newcomers, so I am wondering if you need any help with this? I would love to work on some issues together with someone else.

estefysc avatar Nov 01 '23 20:11 estefysc

Hi team, I've implemented a URL whitelist and blacklist check within the read_webpage function. The logic currently checks if the provided URL's hostname is not present in the whitelist or if it's in the blacklist. If either of these conditions is met, the function raises a CommandExecutionError indicating the issue.

I am wondering:

  1. Is this the intended behavior for handling URLs not in the whitelist or those present in the blacklist?
  2. Should we provide a default behavior in cases where both the whitelist and blacklist are empty or not provided?

I'm actually pulling the project down now to take a look. I'd like to see what you have. I'm unsure my intentions are understood. I never meant for the blacklist to actually be listed. I meant to blacklist everything by default if this feature is enabled. AutoGPT should skip blacklisted sites, and should gracefully move on, not throw an exception. If I ran into an exception, then AutoGPT would quit. I'm fine if there is an opposing behavior; i.e. I can enable a behavior where we whitelist by default and blacklist selected sites, but then I would want a way to toggle back to blacklist by default, because that's the motivation behind this.

dboggs95 avatar Nov 03 '23 01:11 dboggs95

Something like the pseudocode below. I'm bad at Python, so I haven't implemented it. If nobody gets around to it before me, I'll figure it out. But if you do, this is how I think it should look:

[two images of pseudocode; contents not preserved]

As for the location of the whitelist and blacklist, I'm looking at prompt_settings.yaml, but I'm not sure this is a good place to put this. I wonder whether or not the .env config can point to another file, and then load that file via some kind of IO Utility.

Maybe that would look like this: [image; contents not preserved]

dboggs95 avatar Nov 03 '23 01:11 dboggs95

@Pwuts This isn't tested yet, but here is my first attempt at writing the code for this: https://github.com/dboggs95/AutoGPT/commit/64b63cbb45861d991c5757d479c150e778b51910

dboggs95 avatar Nov 25 '23 20:11 dboggs95

@Pwuts @dboggs95 was this issue concluded or can I still work on this?

Satyam97 avatar Feb 13 '24 21:02 Satyam97

@Satyam97 It's not concluded. I'm just slow. As you can see from my commit, I'm also not experienced with Python coding. I also don't know how to test my code properly yet. If you want to take this to the finish line, you're probably better equipped to do so.

I just want to make sure what I'm doing is understood. Allowlist and denylist are not redundant because they are doing two completely different things. If there is an allowlist, it means allow everything on this list and nothing else. If there is a denylist, it means allow everything in the world except what is on this list. The two concepts do not work together at the same time, and are not interchangeable. I really only care about the whitelist, but some people might prefer creating a denylist and sharing a big list of sites that AutoGPT is finding that are actually bad sites, so no reason not to do both here.
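
To make those semantics concrete, here is a sketch of a policy switch with two separately stored lists. The class and field names are hypothetical; the policy values mirror the allow/deny/disabled idea floated earlier in this thread. Only the list that matches the active policy is consulted.

```python
from dataclasses import dataclass, field
from enum import Enum


class WebPolicy(Enum):
    ALLOWLIST = "allow"    # only hosts on the allowlist may be visited
    DENYLIST = "deny"      # every host except those on the denylist
    DISABLED = "disabled"  # no filtering at all


@dataclass
class WebAccessConfig:
    policy: WebPolicy = WebPolicy.DISABLED
    allowlist: set = field(default_factory=set)
    denylist: set = field(default_factory=set)

    def is_allowed(self, hostname):
        if self.policy is WebPolicy.ALLOWLIST:
            return hostname in self.allowlist
        if self.policy is WebPolicy.DENYLIST:
            return hostname not in self.denylist
        return True


config = WebAccessConfig(WebPolicy.ALLOWLIST, {"wikipedia.org"}, {"spam.example"})
print(config.is_allowed("wikipedia.org"), config.is_allowed("example.com"))
```

With the allowlist policy and an empty allowlist, every site is blocked, which is the "100% of the internet blocked by default" starting point.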

dboggs95 avatar Feb 13 '24 23:02 dboggs95

Hi @dboggs95, thanks for the quick reply. I am also new to open source, but have been working with Python for some time now. I will try to take this to the finish line.

And I did not mean they were redundant. I meant that they are mutually exclusive, and thus can be served by a single list, interpreted according to the policy value, rather than two different lists.

Satyam97 avatar Feb 14 '24 06:02 Satyam97

I wrote the code and I am raising a ValueError whenever the URL is present in the denylist (for blacklisting). The console output looks something like this, @dboggs95. I have tested the code and it is working fine. I just need to fine-tune how I am accessing environment variables, and will share my commit soon.

Also, there is a decorator that validates the URL at the top of the read_webpage method. I have added this validation to that decorator.

Thanks for hearing me out, it meant a lot.

[2024-02-14 21:41:15,864] [forge.sdk.routes.agent_protocol] [ERROR]     ❌  Error whilst trying to execute a task step: 5e1a59bb-8404-464c-8087-26431898eb40
Traceback (most recent call last):
  File "/Users/satyam/backendProjects/AutoGPT/autogpts/falcon/forge/sdk/routes/agent_protocol.py", line 358, in execute_agent_task_step
    step = await agent.execute_step(task_id, step)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/satyam/backendProjects/AutoGPT/autogpts/falcon/forge/agent.py", line 186, in execute_step
    output = await self.abilities.run_action(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/satyam/backendProjects/AutoGPT/autogpts/falcon/forge/actions/registry.py", line 184, in run_action
    return await action(self.agent, task_id, *args, **kwds)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/satyam/backendProjects/AutoGPT/autogpts/falcon/forge/actions/registry.py", line 58, in __call__
    return self.method(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/satyam/backendProjects/AutoGPT/autogpts/falcon/forge/actions/web/web_selenium.py", line 103, in wrapper
    raise ValueError("URL Not Allowed")
ValueError: URL Not Allowed
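
For readers following along, a decorator-based check of this kind could be sketched as below. This is not the code from the commit: the decorator name, the module-level `DENYLIST`, and the simplified synchronous `read_webpage` stand-in are all assumptions for illustration.

```python
import functools
from urllib.parse import urlparse

# Hypothetical denylist; a real setup would load this from configuration.
DENYLIST = {"blocked.example"}


def reject_denied_urls(func):
    """Decorator: raise ValueError before func runs if the URL's host is denied."""
    @functools.wraps(func)
    def wrapper(url, *args, **kwargs):
        hostname = urlparse(url).hostname
        if hostname in DENYLIST:
            raise ValueError("URL Not Allowed")
        return func(url, *args, **kwargs)
    return wrapper


@reject_denied_urls
def read_webpage(url):
    # Stand-in for the real command; it only echoes the URL here.
    return f"contents of {url}"


print(read_webpage("https://ok.example/page"))
```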

Satyam97 avatar Feb 14 '24 16:02 Satyam97

Hi @dboggs95, thanks for the quick reply. I am also new to open source, but have been working with Python for some time now. I will try to take this to the finish line.

And I did not mean they were redundant. I meant that they are mutually exclusive, and thus can be served by a single list, interpreted according to the policy value, rather than two different lists.

The only reason I didn't do it that way was because a set of url's in a blacklist shouldn't suddenly become a whitelist just because I flipped the policy value.

dboggs95 avatar Feb 15 '24 00:02 dboggs95

Does this stop AutoGPT from working? We can't control what websites it finds in a search, so ideally, these would be silent errors and it would just continue on to the next site.

dboggs95 avatar Feb 15 '24 00:02 dboggs95

Hi @dboggs95 ,

Yes, this change does not let the agent process the task any further, i.e., this webpage won't be read. I have mentioned the PR above; you can try it locally if possible.

Satyam97 avatar Feb 15 '24 18:02 Satyam97

Hi @dboggs95 ,

Yes, this change does not let the agent process the task any further, i.e., this webpage won't be read. I have mentioned the PR above; you can try it locally if possible.

The webpage won't be read, but it can continue to the next one, right? As long as it can try another webpage, that's correct, but if AutoGPT completely shuts down because of the error, it won't be usable since it would likely do so every time.
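
The difference between "skip this page" and "shut down" comes down to where the error is caught. Here is a sketch under assumed names (`UrlNotAllowedError`, `read_first_allowed`, and `fake_read` are all hypothetical, not AutoGPT APIs):

```python
class UrlNotAllowedError(Exception):
    """Hypothetical error raised when a URL fails the whitelist check."""


def read_first_allowed(urls, read_page):
    """Try each URL in turn; skip blocked ones instead of aborting."""
    skipped = []
    for url in urls:
        try:
            return read_page(url), skipped
        except UrlNotAllowedError:
            skipped.append(url)  # record it and move on to the next result
    return None, skipped


def fake_read(url):
    # Stand-in reader: pretend only stackoverflow.com is whitelisted.
    if "stackoverflow.com" not in url:
        raise UrlNotAllowedError(url)
    return "page text"


content, skipped = read_first_allowed(
    ["https://junk.example/a", "https://stackoverflow.com/q/1"], fake_read
)
print(content, skipped)
```

If the error is caught around each page read like this, a blocked site costs one skipped result rather than the whole run.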

I'll try it out when I get a chance.

dboggs95 avatar Feb 16 '24 00:02 dboggs95

This issue has automatically been marked as stale because it has not had any activity in the last 50 days. You can unstale it by commenting or removing the label. Otherwise, this issue will be closed in 10 days.

github-actions[bot] avatar Apr 06 '24 01:04 github-actions[bot]