
Legal and Ethical Safeguards for Prompts

Open w0lph opened this issue 1 year ago • 14 comments

Duplicates

  • [X] I have searched the existing issues

Summary 💡

Currently, the agents are entirely unbounded by ethical and legal considerations. I have provided some examples that are a step toward adding default safeguards against malicious behavior. This is a complex and evolving issue, but something is better than nothing.

Examples 🌈

Heuristic Imperatives from David Shapiro: A simple constraint: "Reduce suffering in the universe, increase prosperity in the universe, and increase understanding in the universe."

Constitutional AI: Harmlessness from AI Feedback: A series of self-critique instructions.

Motivation 🔦

These constraints must be added to the prompt.py file so that agents don't end up misbehaving and causing illegal or unethical consequences.
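A minimal sketch of what this could look like. The names `SAFETY_CONSTRAINTS` and `build_prompt` are hypothetical, not Auto-GPT's actual API; the idea is simply to hard-code the constraints ahead of the user's goals when the prompt is assembled:

```python
# Hypothetical sketch of default safeguards prepended to the agent prompt.
# SAFETY_CONSTRAINTS and build_prompt are illustrative names, not Auto-GPT code.

SAFETY_CONSTRAINTS = [
    "Reduce suffering in the universe.",
    "Increase prosperity in the universe.",
    "Increase understanding in the universe.",
    "Never take actions that are illegal or that harm people or systems.",
]

def build_prompt(goals: list[str]) -> str:
    """Assemble a system prompt with the safety constraints listed first,
    so they frame every goal the user supplies."""
    lines = ["CONSTRAINTS:"]
    lines += [f"{i}. {c}" for i, c in enumerate(SAFETY_CONSTRAINTS, 1)]
    lines.append("GOALS:")
    lines += [f"{i}. {g}" for i, g in enumerate(goals, 1)]
    return "\n".join(lines)
```

Because the constraints come first in the assembled prompt, they apply regardless of which goals the user configures.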

w0lph avatar Apr 20 '23 17:04 w0lph

They are if you use the azure api

Cytranics avatar Apr 20 '23 20:04 Cytranics

They are if you use the azure api

Relying solely on lower levels of the stack to act as a safeguard is not the safest path: people might use different, unconstrained models, or those models might be compromised.

It's prudent to add a safeguard at the agent level to prevent unintended behavior if the model's own protections prove insufficient. This creates an additional safety baseline for Auto-GPT, on which further capabilities can be built more safely.

w0lph avatar Apr 20 '23 20:04 w0lph

As discussed in https://github.com/Significant-Gravitas/Auto-GPT/discussions/211, this is a very important concern. Putting a safer default is very valuable globally. This would also help avoid potential future legal problems with the userbase.

@w0lph is there a PR associated with this issue? It would help a lot with getting it processed.

rabyj avatar Apr 21 '23 17:04 rabyj

I will create a plugin with the first prompt.

hdkiller avatar Apr 21 '23 17:04 hdkiller

I'm changing the title to Safeguards as I don't want this to be seen as censorship, just safe defaults. It also encompasses other improvements on this front.

w0lph avatar Apr 21 '23 17:04 w0lph

We discussed this internally a bit. One concern is people could remove the safeguarding code very easily. Thoughts on how to help with that?

ntindle avatar Apr 22 '23 08:04 ntindle

That’s their decision if they remove safeguards, as they are responsible for the actions of their bot.

hdkiller avatar Apr 22 '23 08:04 hdkiller

Encrypt safety prompts using a hash so the average joe won't know. If someone is smart enough to decrypt them, they can build their own agent anyway.


Cytranics avatar Apr 22 '23 11:04 Cytranics

They could just remove that section of code though

ntindle avatar Apr 23 '23 03:04 ntindle

Stop calling people who take an interest in AGI "average Joes"; it involves risk to humanity. Refer to comic books. :)

IsleOf avatar Apr 23 '23 12:04 IsleOf

One starting point would be passing shell commands through this safeguard prior to execution. It could then look for "jailbreaks" or other questionable behavior, such as attempts to gain root access.
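As a rough illustration of such a pre-execution check (the function name and the pattern list are examples only, not a complete or official detector):

```python
import re

# Illustrative deny-list of dangerous shell patterns. A real safeguard would
# need a much broader, maintained list; these are examples only.
BLOCKED_PATTERNS = [
    r"\bsudo\b",                 # privilege escalation
    r"\brm\s+-rf\s+/",           # destructive recursive delete from root
    r"\bchmod\s+777\b",          # world-writable permissions
    r"curl[^|]*\|\s*(ba)?sh",    # piping remote content straight into a shell
]

def is_command_allowed(command: str) -> bool:
    """Return False if the shell command matches a known-dangerous pattern."""
    return not any(re.search(p, command) for p in BLOCKED_PATTERNS)
```

An agent would call `is_command_allowed` before handing the command to its executor and refuse (or ask the user) on a match.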

Boostrix avatar Apr 30 '23 06:04 Boostrix

Makes more sense to have this as a .env feature that is enabled by default.
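Something like the following, where `SAFEGUARDS_ENABLED` is a hypothetical variable name read the same way Auto-GPT reads its other `.env` settings, and the flag defaults to on unless explicitly disabled:

```python
import os

def safeguards_enabled() -> bool:
    """Safeguards are on unless the user explicitly opts out in .env,
    e.g. SAFEGUARDS_ENABLED=false. (Variable name is illustrative.)"""
    return os.getenv("SAFEGUARDS_ENABLED", "true").lower() not in ("false", "0", "no")
```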

People will always find a way to circumvent censorship, even when it is being spoon-fed to them as "safety".

suparious avatar May 11 '23 02:05 suparious

People will always find a way to circumvent censorship, even when it is being spoon-fed to them as "safety."

In a saner and more intelligent society, people would stop conflating those two concepts and start promoting responsible AI development with the caution and nuance it deserves.

w0lph avatar May 11 '23 04:05 w0lph

See also: #211

Boostrix avatar May 11 '23 08:05 Boostrix

This issue has automatically been marked as stale because it has not had any activity in the last 50 days. You can unstale it by commenting or removing the label. Otherwise, this issue will be closed in 10 days.

github-actions[bot] avatar Sep 06 '23 21:09 github-actions[bot]

This issue was closed automatically because it has been stale for 10 days with no activity.

github-actions[bot] avatar Sep 17 '23 01:09 github-actions[bot]