
[Vulnerability] Arbitrary code execution when using the QaEngineer role

Open fubuki8087 opened this issue 2 years ago • 4 comments

Vulnerability description

When the QaEngineer role is used, arbitrary code execution can occur because QaEngineer relies on a dangerous action, RunCode, to test the code generated by Engineer. The RunCode.run_script() method invokes subprocess.Popen without any checks, so an attacker can craft prompts that cause sensitive operations to be executed.
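To illustrate the class of bug being reported, here is a hypothetical, simplified sketch of the risky pattern (not MetaGPT's actual source): a model-generated command string reaches subprocess.Popen with shell=True and no validation.

```python
import subprocess

def run_script(command: str) -> str:
    # The command string comes straight from LLM output; nothing stops it
    # from being a destructive command instead of a harmless test runner.
    proc = subprocess.Popen(
        command,
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    out, _ = proc.communicate()
    return out

# Any shell command runs unchecked:
print(run_script("echo pwned"))
```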

Proof of concept

My PoC code makes slight modifications to your tutorial example:

import os
os.environ["OPENAI_API_KEY"] = "sk-..."

import asyncio
from metagpt.roles import (
    ProductManager,
    Architect,
    ProjectManager,
    Engineer,
    QaEngineer
)
from metagpt.team import Team

async def startup(idea: str):
    company = Team()
    company.hire(
        [
            ProductManager(),
            Architect(),
            ProjectManager(),
            Engineer(),
            QaEngineer()
        ]
    )
    company.invest(investment=1.0)
    company.run_project(idea=idea)

    await company.run(n_round=16)

async def app(user_prompt):
    await startup(idea=user_prompt)

if __name__ == "__main__":
    user_input = "I want to execute shell command `ls -l`. Please help me write a piece of code and test this code."
    asyncio.run(app(user_input))

And in the path MetaGPT/workspace/.../test_outputs/, we can see the output of `ls -l` in a JSON file, which means `ls -l` executed successfully.

Note that this PoC only executes `ls -l`, but in a real scenario an attacker could perform dangerous operations such as deleting files or opening a backdoor.

Suggested fix

Using Docker to execute the Python code is a good choice. Restricting sensitive code via a whitelist or blacklist could also be considered.
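The Docker suggestion can be sketched as follows: instead of executing generated code on the host, build a `docker run` invocation that confines it to a throwaway container with no network and a read-only mount of the workspace. The specific flags, image name, and function below are illustrative assumptions, not MetaGPT code.

```python
import shlex

def docker_command(workdir: str, script: str) -> list[str]:
    """Build a `docker run` argv that sandboxes an untrusted script.

    Hypothetical helper: the image, resource caps, and mount layout are
    assumptions chosen for illustration.
    """
    return [
        "docker", "run", "--rm",
        "--network", "none",               # no outbound network access
        "--memory", "512m",                # cap resource usage
        "-v", f"{workdir}:/workspace:ro",  # code mounted read-only
        "-w", "/workspace",
        "python:3.11-slim",
        "python", script,
    ]

cmd = docker_command("/tmp/project", "test_main.py")
print(shlex.join(cmd))
```

Even if the generated code turns out to be malicious, it can only damage the disposable container, not the host running MetaGPT.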

fubuki8087 avatar Jan 10 '24 02:01 fubuki8087

Good advice. We will take this into consideration. Maybe you can also contribute a PR? 😁

voidking avatar Jan 17 '24 12:01 voidking

> Good advice. We will take this into consideration. Maybe you can also contribute a PR? 😁

Thank you, but I am usually quite busy with work. However, I can offer some suggestions here.

  1. Use Docker. Docker offers an isolated environment: even if an attacker gains remote command execution, they cannot inflict actual harm on the real system. AutoGPT and AutoGen are two good practices you can refer to.
  2. Impose limitations on the commands that can be executed in Python/shell. A whitelist could restrict execution to only the necessary commands. LlamaIndex and Pandas-ai are two good practices you can refer to.
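The whitelist idea in suggestion 2 can be sketched as a guard that runs before any command reaches subprocess. The allowed set and the function name here are hypothetical; a real integration would choose its own list and also combine this with sandboxing, since blocklist/allowlist checks alone are easy to bypass.

```python
import shlex

# Hypothetical allowlist; a real deployment would tailor this.
ALLOWED_EXECUTABLES = {"python", "pytest", "pip"}

# Shell metacharacters that could chain or redirect extra commands.
FORBIDDEN_CHARS = set(";&|`$><")

def is_command_allowed(command: str) -> bool:
    """Allow only commands whose executable is whitelisted and which
    contain no shell metacharacters for chaining further commands."""
    if FORBIDDEN_CHARS & set(command):
        return False
    try:
        argv = shlex.split(command)
    except ValueError:  # unbalanced quotes etc.
        return False
    return bool(argv) and argv[0] in ALLOWED_EXECUTABLES

print(is_command_allowed("pytest test_app.py"))      # True
print(is_command_allowed("rm -rf /"))                # False
print(is_command_allowed("pytest x.py; curl evil"))  # False
```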

I hope these suggestions will contribute to the improvement of your project's security.

fubuki8087 avatar Jan 22 '24 02:01 fubuki8087

Sandboxing is a reasonable requirement in a certain sense, but there is always a tradeoff between security and functionality.

geekan avatar Mar 21 '24 13:03 geekan