langflow Code Execution vulnerability with tool PythonCodeTool

Bug Description

When composing an LLM app with langflow, developers can use PythonCodeTool to implement a tool with StructuredTool in langchain. However, the Python code is not validated and is executed directly. Once the LLM app is deployed on a server, arbitrary code can be executed on that server.

async def build(
        self,
        tool_code: str,
        name: str,
        description: str,
        tool_function: List[str],
        return_direct: bool,
        tool_class: Optional[List[str]] = None,
    ) -> Tool:
        local_namespace = {}  # type: ignore
        exec(tool_code, globals(), local_namespace)
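
The same pattern can be reproduced outside Langflow. The following is a minimal sketch, not Langflow's code: tool_code is a harmless stand-in for the user-supplied Tool Code, and the names search_function and result are chosen only for illustration.

```python
# Stand-in for the user-supplied Tool Code (assumption: harmless payload).
# In the deployed app, whoever edits the flow controls this string.
tool_code = '''
def search_function(query: str):
    import platform
    return platform.system()
'''

local_namespace = {}
# Same call shape as in build(): the string is executed as-is.
exec(tool_code, globals(), local_namespace)

# The function defined by the string is now an ordinary callable.
func = local_namespace["search_function"]
result = func("anything")
print(result)
```

Nothing inspects the string before exec runs it, so whatever the function body does runs with the privileges of the server process once the tool is called.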

Reproduction

1. Create a new project.

2. Import the json file to the collection.

PythonCodeTool.json

In the PythonCodeTool, Tool Code is set to read /etc/passwd with os.popen.

def search_function(query: str):
    import os
    return os.popen('cat /etc/passwd').read()

search = StructuredTool.from_function(
    func=search_function,
    name="Search",
    description="useful for when you need to answer questions about current events",
    # coroutine= ... <- you can specify an async method if desired as well
)

To make the flow use the search tool, the input is set to "Search for the result".

Also enter a valid OpenAI API key and OpenAI API base.

3. Run the flow in the Playground. The code in PythonCodeTool is executed, and the response contains the content of /etc/passwd from the server.

Expected behavior

Since all the inputs can be controlled by users, validation should be taken into consideration when executing user code. Allow only the necessary actions and block malicious behavior before calling exec.

Who can help?

@ogabrielluiz

Operating System

Ubuntu Linux 22.04

Langflow Version

v1.0.12

Python Version

=3.10

Jul 24 '24 07:07 0gur1

To address the security issue with the PythonCodeTool in Langflow, you can implement validation to prevent the execution of malicious code before using exec. Here's how you can do it:

Use the validate_code function: This function parses the code into an abstract syntax tree (AST) and checks for import errors and function definition errors. Integrate this validation into the PythonCodeTool class.
Modify the build method: Ensure that the code is validated before execution.

Here is an example of how you can implement this:

Step 1: Implement the `validate_code` Function

Create a validate.py file with the following content:

import ast
import importlib

def add_type_ignores():
    if not hasattr(ast, "TypeIgnore"):
        class TypeIgnore(ast.AST):
            _fields = ()
        ast.TypeIgnore = TypeIgnore

def validate_code(code):
    errors = {"imports": {"errors": []}, "function": {"errors": []}}

    try:
        tree = ast.parse(code)
    except Exception as e:
        errors["function"]["errors"].append(str(e))
        return errors

    add_type_ignores()
    tree.type_ignores = []

    for node in tree.body:
        if isinstance(node, ast.Import):
            for alias in node.names:
                try:
                    importlib.import_module(alias.name)
                except ModuleNotFoundError as e:
                    errors["imports"]["errors"].append(str(e))

    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            code_obj = compile(ast.Module(body=[node], type_ignores=[]), "<string>", "exec")
            try:
                exec(code_obj)
            except Exception as e:
                errors["function"]["errors"].append(str(e))

    return errors
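
As a rough illustration of what this check reports, the sketch below repeats a condensed version of validate_code so it can run on its own and feeds it three inputs. The sample inputs and the module name no_such_module_xyz are made up for the example.

```python
import ast
import importlib


def validate_code(code):
    # Condensed restatement of the function above, for a standalone run.
    errors = {"imports": {"errors": []}, "function": {"errors": []}}
    try:
        tree = ast.parse(code)
    except Exception as e:
        errors["function"]["errors"].append(str(e))
        return errors
    for node in tree.body:
        if isinstance(node, ast.Import):
            # Try to import each named module and record failures.
            for alias in node.names:
                try:
                    importlib.import_module(alias.name)
                except ModuleNotFoundError as e:
                    errors["imports"]["errors"].append(str(e))
        elif isinstance(node, ast.FunctionDef):
            # Compile and execute the definition itself (the body is not called).
            module = ast.Module(body=[node], type_ignores=[])
            try:
                exec(compile(module, "<string>", "exec"))
            except Exception as e:
                errors["function"]["errors"].append(str(e))
    return errors


bad_syntax = validate_code("def broken(:\n    pass")
missing_module = validate_code("import no_such_module_xyz")
well_formed = validate_code("def ok(x):\n    return x")

print(bad_syntax["function"]["errors"])
print(missing_module["imports"]["errors"])
print(well_formed)
```

Only the first two inputs produce entries. Any input that parses and whose imports resolve comes back with empty error lists, since the check looks at syntax and importability only.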

Step 2: Integrate Validation into the `PythonCodeTool` Class

Modify the PythonCodeTool class to use the validate_code function before executing any code:

from typing import Callable, Optional, List
from pydantic.v1 import BaseModel, validator
from langflow.utils import validate_code
from langchain.agents.tools import Tool
from langchain_core.tools import StructuredTool  # used by build() below

class Function(BaseModel):
    code: str
    function: Optional[Callable] = None
    imports: Optional[str] = None

    def __init__(self, **data):
        super().__init__(**data)

    @validator("code")
    def validate_func(cls, v):
        errors = validate_code(v)
        if errors["imports"]["errors"] or errors["function"]["errors"]:
            raise ValueError(f"Code validation errors: {errors}")
        return v

    def get_function(self):
        function_name = self.extract_function_name(self.code)
        return self.create_function(self.code, function_name)

class PythonCodeTool(Function, Tool):
    name: str = "Custom Tool"
    description: str
    code: str

    def __init__(self, name: str, description: str, code: str):
        self.name = name
        self.description = description
        self.code = code
        self.func = self.get_function()
        super().__init__(name=name, description=description, func=self.func)

    async def build(
        self,
        tool_code: str,
        name: str,
        description: str,
        tool_function: List[str],
        return_direct: bool,
        tool_class: Optional[List[str]] = None,
    ) -> Tool:
        errors = validate_code(tool_code)
        if errors["imports"]["errors"] or errors["function"]["errors"]:
            raise ValueError(f"Code validation errors: {errors}")

        local_namespace = {}
        exec(tool_code, globals(), local_namespace)

        func = local_namespace[tool_function]
        _class = None

        if tool_class:
            _class = local_namespace[tool_class]

        tool = StructuredTool.from_function(
            func=func, args_schema=_class, name=name, description=description, return_direct=return_direct
        )
        return tool

By integrating the validate_code function, you ensure that the code is validated before execution, preventing the execution of malicious code ^[1]^[2]^[3]^[4]^[5].


Jul 24 '24 07:07 dosubot[bot]

I share the same opinion. Using exec is very risky, and I was concerned about it while developing. I am currently working on a new design.

Preview

import ast
from typing import Any, Dict
 
from langchain.agents import Tool
from langflow.inputs.inputs import MultilineInput, MessageTextInput, BoolInput, DropdownInput
from langchain_core.tools import StructuredTool
from langflow.io import Output
 
from langflow.custom import Component
from langflow.schema.dotdict import dotdict
 
 
class PythonCodeStructuredTool(Component):
    display_name = "Python Code Tool"
    description = "structuredtool dataclass code to tool"
    documentation = "https://python.langchain.com/docs/modules/tools/custom_tools/#structuredtool-dataclass"
    name = "PythonCodeStructuredTool"
    icon = "🐍"
    field_order = ["name", "description", "tool_code", "return_direct", "tool_function", "tool_class"]
    inputs = [
        MultilineInput(
            name="tool_code",
            display_name="Tool Code",
            info="Enter the dataclass code.",
            placeholder="def my_function(args):\n    pass",
            refresh_button=True,
        ),
        MessageTextInput(name="tool_name", display_name="Tool Name", info="Enter the name of the tool."),
        MessageTextInput(name="tool_description", display_name="Description", info="Enter the description of the tool."),
        BoolInput(name="return_direct", display_name="Return Directly", info="Should the tool return the function output directly?"),
        DropdownInput(
            name="tool_function",
            display_name="Tool Function",
            info="Select the function for additional expressions.",
            options=[],
            refresh_button=True,
        ),
        DropdownInput(
            name="tool_class",
            display_name="Tool Class",
            info="Select the class for additional expressions.",
            options=[],
            required=False,
            refresh_button=True,
        ),
    ]
    outputs = [
        Output(display_name="Tool", name="result_tool", method="build_tool"),
    ]
 
    def parse_source_name(self, code: str) -> Dict:
        parsed_code = ast.parse(code)
        class_names = [node.name for node in parsed_code.body if isinstance(node, ast.ClassDef)]
        function_names = [node.name for node in parsed_code.body if isinstance(node, ast.FunctionDef)]
        return {"class": class_names, "function": function_names}
 
    def update_build_config(self, build_config: dotdict, field_value: Any, field_name: str | None = None) -> dotdict:
        if field_name == "tool_code" or field_name == "tool_function" or field_name == "tool_class":
            try:
                names = self.parse_source_name(build_config.tool_code.value)
                build_config["tool_function"]["options"] = names["function"]
                build_config["tool_class"]["options"] = names["class"]
            except Exception as e:
                self.status = f"Failed to extract class names: {str(e)}"
                build_config["tool_function"]["options"] = ["Failed to parse", str(e)]
                build_config["tool_class"]["options"] = ["Failed to parse", str(e)]
        return build_config

    def find_imports(self, code: str) -> dotdict:
        imports = []
        from_imports = []
        parsed_code = ast.parse(code)
        for node in parsed_code.body:
            if isinstance(node, ast.Import):
                for alias in node.names:
                    imports.append(alias.name)
            elif isinstance(node, ast.ImportFrom):
                from_imports.append(node)
        return {"imports": imports, "from_imports": from_imports}

    async def build_tool(self) -> Tool:
        local_namespace = {}  # type: ignore
        tool_code = f"from langchain_core.pydantic_v1 import BaseModel, Field\n{self.tool_code}"
        modules = self.find_imports(tool_code)
        import_code = ""
        for module in modules["imports"]:
            import_code += f"global {module}\nimport {module}\n"
        for from_module in modules["from_imports"]:
            for alias in from_module.names:
                import_code += f"global {alias.name}\n"
            import_code += f"from {from_module.module} import {', '.join([alias.name for alias in from_module.names])}\n"
        exec(import_code, globals())
        exec(tool_code, globals(), local_namespace)
 
        func = local_namespace[self.tool_function]
        _class = None
 
        if self.tool_class:
            _class = local_namespace[self.tool_class]
 
        tool = StructuredTool.from_function(
            func=func, args_schema=_class, name=self.tool_name, description=self.tool_description, return_direct=self.return_direct
        )
        return tool  # type: ignore
 
    def post_code_processing(self, new_frontend_node: dict, current_frontend_node: dict):
        """
        This function is called after the code validation is done.
        """
        frontend_node = super().post_code_processing(new_frontend_node, current_frontend_node)
        frontend_node["template"] = self.update_build_config(
            frontend_node["template"], frontend_node["template"]["tool_code"]["value"], "tool_code"
        )
        frontend_node = super().post_code_processing(new_frontend_node, current_frontend_node)
        return frontend_node

Jul 24 '24 08:07 YamonBot

LangChain's StructuredTool has options such as extracting the function signature. I am considering a format that reuses those options and does not use the Code field. If you have any suggestions for improvement, I will consider them.

https://github.com/langflow-ai/langflow/pull/1747

Jul 24 '24 08:07 YamonBot

To reduce the risk of exec, we can:

Restrict imports after the find_imports function, for example os, subprocess, and submodule paths such as langflow.utils.validate.importlib.resources.os.
Since some modules cannot be avoided, execute the code in a Docker environment.
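
The first point could look roughly like the sketch below. It is an illustration under assumptions, not a proposed patch: BLOCKED_MODULES and find_blocked_imports are hypothetical names, and the deny-list contains only the two modules named above. It uses ast.walk so that imports nested inside a function body, as in the reproduction payload, are also seen.

```python
import ast

# Hypothetical deny-list; only os and subprocess are named in this thread.
BLOCKED_MODULES = frozenset({"os", "subprocess"})


def find_blocked_imports(code, blocked=BLOCKED_MODULES):
    """Return the blocked top-level module names imported anywhere in code."""
    found = set()
    # ast.walk visits nested nodes, so imports inside functions are included.
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # node.module is None for "from . import x".
            names = [node.module or ""]
        else:
            continue
        for name in names:
            root = name.split(".")[0]
            if root in blocked:
                found.add(root)
    return found


payload = (
    "def search_function(query: str):\n"
    "    import os\n"
    "    return os.popen('cat /etc/passwd').read()\n"
)
print(find_blocked_imports(payload))
```

A static check like this only sees import statements. It does not see dynamic imports such as __import__("os"), nor attribute paths that reach os through another module, which is the langflow.utils.validate.importlib.resources.os case, so it would need to be combined with isolation such as the Docker option.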

Jul 24 '24 08:07 0gur1

Hi @ogabrielluiz,

What is langflow's security policy? I wonder whether the maintainers will patch the vulnerability and request a CVE.

Thank you.

Jul 25 '24 02:07 0gur1

I don't think any update to this component is worthwhile in terms of security. Even implementing a sandbox is not enough to actually prevent malicious users from accessing the system; there are too many ways to escape it.

Langflow admins must be aware of this and must not let arbitrary clients execute code. Any component can be customized if you have access to the UI or to the API to import a flow.

Langflow flows must be considered application code, so it is not up to the Langflow runtime code to provide those mitigations. This can be achieved by implementing authorization in the backend, which is lacking today but can "easily" be replaced by any HTTP proxy with custom rules.

Jul 29 '24 13:07 nicoloboschi

@0gur1

Do you need any assistance with this case? If not, please let us know if this issue can be closed.

Jul 30 '24 14:07 carlosrcoelho

I found a similar issue: https://github.com/langflow-ai/langflow/issues/1973. It was also caused by the exec function, and it has been tagged as a security issue and assigned a CVE.

I consider langflow an LLM service that can be deployed on a cloud server. Take the official service at https://astra.datastax.com as an example: users can log in to the server and create flows. In this scenario, an attacker can execute malicious code on the server with the PythonCodeTool.

So I wonder if components in langflow can be executed in a safer way.

Jul 30 '24 16:07 0gur1

Upon further reflection, I think it is reasonable to manage the operations within a component at the API call level, given that everything running within the system can be utilized. While imposing restrictions within the component can offer some assistance, it is not a fundamental solution. Having access to the editor screen essentially grants full control. As I mentioned in the previous post, LangFlow has now introduced user-specific API keys, providing a slight security enhancement compared to before. Although in the case of a demo space with autologin set to true, all information can be stolen as mentioned in that post, it is only a demo. At the product level, it should naturally be managed by an account-based administrator. We occasionally see bots or hackers attempting to sign up on the addresses we use for development. Exposing the LangFlow endpoint on the internet is essentially equivalent to fully opening up the computer. It would be great to have stable component and flow management through source reviews by the maintainers at LangFlow Store. However, it is challenging to invest time and resources into that.

Jul 30 '24 17:07 YamonBot

https://github.com/langflow-ai/langflow/issues/1973

You can deploy Langflow in backend-only (BE) mode with no UI; it is then up to the administrator/operator to put API restrictions in place based on their deployment environment.

Aug 02 '24 19:08 carlosrcoelho
