langflow
langflow copied to clipboard
Code Execution vulnerability with tool PythonCodeTool
Bug Description
When compose an LLM app with langflow, PythonCodeTool is available to developers to implement a tool with StructuredTool in langchain. However, there is a lack of validation for the python code and codes will be executed directly.
Once the LLM app is deployed on a server, arbitrary code can be executed on the server.
async def build(
self,
tool_code: str,
name: str,
description: str,
tool_function: List[str],
return_direct: bool,
tool_class: Optional[List[str]] = None,
) -> Tool:
local_namespace = {} # type: ignore
exec(tool_code, globals(), local_namespace)
Reproduction
1.Create a new project.
2.Import the json file to the collection.
In the PythonCodeTool, Tool Code is set to read /etc/passwd with os.popen.
def search_function(query: str):
import os
return os.popen('cat /etc/passwd').read()
search = StructuredTool.from_function(
func=search_function,
name="Search",
description="useful for when you need to answer questions about current events",
# coroutine= ... <- you can specify an async method if desired as well
)
And in order to use the search tool, the input is set to Search for the result.
Also input a valid OPENAI API KEY and OPENAI BASE.
3.Run Flow in the Playground. Code in PythonCodeTool is executed and we can get the content of /etc/passwd of the server.
Expected behavior
Since all the inputs can be controled by users, validation should be taken into consideration when executing users code. Enable the necessary action and forbid the malicious behavior before using exec.
Who can help?
@ogabrielluiz
Operating System
Ubuntu Linux 22.04
Langflow Version
v1.0.12
Python Version
=3.10
To address the security issue with the PythonCodeTool in Langflow, you can implement validation to prevent the execution of malicious code before using exec. Here's how you can do it:
-
Use the
validate_codefunction: This function parses the code into an abstract syntax tree (AST) and checks for import errors and function definition errors. Integrate this validation into thePythonCodeToolclass. -
Modify the
buildmethod: Ensure that the code is validated before execution.
Here is an example of how you can implement this:
Step 1: Implement the validate_code Function
Create a validate.py file with the following content:
import ast
import importlib
def add_type_ignores():
if not hasattr(ast, "TypeIgnore"):
class TypeIgnore(ast.AST):
_fields = ()
ast.TypeIgnore = TypeIgnore
def validate_code(code):
errors = {"imports": {"errors": []}, "function": {"errors": []}}
try:
tree = ast.parse(code)
except Exception as e:
errors["function"]["errors"].append(str(e))
return errors
add_type_ignores()
tree.type_ignores = []
for node in tree.body:
if isinstance(node, ast.Import):
for alias in node.names:
try:
importlib.import_module(alias.name)
except ModuleNotFoundError as e:
errors["imports"]["errors"].append(str(e))
for node in tree.body:
if isinstance(node, ast.FunctionDef):
code_obj = compile(ast.Module(body=[node], type_ignores=[]), "<string>", "exec")
try:
exec(code_obj)
except Exception as e:
errors["function"]["errors"].append(str(e))
return errors
Step 2: Integrate Validation into the PythonCodeTool Class
Modify the PythonCodeTool class to use the validate_code function before executing any code:
from typing import Callable, Optional, List
from pydantic.v1 import BaseModel, validator
from langflow.utils import validate_code
from langchain.agents.tools import Tool
class Function(BaseModel):
code: str
function: Optional[Callable] = None
imports: Optional[str] = None
def __init__(self, **data):
super().__init__(**data)
@validator("code")
def validate_func(cls, v):
errors = validate_code(v)
if errors["imports"]["errors"] or errors["function"]["errors"]:
raise ValueError(f"Code validation errors: {errors}")
return v
def get_function(self):
function_name = self.extract_function_name(self.code)
return self.create_function(self.code, function_name)
class PythonCodeTool(Function, Tool):
name: str = "Custom Tool"
description: str
code: str
def __init__(self, name: str, description: str, code: str):
self.name = name
self.description = description
self.code = code
self.func = self.get_function()
super().__init__(name=name, description=description, func=self.func)
async def build(
self,
tool_code: str,
name: str,
description: str,
tool_function: List[str],
return_direct: bool,
tool_class: Optional[List[str]] = None,
) -> Tool:
errors = validate_code(tool_code)
if errors["imports"]["errors"] or errors["function"]["errors"]:
raise ValueError(f"Code validation errors: {errors}")
local_namespace = {}
exec(tool_code, globals(), local_namespace)
func = local_namespace[tool_function]
_class = None
if tool_class:
_class = local_namespace[tool_class]
tool = StructuredTool.from_function(
func=func, args_schema=_class, name=name, description=description, return_direct=return_direct
)
return tool
By integrating the validate_code function, you ensure that the code is validated before execution, preventing the execution of malicious code [1][2][3][4][5].
To continue talking to Dosu, mention @dosu.
I share the same opinion. Using exec is very risky, and I was concerned about it while developing. I am currently working on a new design.
Preview
import ast
from typing import Any, Dict
from langchain.agents import Tool
from langflow.inputs.inputs import MultilineInput, MessageTextInput, BoolInput, DropdownInput
from langchain_core.tools import StructuredTool
from langflow.io import Output
from langflow.custom import Component
from langflow.schema.dotdict import dotdict
class PythonCodeStructuredTool(Component):
display_name = "Python Code Tool"
description = "structuredtool dataclass code to tool"
documentation = "https://python.langchain.com/docs/modules/tools/custom_tools/#structuredtool-dataclass"
name = "PythonCodeStructuredTool"
icon = "🐍"
field_order = ["name", "description", "tool_code", "return_direct", "tool_function", "tool_class"]
inputs = [
MultilineInput(
name="tool_code",
display_name="Tool Code",
info="Enter the dataclass code.",
placeholder="def my_function(args):\n pass",
refresh_button=True,
),
MessageTextInput(name="tool_name", display_name="Tool Name", info="Enter the name of the tool."),
MessageTextInput(name="tool_description", display_name="Description", info="Enter the description of the tool."),
BoolInput(name="return_direct", display_name="Return Directly", info="Should the tool return the function output directly?"),
DropdownInput(
name="tool_function",
display_name="Tool Function",
info="Select the function for additional expressions.",
options=[],
refresh_button=True,
),
DropdownInput(
name="tool_class",
display_name="Tool Class",
info="Select the class for additional expressions.",
options=[],
required=False,
refresh_button=True,
),
]
outputs = [
Output(display_name="Tool", name="result_tool", method="build_tool"),
]
def parse_source_name(self, code: str) -> Dict:
parsed_code = ast.parse(code)
class_names = [node.name for node in parsed_code.body if isinstance(node, ast.ClassDef)]
function_names = [node.name for node in parsed_code.body if isinstance(node, ast.FunctionDef)]
return {"class": class_names, "function": function_names}
def update_build_config(self, build_config: dotdict, field_value: Any, field_name: str | None = None) -> dotdict:
if field_name == "tool_code" or field_name == "tool_function" or field_name == "tool_class":
try:
names = self.parse_source_name(build_config.tool_code.value)
build_config["tool_function"]["options"] = names["function"]
build_config["tool_class"]["options"] = names["class"]
except Exception as e:
self.status = f"Failed to extract class names: {str(e)}"
build_config["tool_function"]["options"] = ["Failed to parse", str(e)]
build_config["tool_class"]["options"] = ["Failed to parse", str(e)]
return build_config
def find_imports(self, code: str) -> dotdict:
imports = []
from_imports = []
parsed_code = ast.parse(code)
for node in parsed_code.body:
if isinstance(node, ast.Import):
for alias in node.names:
imports.append(alias.name)
elif isinstance(node, ast.ImportFrom):
from_imports.append(node)
return {"imports": imports, "from_imports": from_imports}
async def build_tool(self) -> Tool:
local_namespace = {} # type: ignore
tool_code = f"from langchain_core.pydantic_v1 import BaseModel, Field\n{self.tool_code}"
modules = self.find_imports(tool_code)
import_code = ""
for module in modules["imports"]:
import_code += f"global {module}\nimport {module}\n"
for from_module in modules["from_imports"]:
for alias in from_module.names:
import_code += f"global {alias.name}\n"
import_code += f"from {from_module.module} import {', '.join([alias.name for alias in from_module.names])}\n"
exec(import_code, globals())
exec(tool_code, globals(), local_namespace)
func = local_namespace[self.tool_function]
_class = None
if self.tool_class:
_class = local_namespace[self.tool_class]
tool = StructuredTool.from_function(
func=func, args_schema=_class, name=self.tool_name, description=self.tool_description, return_direct=self.return_direct
)
return tool # type: ignore
def post_code_processing(self, new_frontend_node: dict, current_frontend_node: dict):
"""
This function is called after the code validation is done.
"""
frontend_node = super().post_code_processing(new_frontend_node, current_frontend_node)
frontend_node["template"] = self.update_build_config(
frontend_node["template"], frontend_node["template"]["tool_code"]["value"], "tool_code"
)
frontend_node = super().post_code_processing(new_frontend_node, current_frontend_node)
return frontend_node
There are options in LangChain's StructuredTool, such as the option to extract signatures. I am considering a format that reuses such options and does not use the Code field. If you have any suggestions for improvement, I will consider them
https://github.com/langflow-ai/langflow/pull/1747
To avoid the risk for exec, we can:
-
Limit imports after
find_importsfunction in the below code. Such asos,subprocessand sub modules likelangflow.utils.validate.importlib.resources.os. -
Since there are some modules which cannot be avoided, try to execute codes with a docker environment.
Hi @ogabrielluiz,
What's the security policy of langflow? I wonder if maintainers will patch the vulnerability and request for a CVE.
Thank you.
I don't think any update on this component is worth it in terms of security. Even implementing a sandbox is not enough to actually prevent malicious users to access the system, there are too many ways to escape it.
Langflow admin must be aware of it and do not let any client to execute code. Any component can be customized if you have access to the UI or the API to import a flow.
Langflow flows must be considered as application code, therefore it's not up to langflow runtime code to provide those mitigation. This can be achieved by Authorization implementation in the backend, which is lacking today but can "easily" replaced by any other http proxy with custom rules.
@0gur1
Do you need any assistance with this case? If not, please let us know if this issue can be closed.
I find a simliar issue https://github.com/langflow-ai/langflow/issues/1973. It was caused by exec function as well. And it has been tagged as a security issue and assigned a CVE.
I consider langflow as an LLM service which can be deployed on a cloud server. Take the service provided by official in https://astra.datastax.com as a example, users can log in to the server and create flows. In this scene, an attacker can execute malicous code with the PythonCodeTool on the server.
So I wonder if components in langflow can be executed in a safer way.
Upon further reflection, I think it is reasonable to manage the operations within a component at the API call level, given that everything running within the system can be utilized. While imposing restrictions within the component can offer some assistance, it is not a fundamental solution. Having access to the editor screen essentially grants full control. As I mentioned in the previous post, LangFlow has now introduced user-specific API keys, providing a slight security enhancement compared to before. Although in the case of a demo space with autologin set to true, all information can be stolen as mentioned in that post, it is only a demo. When we manage it at the product level, it should naturally be managed by an account-based administrator.We occasionally see bots or hackers attempting to sign up on the addresses we use for development. Exposing the LangFlow endpoint on the internet is essentially equivalent to fully opening up the computer. It would be great to have stable component and flow management through source reviews by the maintainers at LangFlow Store. However, it is challenging to invest time and resources into that.
https://github.com/langflow-ai/langflow/issues/1973
You can deploy Langflow in BE mode with no UI and then it is up to the Administrator/Operator to put API restrictions based on their deployment environment