Insecure sanitization of generated code
https://github.com/Marktechpost/AI-Tutorial-Codes-Included/blob/52570e806d4a356c8ddf40fccc64fa891a9d1e9d/Data%20Science/Building%20an%20End-to-End%20Data%20Science%20Workflow%20with%20Machine%20Learning%2C%20Interpretability%2C%20and%20Gemini%20AI%20Assistance.ipynb#L1460
The blocklist here can still be bypassed to call os.system, so ideally some form of code-execution sandboxing would be used instead. The notebook's checker looks like this:
SAFE_GLOBALS = {"pd": pd, "np": np}
def run_generated_pandas(code: str, df_local: pd.DataFrame):
banned = ["__", "import", "open(", "exec(", "eval(", "os.", "sys.", "pd.read", "to_csv", "to_pickle", "to_sql"]
if any(b in code for b in banned): raise ValueError("Unsafe code rejected.")
loc = {"df": df_local.copy()}
exec(code, SAFE_GLOBALS, loc)
return {k:v for k,v in loc.items() if k not in ("df",)}
The blocklist still allows a payload like this to be executed:
run_generated_pandas("getattr(getattr(np._pytesttester, bytes([111,115]).decode('ascii')), bytes([115, 121, 115, 116, 101, 109]).decode('ascii'))('calc')")
Increasing the complexity of the blocklist makes the attack harder, but it doesn't make it impossible. Restricting Python like this is a genuinely hard problem, which is why spinning up a sandbox to run this kind of generated code is usually easier and closer to provably secure.
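As a rough illustration only (the helper name run_in_subprocess, the CSV hand-off, and the specific limits are my own assumptions, not anything from the notebook), a minimal sketch of process-level isolation on a Unix-like system might look like the following. It caps CPU time and memory and strips the child's environment, but it does not block filesystem or network access, so a container, seccomp profile, or VM is still the stronger option.

import subprocess
import sys
import tempfile
import resource  # Unix-only; not available on Windows

import pandas as pd

def run_in_subprocess(code: str, df: pd.DataFrame, timeout: int = 5) -> str:
    # Hand the dataframe to the child as a file instead of sharing in-process objects.
    with tempfile.NamedTemporaryFile(suffix=".csv", delete=False) as f:
        csv_path = f.name
    df.to_csv(csv_path, index=False)

    # Small harness the child runs: load the data, then the generated code.
    harness = (
        "import pandas as pd, numpy as np\n"
        f"df = pd.read_csv({csv_path!r})\n"
        + code
    )

    def limit_resources():
        # Kernel-enforced caps: 2 seconds of CPU and ~1 GiB of address space.
        resource.setrlimit(resource.RLIMIT_CPU, (2, 2))
        resource.setrlimit(resource.RLIMIT_AS, (1024**3, 1024**3))

    result = subprocess.run(
        [sys.executable, "-I", "-c", harness],  # -I: isolated mode, ignores PYTHONPATH and user site-packages
        capture_output=True,
        text=True,
        timeout=timeout,            # wall-clock limit, raises TimeoutExpired
        preexec_fn=limit_resources,
        env={},                     # child gets an empty environment
    )
    if result.returncode != 0:
        raise RuntimeError(result.stderr)
    return result.stdout

Usage would be something like run_in_subprocess("print(df.describe())", df). Handing the dataframe over as a CSV and reading results back from stdout keeps the untrusted code from ever touching live objects in the parent process.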