Unsafe Code Execution in Code Interpreter Tool broken
The Code Interpreter Tool is described as follows:
Python3 code used to be interpreted in the Docker container. ALWAYS PRINT the final result and the output of the code
Based on this description, the generated Python code should use print() to display the results.
When running in Docker mode, the tool behaves as expected, returning the printed output from the console:
def run_code_in_docker(self, code: str, libraries_used: List[str]) -> str:
    ...
    return exec_result.output.decode("utf-8")
However, when running in unsafe mode, the tool instead returns the value of a result variable as the output:
def run_code_unsafe(self, code: str, libraries_used: List[str]) -> str:
    ...
    return exec_locals.get("result", "No result variable found.")
This behavior is inconsistent with the tool’s description. Because the description only asks for printed output, the generated code typically prints its results and never assigns a result variable, so unsafe mode returns the fallback message:
No result variable found.
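The mismatch can be reproduced outside the tool with a small stand-alone sketch. The helper name `run_unsafe_like` is hypothetical, and its body is only an assumption about what the elided part of `run_code_unsafe` does (an `exec` call followed by the lookup shown above); it is not the tool's actual implementation.

```python
def run_unsafe_like(code: str) -> str:
    # Hypothetical stand-in for run_code_unsafe: execute the code,
    # then look for a variable named "result" in its local namespace.
    exec_locals = {}
    exec(code, {}, exec_locals)
    return exec_locals.get("result", "No result variable found.")


# Code written to follow the tool description: it only prints.
printed_only = "total = 2 + 3\nprint(total)"

# Code written to follow the unsafe-mode behavior: it assigns result.
with_result = "total = 2 + 3\nresult = str(total)"

print(run_unsafe_like(printed_only))
print(run_unsafe_like(with_result))
```

In the first call the printed value goes to the console but is not part of the return value, which is the fallback message. Only the second call returns the computed value.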
Temporary Workaround
To resolve this issue temporarily, I updated my coding agent’s goal with the following instruction:
It is important to return the results as a string variable. Printing them to the console will not be sufficient. Create a string variable called result containing all the results.
This ensures the generated code aligns with the requirements of the unsafe mode.
Proposed Solution
The tool should be updated to ensure consistency between Docker and unsafe modes. Specifically:
- Either update the tool’s description to include the need for a result variable in unsafe mode, or
- Modify the unsafe mode implementation to use printed output, aligning it with the Docker mode behavior.
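As a rough illustration of the second option, printed output can be captured with `contextlib.redirect_stdout` from the standard library. This is a minimal sketch, not a patch for the tool: the function name `run_code_capture_stdout`, the fallback to a result variable, and the "No output was printed." message are all assumptions made for the example, and library installation is left out.

```python
import contextlib
import io


def run_code_capture_stdout(code: str) -> str:
    # Hypothetical replacement for the unsafe-mode return path:
    # collect everything the executed code prints to stdout.
    buffer = io.StringIO()
    exec_locals = {}
    with contextlib.redirect_stdout(buffer):
        exec(code, {}, exec_locals)

    printed = buffer.getvalue()
    if printed:
        return printed
    # Assumed fallback: keep honoring a result variable if nothing was printed.
    return str(exec_locals.get("result", "No output was printed."))


print(run_code_capture_stdout("print(2 + 3)"))
```

With this approach, code that follows the tool description and only prints would return its console output in both modes, while code that assigns a result variable would keep working.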