Sandbox Fusion instructions use a potentially faulty image.
System Info
The Verl documentation (https://verl.readthedocs.io/en/latest/examples/sandbox_fusion_example.html) points users to one of two options. The first is a local Sandbox Fusion image, deployed by following https://bytedance.github.io/SandboxFusion/docs/docs/get-started/#local-deployment. The second is the FaaS offering.
In short, the image ByteDance provides for local deployment produces race-condition errors when several requests that pass stdin and contain faulty code are submitted at the same time. Instead of the reason the code failed to execute, the response is:
{
  "status": "SandboxError",
  "message": "exception on running command python /tmp/tmpjbn4rp0z/tmpd8u2d705.py: unable to perform operation on <WriteUnixTransport closed=True reading=False 0x7f9c77b2d560>; the handler is closed | None",
  "compile_result": null,
  "run_result": {
    "status": "Error",
    "execution_time": null,
    "return_code": null,
    "stdout": null,
    "stderr": "exception on running command python /tmp/tmpjbn4rp0z/tmpd8u2d705.py: unable to perform operation on <WriteUnixTransport closed=True reading=False 0x7f9c77b2d560>; the handler is closed | None"
  },
  "executor_pod_name": null,
  "files": {}
}
I mitigated the issue by patching base.py inside the Docker image. The corresponding lines in the GitHub repository are https://github.com/bytedance/SandboxFusion/blob/main/sandbox/runners/base.py#L69C1-L83C33.
NOTE: THE PUBLISHED DOCKER IMAGE IS NOT UP TO DATE WITH THE GITHUB REPOSITORY.
In the published Docker image, base.py contains:
if stdin is not None:
    p.stdin.write(stdin.encode())
    p.stdin.close()
start_time = time.time()
which I replaced with
if stdin is not None:
    try:
        p.stdin.write(stdin.encode())
        await p.stdin.drain()
        p.stdin.close()
    except Exception as e:
        # Process exited before stdin could be written (e.g., syntax error)
        logger.debug(f"Could not write to stdin (process already exited): {e}")
else:
    p.stdin.close()
start_time = time.time()
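This does not prevent the race condition, but it stops it from affecting the results: if the process has already exited, the failed stdin write is only logged, and the run goes on to report the program's actual error.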
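To illustrate the mechanism outside SandboxFusion, here is a minimal, self-contained asyncio sketch of the same defensive pattern. It assumes Python 3.8+ and the default asyncio event loop, and `run_python`, `run_many` and `RunResult` are hypothetical names, not SandboxFusion's API. Judging by the `WriteUnixTransport` class name and the "handler is closed" message, the published server appears to run on uvloop, which raises `RuntimeError` in this situation. The sketch therefore catches that alongside the `BrokenPipeError`/`ConnectionResetError` that the default loop can raise, rather than a blanket `Exception`.

```python
import asyncio
import logging
import os
import sys
import tempfile
import time
from dataclasses import dataclass
from typing import List, Optional

logger = logging.getLogger(__name__)


@dataclass
class RunResult:  # hypothetical result type, loosely mirroring "run_result"
    return_code: Optional[int]
    stdout: str
    stderr: str
    execution_time: float


async def run_python(code: str, stdin: Optional[str] = None,
                     timeout: float = 10) -> RunResult:
    """Run `code` in a fresh interpreter, tolerating a child that exits early."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "main.py")
        with open(path, "w") as f:
            f.write(code)
        p = await asyncio.create_subprocess_exec(
            sys.executable, path,
            stdin=asyncio.subprocess.PIPE,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE,
        )
        try:
            if stdin is not None:
                p.stdin.write(stdin.encode())
                await p.stdin.drain()
            p.stdin.close()
        except (BrokenPipeError, ConnectionResetError, RuntimeError) as e:
            # The child may exit (e.g. SyntaxError) before reading stdin, closing
            # the pipe; log and carry on so its real stderr is still collected.
            logger.debug("stdin not written, process already exited: %r", e)
        start_time = time.time()
        try:
            out, err = await asyncio.wait_for(
                asyncio.gather(p.stdout.read(), p.stderr.read()), timeout)
        except asyncio.TimeoutError:
            p.kill()
            await p.wait()
            raise
        return_code = await p.wait()
        return RunResult(return_code, out.decode(), err.decode(),
                         time.time() - start_time)


async def run_many(code: str, stdin: Optional[str], n: int) -> List[RunResult]:
    """Fire n runs concurrently, like the parallel curl loop in this issue."""
    return await asyncio.gather(*(run_python(code, stdin) for _ in range(n)))


if __name__ == "__main__":
    for r in asyncio.run(run_many('print("Hello, world!"', "test", 10)):
        print(r.return_code, r.stderr.strip().splitlines()[-1])
```

With the guard in place, every one of the ten concurrent runs should report return code 1 and a `SyntaxError` in stderr, regardless of whether the stdin write happened before or after the child exited.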
An obvious first remediation would be to build a Docker image from the current SandboxFusion main branch. However, building a new SandboxFusion image currently runs into problems: https://github.com/bytedance/SandboxFusion/issues/69
Information
- [x] The official example scripts
- [ ] My own modified scripts
Tasks
- [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [x] My own task or dataset (give details below)
Reproduction
- Run the provided local Docker image:
docker run -it -p 8080:8080 volcengine/sandbox-fusion:server-20250609
- Send multiple requests in parallel that use stdin and contain a code error (such as a missing parenthesis):
#!/bin/bash
# Run 10 curl requests in parallel with erroneous code (missing parenthesis) + stdin
for i in {1..10}; do
  (
    echo "========== Request $i =========="
    curl -s 'http://localhost:8080/run_code' \
      -H 'Content-Type: application/json' \
      --data-raw '{"code": "print(\"Hello, world!\"", "language": "python", "stdin": "test"}' \
      | jq '.'
    echo ""
  ) &
done
# Wait for all background jobs to complete
wait
echo "All requests completed"
- View Results
dev-dsk-alxzhang-2c-71fe722c % ./parallel.sh
========== Request 1 ==========
========== Request 2 ==========
========== Request 3 ==========
========== Request 4 ==========
========== Request 5 ==========
========== Request 6 ==========
========== Request 7 ==========
========== Request 8 ==========
========== Request 9 ==========
========== Request 10 ==========
{
  "status": "SandboxError",
  "message": "exception on running command python /tmp/tmpenkohukq/tmpnt99io0j.py: unable to perform operation on <WriteUnixTransport closed=True reading=False 0x7f9c77b2df20>; the handler is closed | None",
  "compile_result": null,
  "run_result": {
    "status": "Error",
    "execution_time": null,
    "return_code": null,
    "stdout": null,
    "stderr": "exception on running command python /tmp/tmpenkohukq/tmpnt99io0j.py: unable to perform operation on <WriteUnixTransport closed=True reading=False 0x7f9c77b2df20>; the handler is closed | None"
  },
  "executor_pod_name": null,
  "files": {}
}
{
  "status": "SandboxError",
  "message": "exception on running command python /tmp/tmpjbn4rp0z/tmpd8u2d705.py: unable to perform operation on <WriteUnixTransport closed=True reading=False 0x7f9c77b2d560>; the handler is closed | None",
  "compile_result": null,
  "run_result": {
    "status": "Error",
    "execution_time": null,
    "return_code": null,
    "stdout": null,
    "stderr": "exception on running command python /tmp/tmpjbn4rp0z/tmpd8u2d705.py: unable to perform operation on <WriteUnixTransport closed=True reading=False 0x7f9c77b2d560>; the handler is closed | None"
  },
  "executor_pod_name": null,
  "files": {}
}
{
  "status": "SandboxError",
  "message": "exception on running command python /tmp/tmpa61c30l5/tmp1dzk8x_m.py: unable to perform operation on <WriteUnixTransport closed=True reading=False 0x7f9c77b2e400>; the handler is closed | None",
  "compile_result": null,
  "run_result": {
    "status": "Error",
    "execution_time": null,
    "return_code": null,
    "stdout": null,
    "stderr": "exception on running command python /tmp/tmpa61c30l5/tmp1dzk8x_m.py: unable to perform operation on <WriteUnixTransport closed=True reading=False 0x7f9c77b2e400>; the handler is closed | None"
  },
  "executor_pod_name": null,
  "files": {}
}
{
  "status": "SandboxError",
  "message": "exception on running command python /tmp/tmpx4dar4z7/tmpxhuawudk.py: unable to perform operation on <WriteUnixTransport closed=True reading=False 0x7f9c77b2e8e0>; the handler is closed | None",
  "compile_result": null,
  "run_result": {
    "status": "Error",
    "execution_time": null,
    "return_code": null,
    "stdout": null,
    "stderr": "exception on running command python /tmp/tmpx4dar4z7/tmpxhuawudk.py: unable to perform operation on <WriteUnixTransport closed=True reading=False 0x7f9c77b2e8e0>; the handler is closed | None"
  },
  "executor_pod_name": null,
  "files": {}
}
{
  "status": "SandboxError",
  "message": "exception on running command python /tmp/tmp29fi134e/tmpmw37aqtx.py: unable to perform operation on <WriteUnixTransport closed=True reading=False 0x7f9c77b2edc0>; the handler is closed | None",
  "compile_result": null,
  "run_result": {
    "status": "Error",
    "execution_time": null,
    "return_code": null,
    "stdout": null,
    "stderr": "exception on running command python /tmp/tmp29fi134e/tmpmw37aqtx.py: unable to perform operation on <WriteUnixTransport closed=True reading=False 0x7f9c77b2edc0>; the handler is closed | None"
  },
  "executor_pod_name": null,
  "files": {}
}
{
  "status": "SandboxError",
  "message": "exception on running command python /tmp/tmps1_55ab2/tmpj4qnlu03.py: unable to perform operation on <WriteUnixTransport closed=True reading=False 0x7f9c77b2f2a0>; the handler is closed | None",
  "compile_result": null,
  "run_result": {
    "status": "Error",
    "execution_time": null,
    "return_code": null,
    "stdout": null,
    "stderr": "exception on running command python /tmp/tmps1_55ab2/tmpj4qnlu03.py: unable to perform operation on <WriteUnixTransport closed=True reading=False 0x7f9c77b2f2a0>; the handler is closed | None"
  },
  "executor_pod_name": null,
  "files": {}
}
{
  "status": "SandboxError",
  "message": "exception on running command python /tmp/tmp7tdjylqd/tmphti1unbr.py: unable to perform operation on <WriteUnixTransport closed=True reading=False 0x7f9c77b2f780>; the handler is closed | None",
  "compile_result": null,
  "run_result": {
    "status": "Error",
    "execution_time": null,
    "return_code": null,
    "stdout": null,
    "stderr": "exception on running command python /tmp/tmp7tdjylqd/tmphti1unbr.py: unable to perform operation on <WriteUnixTransport closed=True reading=False 0x7f9c77b2f780>; the handler is closed | None"
  },
  "executor_pod_name": null,
  "files": {}
}
{
  "status": "SandboxError",
  "message": "exception on running command python /tmp/tmprgcxs915/tmp2eovodmi.py: unable to perform operation on <WriteUnixTransport closed=True reading=False 0x7f9c77b2fc60>; the handler is closed | None",
  "compile_result": null,
  "run_result": {
    "status": "Error",
    "execution_time": null,
    "return_code": null,
    "stdout": null,
    "stderr": "exception on running command python /tmp/tmprgcxs915/tmp2eovodmi.py: unable to perform operation on <WriteUnixTransport closed=True reading=False 0x7f9c77b2fc60>; the handler is closed | None"
  },
  "executor_pod_name": null,
  "files": {}
}
{
  "status": "Failed",
  "message": "",
  "compile_result": null,
  "run_result": {
    "status": "Finished",
    "execution_time": 0.0029706954956054688,
    "return_code": 1,
    "stdout": "",
    "stderr": " File \"/tmp/tmpmty6jwec/tmp7tldj04_.py\", line 1\n print(\"Hello, world!\"\n ^\nSyntaxError: '(' was never closed\n"
  },
  "executor_pod_name": null,
  "files": {}
}
{
  "status": "Failed",
  "message": "",
  "compile_result": null,
  "run_result": {
    "status": "Finished",
    "execution_time": 0.009296894073486328,
    "return_code": 1,
    "stdout": "",
    "stderr": " File \"/tmp/tmpy8q3irlg/tmp5czm68_i.py\", line 1\n print(\"Hello, world!\"\n ^\nSyntaxError: '(' was never closed\n"
  },
  "executor_pod_name": null,
  "files": {}
}
All requests completed
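The same reproduction can be scripted in Python with only the standard library, which makes it easy to count outcomes instead of reading raw JSON. This is a sketch under assumptions. The endpoint and payload are copied from the curl loop above. It assumes the server answers with a JSON body and an HTTP 2xx status (`urlopen` raises on other codes). `post_json`, `run_parallel` and `summarize` are hypothetical helpers, and the `send` parameter exists only so the logic can be exercised without a running sandbox.

```python
import json
import urllib.request
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Local deployment from the reproduction steps (port 8080).
SANDBOX_URL = "http://localhost:8080/run_code"

# Same payload as the curl loop: an unclosed parenthesis plus stdin.
PAYLOAD = {"code": 'print("Hello, world!"', "language": "python", "stdin": "test"}


def post_json(url: str, payload: dict, timeout: float = 30) -> dict:
    """POST `payload` as JSON and decode the JSON response body."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())


def run_parallel(n: int = 10,
                 send: Callable[[str, dict], dict] = post_json,
                 url: str = SANDBOX_URL) -> List[dict]:
    """Send the same request n times concurrently; `send` is injectable."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        return list(pool.map(lambda _: send(url, PAYLOAD), range(n)))


def summarize(responses: List[dict]) -> Dict[str, int]:
    """Count responses by top-level status, e.g. SandboxError vs Failed."""
    return dict(Counter(r.get("status") for r in responses))


if __name__ == "__main__":
    print(summarize(run_parallel()))
```

In the run shown above against the published image, this would have summarized to 8 `SandboxError` and 2 `Failed`. Against a patched server, one would expect all ten responses to be `Failed`.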
Expected behavior
I expect that when multiple requests with stdin and faulty code are submitted to the sandbox concurrently, the sandbox returns the correct status and the reason the code failed.
We should see this:
{
  "status": "Failed",
  "message": "",
  "compile_result": null,
  "run_result": {
    "status": "Finished",
    "execution_time": 0.0029706954956054688,
    "return_code": 1,
    "stdout": "",
    "stderr": " File \"/tmp/tmpmty6jwec/tmp7tldj04_.py\", line 1\n print(\"Hello, world!\"\n ^\nSyntaxError: '(' was never closed\n"
  },
  "executor_pod_name": null,
  "files": {}
}
instead of
{
  "status": "SandboxError",
  "message": "exception on running command python /tmp/tmp29fi134e/tmpmw37aqtx.py: unable to perform operation on <WriteUnixTransport closed=True reading=False 0x7f9c77b2edc0>; the handler is closed | None",
  "compile_result": null,
  "run_result": {
    "status": "Error",
    "execution_time": null,
    "return_code": null,
    "stdout": null,
    "stderr": "exception on running command python /tmp/tmp29fi134e/tmpmw37aqtx.py: unable to perform operation on <WriteUnixTransport closed=True reading=False 0x7f9c77b2edc0>; the handler is closed | None"
  },
  "executor_pod_name": null,
  "files": {}
}
Ideally, SandboxFusion should also update the image it provides to users, since Verl points directly to it for local deployment.
Related issue filed with SandboxFusion: https://github.com/bytedance/SandboxFusion/issues/69
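Until an updated image is available, a client such as a Verl reward function may want to tell the two responses above apart, rather than scoring a sandbox-side failure as if the submitted code were wrong. Below is a minimal sketch that assumes only the response shape shown in this issue. `classify`, `run_with_retry` and the label strings are hypothetical, not part of Verl or SandboxFusion. Retrying merely works around the race from the client side, whereas the base.py patch above addresses it in the server.

```python
from typing import Any, Callable, Dict

Response = Dict[str, Any]


def classify(resp: Response) -> str:
    """Label a /run_code response (labels are hypothetical, not SandboxFusion's)."""
    status = resp.get("status")
    if status == "SandboxError":
        message = resp.get("message") or ""
        # Signature of the stdin race described in this issue.
        if "the handler is closed" in message:
            return "stdin_race"
        return "sandbox_error"
    if status == "Failed":
        return "code_error"  # the sandbox ran the code and it failed; see run_result
    return "other"


def run_with_retry(send: Callable[[], Response], max_attempts: int = 3) -> Response:
    """Re-send while the response looks like the stdin race, up to max_attempts."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for _ in range(max_attempts):
        resp = send()
        if classify(resp) != "stdin_race":
            break
    return resp
```

Because the race is intermittent (2 of the 10 requests above came back correctly), a small number of retries will often recover the real `Failed` response. A response that is still a race after `max_attempts` is returned as-is so the caller can decide how to score it.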
The image is outdated; this is a workaround:
- pull the image volcengine/sandbox-fusion:server-20250609
- start a Docker container from it and replace the source code inside the container
- save the container as a new image
Thanks! To clarify, do we replace the source code with the up-to-date GitHub main branch?
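The pull / replace source / save workaround above can be sketched as a dry run that prints the corresponding Docker commands instead of executing them. This relies on assumptions. The thread does not state where the server source lives inside the image, so `container_src` is a placeholder that you would need to look up yourself, for example by inspecting a running container. The container and image names are hypothetical. Which source to copy in (only the patched base.py, or the current main branch) is exactly the open question above.

```python
import shlex
from typing import List

BASE_IMAGE = "volcengine/sandbox-fusion:server-20250609"


def workaround_commands(src_dir: str, container_src: str,
                        container: str = "sandbox-patch",
                        new_image: str = "sandbox-fusion:patched",
                        base_image: str = BASE_IMAGE) -> List[List[str]]:
    """Docker commands for the pull / replace source / save workaround.

    container_src is a placeholder for where the server code lives inside the
    image (an assumption to verify); container and new_image are hypothetical.
    """
    return [
        ["docker", "pull", base_image],
        # Start a container from the published image (runs its default command).
        ["docker", "run", "-d", "--name", container, base_image],
        # "SRC/." copies the *contents* of src_dir into container_src.
        ["docker", "cp", f"{src_dir}/.", f"{container}:{container_src}"],
        # Save the modified container filesystem as a new image, then clean up.
        ["docker", "commit", container, new_image],
        ["docker", "rm", "-f", container],
    ]


if __name__ == "__main__":
    for cmd in workaround_commands("./SandboxFusion", "/path/to/source/in/image"):
        print(shlex.join(cmd))  # dry run: print the commands, do not execute them
```

The resulting image would then be started the same way as the original, for example `docker run -it -p 8080:8080 sandbox-fusion:patched`. Because `docker commit` snapshots the container's filesystem, the replaced files persist in the new image.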