Redesign docker sandbox
What problem or use case are you trying to solve?
We're using exec_run to run commands in the sandbox. This isn't stateful, and doesn't handle CLI interactions via stdin very well.
Things we struggle with today:
- We don't keep track of cd commands
- The agent can't interact with stdin (e.g. it runs apt-get install without -y, it wants to type y to get through)
  - This matters even more if we e.g. ask the agent to develop an interactive CLI that it needs to test
- Can't use apt-get install in sandbox (due to permissions)
- kill doesn't work
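To make the cd problem concrete, here is a minimal sketch. It uses local `sh` processes as a stand-in for the container (an assumption for illustration; the real sandbox goes through exec_run), and all helper names are hypothetical. Each stateless call starts over in the original directory, while a single shell that receives all the commands keeps the directory change.

```python
import os
import subprocess
import tempfile


def run_stateless(commands):
    """Run each command in a fresh shell, the way separate exec calls behave."""
    outputs = []
    for cmd in commands:
        result = subprocess.run(["sh", "-c", cmd], capture_output=True, text=True)
        outputs.append(result.stdout.strip())
    return outputs


def run_in_one_shell(commands):
    """Feed every command to one shell process so state carries over."""
    script = "\n".join(commands) + "\n"
    result = subprocess.run(["sh"], input=script, capture_output=True, text=True)
    return result.stdout.strip().splitlines()


def demo():
    """Return (target dir, stateless outputs, single-shell outputs)."""
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.realpath(tmp)
        commands = [f"cd '{target}'", "pwd -P"]
        return target, run_stateless(commands), run_in_one_shell(commands)


if __name__ == "__main__":
    target, stateless, stateful = demo()
    print("target:   ", target)
    print("stateless:", stateless[1])   # still the original working directory
    print("one shell:", stateful[-1])   # the directory we cd'ed into
```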
Describe the UX of the solution you'd like
Something closer to @xingyaoww's original implementation: https://github.com/xingyaoww/OpenDevin/blob/8815aa95ba770110e9d6a4839fb7f9cef01ef4d7/opendevin/sandbox/docker.py
Do you have thoughts on the technical implementation?
Can we start the container, then connect an ssh or pty session?
Describe alternatives you've considered
- Hacking around exec 👎
Here's a suggestion from Slack: https://github.com/princeton-nlp/intercode
Maybe not quite the API we need, but we can take some inspiration from them at least
How do you feel about doing docker attach and then using my old implementation to read the pty of that session? This could be the "main session" that keeps track of cd commands.
@xingyaoww that should probably work well
I would suggest enabling sshd inside the docker sandbox and connecting to the docker environment via ssh. Then we need to capture the TTY of that ssh session inside Python.
To do that, there are some Python libraries that enable this kind of interactive session, such as Pexpect (https://pexpect.readthedocs.io/en/stable/) and, more specifically, its pxssh module (https://pexpect.readthedocs.io/en/stable/api/pxssh.html). Another alternative could be ptyprocess (https://github.com/pexpect/ptyprocess).
At the very least, we can deal with interactive CLIs this way.
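A rough sketch of the PTY idea, using only the standard library `pty` module instead of Pexpect or ptyprocess so that it runs without extra dependencies. The `read_until` and `answer_prompt` helpers and the fake prompt command are hypothetical; in the sandbox, the child process would presumably be the ssh session rather than a local `sh`.

```python
import os
import pty
import select
import subprocess


def read_until(fd, marker=None, timeout=5.0):
    """Read from the PTY master until `marker` appears, or until EOF if marker is None."""
    buf = b""
    while marker is None or marker not in buf:
        ready, _, _ = select.select([fd], [], [], timeout)
        if not ready:
            raise TimeoutError(f"timed out waiting for {marker!r}; got {buf!r}")
        try:
            chunk = os.read(fd, 1024)
        except OSError:
            chunk = b""  # Linux reports EOF on a PTY master as an I/O error
        if not chunk:
            if marker is None:
                break
            raise EOFError(f"process exited before printing {marker!r}")
        buf += chunk
    return buf


def answer_prompt(command, prompt, reply, timeout=5.0):
    """Run `command` on a PTY, wait for `prompt`, type `reply`, return the transcript."""
    master, slave = pty.openpty()
    proc = subprocess.Popen(command, stdin=slave, stdout=slave, stderr=slave)
    os.close(slave)  # the child keeps its own copy of the slave end
    try:
        transcript = read_until(master, prompt, timeout)
        os.write(master, reply + b"\n")  # same as typing the reply and pressing Enter
        transcript += read_until(master, None, timeout)
    finally:
        os.close(master)
        if proc.poll() is None:
            proc.kill()
        proc.wait()
    return transcript.decode(errors="replace")


if __name__ == "__main__":
    # Stand-in for a tool that asks for confirmation, like apt-get install without -y.
    fake_tool = ["sh", "-c", "printf 'Continue? [Y/n] '; read ans; echo \"answer=$ans\""]
    print(answer_prompt(fake_tool, b"[Y/n] ", b"y"))
```

Because the child sees a terminal, the typed reply is echoed back into the transcript, just as it would be in an interactive session.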
cc @neubig for input as well
@frankxu2004 thanks! Conceptually this sounds nice to me.
@frankxu2004 I like this one! This allows us to unify the entire "persistence" session and "background" sessions easily without managing all the docker sockets. This could potentially make the sandbox interface more generalizable (e.g., we can easily use other machines as sandboxes as long as we can ssh into them). Feel free to PR if interested!
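One possible shape for such an interface, as a sketch only. `Sandbox` and `LocalShellSandbox` are hypothetical names, and the local `sh` backend is just a stand-in so the example runs anywhere; an ssh-backed class could implement the same two methods.

```python
import subprocess
import uuid
from abc import ABC, abstractmethod


class Sandbox(ABC):
    """Hypothetical interface: run a command in a persistent session, return its output."""

    @abstractmethod
    def execute(self, cmd: str) -> str: ...

    @abstractmethod
    def close(self) -> None: ...


class LocalShellSandbox(Sandbox):
    """Stand-in backend that keeps one long-lived local `sh` process."""

    def __init__(self) -> None:
        self._proc = subprocess.Popen(
            ["sh"],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
        )

    def execute(self, cmd: str) -> str:
        # A unique sentinel marks where this command's output ends.
        sentinel = f"__done_{uuid.uuid4().hex}__"
        self._proc.stdin.write(f"{cmd}\necho {sentinel}\n")
        self._proc.stdin.flush()
        output = []
        while True:
            line = self._proc.stdout.readline()
            if not line:  # the shell exited before printing the sentinel
                break
            if sentinel in line:
                # Keep any text that preceded the sentinel on the same line.
                output.append(line[: line.index(sentinel)])
                break
            output.append(line)
        return "".join(output)

    def close(self) -> None:
        self._proc.stdin.close()
        self._proc.stdout.close()
        self._proc.wait()


if __name__ == "__main__":
    sandbox = LocalShellSandbox()
    sandbox.execute("cd /")
    print(sandbox.execute("pwd"))  # the earlier cd is still in effect
    sandbox.close()
```

This pipe-based version does not give the child a terminal, so it covers the persistent-state part of the idea but not interactive prompts.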