OpenHands Frontend/Backend: Connect chat interface to agent

We are now close to having a prototype frontend design, so a natural next step is to connect the frontend to an agent.

We have an issue (#20) and a PR (#35) for this, as well as a prototype API design (#44) that would allow the frontend and the agent to communicate.

These are not yet merged into main, but assuming these or something similar get merged, the next step would be this: when we press the "send" button in the frontend chat interface, the frontend uses the websocket API to send a message to the agent, and the agent's response is displayed in the frontend.
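The round trip above needs some agreed-upon message shape on the websocket. Here is a minimal sketch, assuming JSON text frames and a hypothetical envelope with `action` and `message` fields. These names are illustrative only and are not taken from #44.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class ChatMessage:
    """Hypothetical envelope for one websocket text frame.

    The field names are illustrative assumptions, not the schema in #44.
    """
    action: str   # e.g. "user_message" from the frontend, "agent_message" from the agent
    message: str  # the chat text itself

    def to_json(self) -> str:
        # Serialize to the string that would be sent as a websocket text frame.
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, raw: str) -> "ChatMessage":
        # Parse an incoming text frame back into a ChatMessage.
        data = json.loads(raw)
        return cls(action=data["action"], message=data["message"])


# What the frontend might send when "send" is pressed:
outgoing = ChatMessage(action="user_message", message="List the files in the repo")
frame = outgoing.to_json()
print(frame)  # {"action": "user_message", "message": "List the files in the repo"}

# What the frontend might receive and render in the chat window:
reply = ChatMessage.from_json('{"action": "agent_message", "message": "Done."}')
print(reply.message)  # Done.
```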

Mar 19 '24 12:03 neubig

I've been thinking more about this. I think we're going to need two websocket layers, at least the way the agent in #35 is currently set up.

The agent runs in docker for a layer of safety (wouldn't want it running rm -rf / on your host system 😅). So we'll need a way for the agent to communicate with the host system, e.g. via a websocket.

We could connect the frontend directly to the docker container, and this might work well for an MVP. But that would mean bundling our API server into the docker container, tightly coupled to the agent. We wouldn't be able to, e.g., spin up multiple agents at once. And end users would have to start the app with docker run, rather than just starting a server and letting the server handle the docker commands.
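To make "letting the server handle the docker commands" concrete, here is a minimal sketch of how an API server might assemble one docker run invocation per agent so several agents can run side by side. The image name `opendevin-agent:latest` and the port scheme are hypothetical. The flags (`--rm`, `-d`, `--name`, `-p`) are standard docker run options, and `shlex.join` / `subprocess.run` are Python standard library.

```python
import shlex
import subprocess
from typing import List

AGENT_IMAGE = "opendevin-agent:latest"  # hypothetical image name
AGENT_PORT = 8000                       # hypothetical port the agent's websocket listens on inside the container


def agent_run_command(session_id: str, host_port: int) -> List[str]:
    """Build a docker run command for one sandboxed agent.

    Each session gets its own container name and host port, which is what
    lets a single API server run several agents at once.
    """
    return [
        "docker", "run",
        "--rm",                             # remove the container when it exits
        "-d",                               # run detached so the server keeps running
        "--name", f"agent-{session_id}",    # unique container name per session
        "-p", f"{host_port}:{AGENT_PORT}",  # expose the agent's websocket port on the host
        AGENT_IMAGE,
    ]


def start_agent(session_id: str, host_port: int) -> None:
    # Actually launches the container; requires Docker on the host (not called below).
    subprocess.run(agent_run_command(session_id, host_port), check=True)


for sid, port in [("a1", 9001), ("b2", 9002)]:
    print(shlex.join(agent_run_command(sid, port)))
# docker run --rm -d --name agent-a1 -p 9001:8000 opendevin-agent:latest
# docker run --rm -d --name agent-b2 -p 9002:8000 opendevin-agent:latest
```

The end user only starts the API server; the server decides when to call something like `start_agent`.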

So I'm thinking the basic architecture is:

Frontend connects to API server via websocket
API server spins up a docker container running the agent
API server establishes a websocket connection with the agent
API server forwards all frontend messages straight into the docker container
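The forwarding step in this architecture can be sketched without real sockets. This is a minimal sketch, assuming asyncio queues stand in for the two websocket connections (frontend to API server, API server to agent) and a toy echo agent stands in for the real one. In practice each queue pair would be a websocket, and the `None` shutdown sentinel is an assumption of this sketch.

```python
import asyncio


async def relay(src: asyncio.Queue, dst: asyncio.Queue) -> None:
    """Forward every message from src to dst unchanged, stopping after a None sentinel."""
    while True:
        msg = await src.get()
        await dst.put(msg)
        if msg is None:  # sentinel: the sending side closed its connection
            return


async def toy_agent(inbox: asyncio.Queue, outbox: asyncio.Queue) -> None:
    """Hypothetical stand-in for the dockerized agent: replies to each message."""
    while True:
        msg = await inbox.get()
        if msg is None:
            await outbox.put(None)  # propagate shutdown back toward the frontend
            return
        await outbox.put(f"agent received: {msg}")


async def main() -> list:
    # Frontend <-> API server link (one queue per direction).
    from_frontend, to_frontend = asyncio.Queue(), asyncio.Queue()
    # API server <-> agent link (one queue per direction).
    to_agent, from_agent = asyncio.Queue(), asyncio.Queue()

    tasks = [
        asyncio.create_task(relay(from_frontend, to_agent)),  # frontend -> agent
        asyncio.create_task(relay(from_agent, to_frontend)),  # agent -> frontend
        asyncio.create_task(toy_agent(to_agent, from_agent)),
    ]

    # The frontend sends two chat messages, then disconnects.
    for text in ["hello", "run the tests"]:
        await from_frontend.put(text)
    await from_frontend.put(None)

    # The frontend collects replies until the shutdown sentinel comes back.
    replies = []
    while (reply := await to_frontend.get()) is not None:
        replies.append(reply)

    await asyncio.gather(*tasks)
    return replies


print(asyncio.run(main()))  # ['agent received: hello', 'agent received: run the tests']
```

Because the relay never inspects message contents, the API server stays decoupled from the agent's own protocol.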

But maybe for an MVP we can just do the simple Frontend -> Docker websocket connection and call it a day.

Mar 19 '24 13:03 rbren

@rbren That makes sense to me. If I'm understanding you correctly, the flow is:

Frontend --> API server: Frontend establishes a WebSocket connection.
API server --> Agent: API server spins up an agent to handle the frontend's requests.
Frontend <--> API server <--> Agent: Frontend and API server establish a WebSocket connection. API server and Agent establish a WebSocket connection. Requests from the frontend are forwarded by the API server to the agent.
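The step where the API server spins up an agent to handle the frontend's requests implies that the server tracks which agent belongs to which frontend connection. Below is a minimal sketch of that bookkeeping. The `launch` callback, `AgentHandle`, and the `ws://` URL pattern are hypothetical stand-ins for starting a container and opening the server-to-agent websocket.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class AgentHandle:
    """Hypothetical record of one running agent (its container name and websocket URL)."""
    container_name: str
    ws_url: str


@dataclass
class SessionRegistry:
    """Maps each frontend session to its own agent, so many agents can run at once."""
    launch: Callable[[str], AgentHandle]  # hypothetical: starts an agent for a session id
    sessions: Dict[str, AgentHandle] = field(default_factory=dict)

    def agent_for(self, session_id: str) -> AgentHandle:
        # Reuse the session's agent if it exists; otherwise spin one up.
        if session_id not in self.sessions:
            self.sessions[session_id] = self.launch(session_id)
        return self.sessions[session_id]

    def close(self, session_id: str) -> None:
        # Forget the agent when the frontend disconnects (container cleanup not shown).
        self.sessions.pop(session_id, None)


def fake_launch(session_id: str) -> AgentHandle:
    # Stand-in for docker run plus a websocket connect; no Docker needed.
    return AgentHandle(container_name=f"agent-{session_id}",
                       ws_url=f"ws://localhost/agents/{session_id}")


registry = SessionRegistry(launch=fake_launch)
a = registry.agent_for("tab-1")
b = registry.agent_for("tab-2")
print(a.container_name, b.container_name)  # agent-tab-1 agent-tab-2
print(registry.agent_for("tab-1") is a)    # True
```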

In my opinion, it's worth writing both layers from the start. Curious to hear what others think.

Mar 19 '24 14:03 yimothysu

Yeah, this totally makes sense, I agree!

Mar 19 '24 15:03 neubig

@yimothysu exactly.

I put together a sample handshake: https://github.com/OpenDevin/OpenDevin/pull/57

Mar 19 '24 21:03 rbren

Here's my latest take on the handshake: https://github.com/OpenDevin/OpenDevin/pull/97

Mar 23 '24 21:03 rbren