Frontend/Backend: Connect chat interface to agent
We are now close to having a prototype frontend design, so a natural next step is to connect the frontend to an agent.
We have an issue (#20) and PR (#35) for this, and also a prototype API design (#44) that would allow the frontend and the agent to communicate.
These are not yet merged into main. Assuming these or something similar will be merged, the next step would be: when we press the "send" button on the frontend chat interface, the frontend uses the websocket API to send a message to the agent, and the agent's response is displayed in the frontend.
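To make the "send" round trip concrete, here is a minimal sketch of how a chat message could be serialized for the websocket. The envelope shape (the `action` and `content` fields) and the helper names `encode_message` / `decode_message` are assumptions for illustration only; the real format is whatever the API design in #44 ends up specifying.

```python
import json


def encode_message(action: str, content: str) -> str:
    """Serialize a message as a JSON text frame.

    The {"action", "content"} envelope is hypothetical, not the #44 design.
    """
    return json.dumps({"action": action, "content": content})


def decode_message(frame: str) -> dict:
    """Parse a JSON text frame and reject frames without an 'action' field."""
    message = json.loads(frame)
    if not isinstance(message, dict) or "action" not in message:
        raise ValueError("malformed message: expected an object with 'action'")
    return message
```

The frontend would send the encoded string when "send" is pressed, and the agent side would decode it before acting on it. Keeping the envelope this small makes it easy to swap for the real schema later.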
I've been thinking more about this. I think we're going to need two websocket layers, at least the way the agent in #35 is currently set up.
The agent runs in docker for a layer of safety (wouldn't want it running rm -rf / on your host system 😅). So we'll need a way for the agent to communicate with the host system e.g. via websocket.
We could connect the frontend directly to the docker container, and this might work well for an MVP. But that would mean our API server is bundled into the docker container, and tightly coupled to the Agent. We wouldn't be able to e.g. spin up multiple agents at once. And end-users would have to start the app with docker run, rather than just starting a server and letting the server handle the docker commands.
So I'm thinking the basic architecture is:
- Frontend connects to API server via websocket
- API server spins up a docker container running the agent
- API server establishes a websocket connection with the agent
- API server forwards all frontend messages straight into the docker container
But maybe for an MVP we can just do the simple Frontend -> Docker websocket connection and call it a day.
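For reference, the forwarding step in the two-layer setup above could look roughly like this. It is only a sketch: `InMemoryConnection` is a hypothetical stand-in for a real WebSocket (anything with async `send`/`recv`), no docker container is started, and the names `pump`, `relay`, and `demo` are made up for illustration. The point is just that the API server can stay a thin relay between the two connections.

```python
import asyncio


class InMemoryConnection:
    """Hypothetical stand-in for one end of a WebSocket connection."""

    def __init__(self) -> None:
        self.inbox: asyncio.Queue = asyncio.Queue()
        self.peer: "InMemoryConnection | None" = None

    async def send(self, message: str) -> None:
        # Deliver the message to the other end of the connection.
        await self.peer.inbox.put(message)

    async def recv(self) -> str:
        # Wait for the next message sent by the other end.
        return await self.inbox.get()


def connected_pair() -> tuple:
    """Create two connection ends that talk to each other."""
    a, b = InMemoryConnection(), InMemoryConnection()
    a.peer, b.peer = b, a
    return a, b


async def pump(source, sink) -> None:
    """Forward every message from source to sink until cancelled."""
    while True:
        await sink.send(await source.recv())


async def relay(frontend_side, agent_side) -> None:
    """API server role: forward messages in both directions."""
    await asyncio.gather(
        pump(frontend_side, agent_side),
        pump(agent_side, frontend_side),
    )


async def demo() -> str:
    frontend, server_front = connected_pair()  # Frontend <--> API server
    server_agent, agent = connected_pair()     # API server <--> Agent
    relay_task = asyncio.create_task(relay(server_front, server_agent))

    await frontend.send("hello")               # user presses "send"
    request = await agent.recv()               # agent sees the message
    await agent.send(f"echo: {request}")       # agent responds
    reply = await frontend.recv()              # frontend displays the reply

    relay_task.cancel()
    try:
        await relay_task
    except asyncio.CancelledError:
        pass
    return reply


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Because `relay` only depends on `send` and `recv`, the same loop would cover the MVP case too: there the frontend would talk to the container directly and the relay would simply be dropped.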
@rbren That makes sense to me. If I'm understanding you correctly, the flow is:
- Frontend --> API server: Frontend establishes a WebSocket connection.
- API server --> Agent: API server spins up an agent to handle the frontend's requests.
- Frontend <--> API server <--> Agent: Frontend and API server establish a WebSocket connection. API server and Agent establish a WebSocket connection. Requests from the frontend are forwarded by the API server to the agent.
In my opinion, it's worth writing both layers from the start. Curious to hear what others think.
Yeah, this totally makes sense, I agree!
@yimothysu exactly.
I put together a sample handshake: https://github.com/OpenDevin/OpenDevin/pull/57
Here's my latest take on the handshake: https://github.com/OpenDevin/OpenDevin/pull/97