Experimental: Unified external interface
This is so far in draft territory it is not even funny. It is nowhere near working or production ready (and nowhere near tested). I think feedback from the OpenHands founders as well as the community is warranted. Even if we don't fully implement it, I'd like to think it will start a conversation and be at least partly implemented.
A Possible Refactor
The Issue
Our API evolved over time, and as a result there are things that could work more cleanly. Much of it was designed around a single user running something on their own computer. SaaS constraints were not considered, nor were threading, cancelling or interrupting tasks, or paging data. This slows down development, overcomplicates solutions, and leads to a poor user experience.
The Broad Approach
Suppose our external API was based around Conversations. This is very close to our existing Session, but session is such an overloaded term that I think we need something more distinct (we have 2 types of session in our codebase so far, as well as the standard implementations). A conversation consists of:
- Input tasks: things we are asking or have asked OpenHands to do.
- Output events: responses from OpenHands.
- Storage (Workspace): the current set of files to which OpenHands has access related to the conversation.
Much like a human conversation, an OpenHands Conversation is roughly linear, but can have multiple participants: there can be 0 to N websockets connected to a conversation. Input tasks can occur in any order, and we need to accommodate the situation where a participant was not listening when an event occurred on the server.
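To make the "participant was not listening" case concrete, here is a minimal sketch of a conversation that keeps an ordered event log and replays it to late subscribers. The `Event` and `Conversation` classes and their `publish` / `subscribe` methods are hypothetical names for illustration, not the classes in the draft implementation.

```python
import asyncio
from dataclasses import dataclass, field
from uuid import UUID, uuid4


@dataclass
class Event:
    """One output event; `detail` is an arbitrary JSON-like payload."""
    detail: dict
    id: UUID = field(default_factory=uuid4)


@dataclass
class Conversation:
    """Hypothetical in-memory conversation: an event log plus live listeners."""
    id: UUID = field(default_factory=uuid4)
    events: list = field(default_factory=list)
    listeners: list = field(default_factory=list)

    def publish(self, detail: dict) -> Event:
        # Record the event first so late joiners can still retrieve it.
        event = Event(detail)
        self.events.append(event)
        for queue in self.listeners:
            queue.put_nowait(event)
        return event

    def subscribe(self, after: int = 0) -> asyncio.Queue:
        # Replay everything from index `after`, then receive live events.
        queue: asyncio.Queue = asyncio.Queue()
        for event in self.events[after:]:
            queue.put_nowait(event)
        self.listeners.append(queue)
        return queue
```

Each websocket would hold one such queue, so 0 to N participants can attach at any point and still see the conversation in order.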
The API
POST /conversation - begin a conversation
GET /conversation - list conversations (Admin Only)
GET /conversation-count - count conversations (Admin Only)
GET /conversation/{conversation_id} - get conversation info
DELETE /conversation/{conversation_id} - finish a conversation
GET /conversation/{conversation_id}/event - list conversation events
POST /conversation/{conversation_id}/event - trigger a conversation event
GET /conversation/{conversation_id}/event/{event_id} - get a conversation event
GET /conversation/{conversation_id}/task - list conversation tasks
GET /conversation/{conversation_id}/task/{task_id} - get a conversation task
POST /conversation/{conversation_id}/task - create a conversation task
DELETE /conversation/{conversation_id}/task/{task_id} - cancel a conversation task
POST /conversation/{conversation_id}/dir/{path} - create a new directory
POST /conversation/{conversation_id}/file/{path} - create a new file (touch)
POST /conversation/{conversation_id}/upload/{parent_path} - upload a set of files
DELETE /conversation/{conversation_id}/file/{path} - delete a file
GET /conversation/{conversation_id}/file-content/{path}
GET /conversation/{conversation_id}/file/{path}
GET /conversation/{conversation_id}/file-search
GET /conversation/{conversation_id}/file-count
GET /conversation/{conversation_id}/agent-info
WS /conversation/{conversation_id} - connect to an existing conversation via websocket
WS /conversation/ - create a new conversation and connect to it via websocket
WS /firehose/ - get all events in all conversations. (Admin Only)
This API is really a thin veneer over the Python API, so it should be roughly usable in the CLI / headless offerings.
SAASy Features
IDs
All IDs in this system are UUIDs. Some security is based on the fact that you can't get the UUID for a conversation unless you are an admin or the person who created it.
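A rough sketch of that idea, assuming random (version 4) UUIDs and a listing operation that only admins can use across all conversations. `ConversationRegistry` and its methods are made-up names for illustration.

```python
from typing import Dict, List
from uuid import UUID, uuid4


class ConversationRegistry:
    """Hypothetical in-memory registry keyed by random conversation UUIDs."""

    def __init__(self) -> None:
        self._owners: Dict[UUID, str] = {}

    def create(self, owner: str) -> UUID:
        # uuid4() is random, so IDs are impractical to guess or enumerate.
        conversation_id = uuid4()
        self._owners[conversation_id] = owner
        return conversation_id

    def list_ids(self, user: str, is_admin: bool = False) -> List[UUID]:
        # Only admins see every conversation; everyone else sees their own.
        return [cid for cid, owner in self._owners.items() if is_admin or owner == user]

    def exists(self, conversation_id: UUID) -> bool:
        # In this sketch, knowing the ID is what grants access.
        return conversation_id in self._owners
```

This is only part of the story: real authentication would still be needed on top, since an ID can leak through logs or shared links.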
Pagination
All search operations are paginated - they accept an optional page id and return a collection of results with a "next_page_id" when there are more results available. Depending on the context, they also offer other filter options.
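The paging contract could look roughly like the sketch below. The `Page` class, the `search` function, and the choice to encode the page id as a stringified offset are assumptions for illustration; the real page id could be any opaque token.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional, Sequence


@dataclass
class Page:
    results: List
    next_page_id: Optional[str] = None


def search(items: Sequence, page_id: Optional[str] = None, page_size: int = 2) -> Page:
    # Assumption: the page id is an opaque string that here simply encodes an offset.
    start = int(page_id) if page_id else 0
    end = start + page_size
    # next_page_id is only set when more results remain.
    next_page_id = str(end) if end < len(items) else None
    return Page(list(items[start:end]), next_page_id)


def iterate_all(items: Sequence) -> Iterator:
    # Client-side loop: keep following next_page_id until it is absent.
    page_id = None
    while True:
        page = search(items, page_id)
        yield from page.results
        page_id = page.next_page_id
        if page_id is None:
            break
```

A caller never needs to know the total size up front; it just follows `next_page_id` until it comes back empty.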
Better Use of OpenAPI and AsyncAPI
We use FastAPI to generate rich OpenAPI documentation for this, and with a few tweaks we can also use AsyncAPI to document the websockets.
Tasks are cancellable
An API for cancelling tasks is provided up front (with a fallback to asyncio's Task.cancel()). The intent is that each task should be a good citizen and share responsibly.
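One way to read "cancel API with a fallback" is: ask the runnable to stop cooperatively, and only call `Task.cancel()` if it does not finish within a grace period. The sketch below assumes a hypothetical `cancel()` method on the runnable and a hypothetical `cancel_task` helper; neither name comes from the draft code.

```python
import asyncio


class Ticker:
    """Hypothetical runnable that ticks until asked to stop."""

    def __init__(self, interval: float = 0.01) -> None:
        self.interval = interval
        self.ticks = 0
        self._stop = asyncio.Event()

    async def run(self) -> None:
        while not self._stop.is_set():
            self.ticks += 1
            await asyncio.sleep(self.interval)

    def cancel(self) -> None:
        # Cooperative cancellation: the loop exits at its next check.
        self._stop.set()


async def cancel_task(runnable, task: asyncio.Task, grace: float = 1.0) -> str:
    """Ask politely first; fall back to Task.cancel() if that fails."""
    cancel = getattr(runnable, "cancel", None)
    if cancel is not None:
        cancel()
        done, _ = await asyncio.wait({task}, timeout=grace)
        if done:
            return "cooperative"
    # Fallback: inject CancelledError at the task's current await point.
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return "forced"
```

The cooperative path lets a task clean up on its own terms, while the forced path guarantees that a DELETE on a task eventually takes effect.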
Asyncio All The Way
We fully embrace asyncio for tasks, at least at the external level. A lot of tasks should use httpx rather than requests. Those that do not or cannot should use await loop.run_in_executor(None, func, *args) to run blocking calls in the default thread pool executor.
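As a small illustration of the executor route: `run_in_executor` takes an executor (or `None` for the loop's default thread pool) followed by a plain callable and its arguments, not a coroutine. The `blocking_fetch` function here is a stand-in for any blocking call and is not part of the draft code.

```python
import asyncio
import time


def blocking_fetch(delay: float) -> str:
    # Stand-in for a blocking call such as requests.get(...).
    time.sleep(delay)
    return "done"


async def fetch_without_blocking(delay: float) -> str:
    loop = asyncio.get_running_loop()
    # None selects the loop's default ThreadPoolExecutor.
    return await loop.run_in_executor(None, blocking_fetch, delay)


async def main() -> list:
    # The three blocking calls overlap in worker threads
    # while the event loop stays free for other work.
    return await asyncio.gather(*(fetch_without_blocking(0.05) for _ in range(3)))


if __name__ == "__main__":
    print(asyncio.run(main()))
```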
Rich Reusability
This API is implementation agnostic. There is no specific mention of threads, remote runtimes, Docker, Kubernetes or others, but all are possible. (For example, the Docker implementation is likely to be the asyncio implementation running inside a Docker container.)
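In Python terms, "implementation agnostic" could mean that callers only ever see an abstract interface, with the hosting strategy hidden behind it. The sketch below is an assumption about shape only: `ConversationBroker` and `InProcessBroker` are hypothetical names, and a Docker or remote variant would implement the same two methods.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Set
from uuid import UUID, uuid4


class ConversationBroker(ABC):
    """Hypothetical interface; callers never see how conversations are hosted."""

    @abstractmethod
    async def create_conversation(self) -> UUID: ...

    @abstractmethod
    async def destroy_conversation(self, conversation_id: UUID) -> bool: ...


class InProcessBroker(ConversationBroker):
    """Simplest possible backend: everything lives in the current process."""

    def __init__(self) -> None:
        self._conversations: Set[UUID] = set()

    async def create_conversation(self) -> UUID:
        conversation_id = uuid4()
        self._conversations.add(conversation_id)
        return conversation_id

    async def destroy_conversation(self, conversation_id: UUID) -> bool:
        # Returns False if the conversation was unknown or already finished.
        if conversation_id in self._conversations:
            self._conversations.remove(conversation_id)
            return True
        return False
```

Because both methods are coroutines, a backend that has to start a container or call a remote service can do so without changing the caller.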
Progress so far...
Over the weekend I created this (really rough) implementation of this API with asyncio. I put all files in a new package, oh. (You'd have to run this with poetry run uvicorn openhands.server.listen:app --reload) I have yet to write a Docker conversation implementation, implement authentication, integrate with our agents, or implement any of the constraints in our API. I really just wanted to see how long it would take and whether the result would be clean, and I am actually pleased with the result so far.
Here is an example of the Debug "Ticker" running (it just gives a status update for a dummy task every second):
// Connecting creates a new conversation and streams its events.
const websocket = new WebSocket(`/conversation`)
websocket.onmessage = (message) => {
  const event = JSON.parse(message.data)
  const { detail } = event
  // Once the conversation is ready, create the Ticker task over REST.
  if (detail.type === "ConversationStatusUpdate" && detail.status === "READY") {
    fetch(`/conversation/${detail.conversation_id}/task`, {
      method: "POST",
      headers: {
        'Accept': 'application/json',
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        runnable: { type: "Ticker" },
        title: "Tickety Tock!"
      })
    })
  }
}
Or even:
const websocket = new WebSocket(`/conversation`)
websocket.onmessage = (message) => {
  const event = JSON.parse(message.data)
  const { detail } = event
  // Same task as above, but created over the websocket itself.
  if (detail.type === "ConversationStatusUpdate" && detail.status === "READY") {
    websocket.send(JSON.stringify({
      runnable: { type: "Ticker" },
      title: "Tickety Tock!"
    }))
  }
}