Experimental: Unified external interface

Open tofarr opened this issue 1 year ago • 0 comments

This is so far into draft territory it is not even funny. It is nowhere near working or production ready (and nowhere near tested). I think feedback from the OpenHands founders as well as the community is warranted. Even if we don't fully implement it, I hope it starts a conversation and that at least part of it gets implemented.

A Possible Refactor

The Issue

Our API evolved over time, and as a result there are things that could work more cleanly. Much of it was designed around a single user running something on their own computer. SaaS constraints were not considered as part of this, nor were threading, cancelling / interrupting tasks, or paging data. This slows down development, overcomplicates solutions, and leads to a poor user experience.

The Broad Approach

Suppose our external API was based around Conversations. This is very close to our existing Session, but "session" is such an overloaded term that I think we need something more distinct (we already have two types of session in our codebase, as well as the standard implementations). A conversation consists of:

Input tasks - things we are asking or have asked OpenHands to do.
Output events - responses from OpenHands
Storage (Workspace) - the current set of files related to the conversation that OpenHands has access to.

Much like a human conversation, an OpenHands Conversation is roughly linear, but can have multiple participants: there can be 0 to N websockets connected to a conversation. Input tasks can occur in any order, and we need to accommodate participants who were not listening at the time an event occurred on the server.

The API

    POST   /conversation  - begin a conversation
    GET    /conversation  - list conversations (Admin Only)
    GET    /conversation-count  - count conversations (Admin Only)
    GET    /conversation/{conversation_id}  - get conversation info
    DELETE /conversation/{conversation_id}  - finish a conversation
    GET    /conversation/{conversation_id}/event  - list conversation events
    POST   /conversation/{conversation_id}/event  - trigger a conversation event
    GET    /conversation/{conversation_id}/event/{event_id}  - get a conversation event
    GET    /conversation/{conversation_id}/task  - list conversation tasks
    GET    /conversation/{conversation_id}/task/{task_id}  - get a conversation task
    POST   /conversation/{conversation_id}/task  - create a conversation task
    DELETE /conversation/{conversation_id}/task/{task_id}  - cancel a conversation task
    POST   /conversation/{conversation_id}/dir/{path}  - create a new directory
    POST   /conversation/{conversation_id}/file/{path}  - create a new file (touch)
    POST   /conversation/{conversation_id}/upload/{parent_path}  - upload a set of files
    DELETE /conversation/{conversation_id}/file/{path}  - delete
    GET    /conversation/{conversation_id}/file-content/{path}
    GET    /conversation/{conversation_id}/file/{path}
    GET    /conversation/{conversation_id}/file-search
    GET    /conversation/{conversation_id}/file-count
    GET    /conversation/{conversation_id}/agent-info
    WS     /conversation/{conversation_id}  - connect to an existing conversation via websocket
    WS     /conversation/  - create a new conversation and connect to it via websocket
    WS     /firehose/  - get all events in all conversations. (Admin Only)
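To make the REST side concrete, here is a hypothetical thin JavaScript client for a few of the endpoints above. The createClient helper, its method names, and the assumption that every non-204 response body is JSON are illustrative only; the task payload a caller would pass mirrors the Ticker example (a runnable plus a title).

```javascript
// Hypothetical thin client for a subset of the proposed REST endpoints.
// fetchImpl defaults to the global fetch but can be injected (e.g. in tests).
function createClient(baseUrl, fetchImpl = globalThis.fetch) {
  // Send a JSON request and fail loudly on non-2xx responses.
  async function call(method, path, body) {
    const response = await fetchImpl(`${baseUrl}${path}`, {
      method,
      headers: { "Accept": "application/json", "Content-Type": "application/json" },
      body: body === undefined ? undefined : JSON.stringify(body),
    })
    if (!response.ok) throw new Error(`${method} ${path} failed: ${response.status}`)
    return response.status === 204 ? null : response.json() // assumed JSON bodies
  }

  const conversationPath = (id) => `/conversation/${encodeURIComponent(id)}`

  return {
    startConversation: () => call("POST", "/conversation"),
    getConversation: (id) => call("GET", conversationPath(id)),
    finishConversation: (id) => call("DELETE", conversationPath(id)),
    createTask: (id, task) => call("POST", `${conversationPath(id)}/task`, task),
    cancelTask: (id, taskId) =>
      call("DELETE", `${conversationPath(id)}/task/${encodeURIComponent(taskId)}`),
  }
}
```

Passing fetchImpl explicitly keeps the client usable in the browser (where the global fetch is the default) and testable without a running server.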

This API is really a thin veneer over the Python API, so it should also be roughly usable in the CLI / headless offerings.

SAASy Features

IDs

All IDs in this system are UUIDs. Some of the security relies on the fact that you can't get the UUID for a conversation unless you are an admin or the person who created it.

Pagination

All search operations are paginated - they accept an optional page id and return a collection of results with a "next_page_id" when there are more results available. Depending on the context, they also offer other filter options.
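Because every search returns a next_page_id while more results remain, a client can walk all pages with a small async generator. This is a sketch: fetchPage is a hypothetical callback wrapping a paginated endpoint such as GET /conversation/{conversation_id}/event, and the results field name is an assumption, since the proposal only names next_page_id.

```javascript
// Sketch: iterate over every result of a paginated search.
// fetchPage(pageId) is a hypothetical callback returning
// Promise<{ results: [...], next_page_id?: string | null }>.
async function* iterateAll(fetchPage) {
  let pageId = null // no page id: start from the first page
  do {
    const page = await fetchPage(pageId)
    yield* page.results
    pageId = page.next_page_id ?? null // absent or null when no more results
  } while (pageId !== null)
}
```

Wrapping pagination in a generator lets callers write for await (const event of iterateAll(fetchPage)) and break out early, in which case no further pages are requested.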

Better Use of OpenAPI and AsyncAPI

We use FastAPI to generate rich OpenAPI documentation for this, and with a few tweaks we can also use AsyncAPI to document the websockets.

Tasks are cancellable

A cancellation API is provided up front for tasks (with a fallback to asyncio's Task.cancel()). The intent is that each task should be a good citizen: respond to cancellation promptly and share resources responsibly.
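The prototype's tasks are Python coroutines cancelled through asyncio, but the good-citizen idea can be sketched in JavaScript with AbortController (a swapped-in analog, not the proposal's API). The hypothetical ticker task below, loosely modelled on the debug Ticker, checks for cancellation between status updates and reports that it stopped instead of being killed mid-step; sleep and ticker are illustrative names.

```javascript
// Sleep that rejects as soon as the signal is aborted.
function sleep(ms, signal) {
  return new Promise((resolve, reject) => {
    if (signal.aborted) return reject(signal.reason)
    const onAbort = () => {
      clearTimeout(timer)
      reject(signal.reason)
    }
    const timer = setTimeout(() => {
      signal.removeEventListener("abort", onAbort)
      resolve()
    }, ms)
    signal.addEventListener("abort", onAbort, { once: true })
  })
}

// Hypothetical dummy task: emits a status update every intervalMs until
// cancelled, then reports "cancelled" and resolves with the tick count.
async function ticker(onStatus, signal, intervalMs = 1000) {
  let ticks = 0
  try {
    while (true) {
      await sleep(intervalMs, signal)
      ticks += 1
      onStatus(`tick ${ticks}`)
    }
  } catch (error) {
    if (!signal.aborted) throw error // a real failure, not a cancellation
    onStatus("cancelled")            // cooperative shutdown: report and return
    return ticks
  }
}
```

A caller starts it with ticker(console.log, controller.signal) and later calls controller.abort(); the returned promise then resolves with the number of completed ticks rather than rejecting.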

Asyncio All The Way

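We fully embrace asyncio for tasks, at least at the external level. Many tasks should use httpx rather than requests. Those that do not / cannot should run their blocking code in the default thread pool executor with await loop.run_in_executor(None, blocking_function, *args); note that run_in_executor takes a plain callable, not a coroutine. On Python 3.9+, await asyncio.to_thread(blocking_function, *args) is a convenient equivalent.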

Rich Reusability

This API is implementation agnostic. There is no specific mention of threads, remote runtimes, Docker, Kubernetes or others, but all are possible. (For example, the Docker implementation is likely to be the asyncio implementation running inside a Docker container.)

Progress so far...

Over the weekend I created this (really rough) asyncio implementation of the API. I put all the files in a new package named oh. (You'd have to run it with poetry run uvicorn openhands.server.listen:app --reload.) I have yet to write a Docker conversation implementation, implement authentication, integrate with our agents, or implement any of the constraints in our API. I really just wanted to see how long it would take and whether the result would be clean, and I am actually pleased with the result so far.

Here is an example of the debug "Ticker" task running (it just gives a status update for a dummy task every second):

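// Build an absolute ws:// or wss:// URL from the current page's origin
const wsUrl = new URL("/conversation", window.location.href)
wsUrl.protocol = wsUrl.protocol === "https:" ? "wss:" : "ws:"
const websocket = new WebSocket(wsUrl.href)
websocket.onmessage = (message) => {
    const event = JSON.parse(message.data)
    const { detail } = event
    // Once the new conversation is ready, start a Ticker task over REST
    if (detail?.type === "ConversationStatusUpdate" && detail.status === "READY") {
        fetch(`/conversation/${detail.conversation_id}/task`, {
            method: "POST",
            headers: {
                "Accept": "application/json",
                "Content-Type": "application/json"
            },
            body: JSON.stringify({
                runnable: { type: "Ticker" },
                title: "Tickety Tock!"
            })
        })
            .then((response) => {
                if (!response.ok) throw new Error(`Task creation failed: ${response.status}`)
                return response.json()
            })
            .then((task) => console.log("Created task", task))
            .catch((error) => console.error(error))
    }
}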

Or even:

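// Build an absolute ws:// or wss:// URL from the current page's origin
const wsUrl = new URL("/conversation", window.location.href)
wsUrl.protocol = wsUrl.protocol === "https:" ? "wss:" : "ws:"
const websocket = new WebSocket(wsUrl.href)
websocket.onmessage = (message) => {
    const event = JSON.parse(message.data)
    const { detail } = event
    // Once the new conversation is ready, create the task over the websocket itself
    if (detail?.type === "ConversationStatusUpdate" && detail.status === "READY") {
        websocket.send(JSON.stringify({
            runnable: { type: "Ticker" },
            title: "Tickety Tock!"
        }))
    }
}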

Oct 01 '24 18:10 tofarr