feat: websocket connection management and sandbox bound to session.

Open iFurySt opened this issue 1 year ago • 11 comments

this PR includes:

  1. the FE reconnects the WS after the page is closed or refreshed.
  2. a new /auth endpoint that returns a JWT token so the server can identify the client; for now it mainly carries the session.
  3. the server no longer restarts the sandbox every time a session is initialized; it reuses the previous container based on the session id.

iFurySt avatar Apr 02 '24 09:04 iFurySt

I had a bit of trouble using this. I refreshed the page mid-task, and it caused the server to crash with:

AgentFinishAction(action='finish')
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/home/rbren/.local/share/virtualenvs/opendevin-0fQgowZe/lib/python3.10/site-packages/uvicorn/protocols/websockets/websockets_impl.py", line 240, in run_asgi
    result = await self.app(self.scope, self.asgi_receive, self.asgi_send)
  File "/home/rbren/.local/share/virtualenvs/opendevin-0fQgowZe/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 69, in __call__
    return await self.app(scope, receive, send)
  File "/home/rbren/.local/share/virtualenvs/opendevin-0fQgowZe/lib/python3.10/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/home/rbren/.local/share/virtualenvs/opendevin-0fQgowZe/lib/python3.10/site-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/home/rbren/.local/share/virtualenvs/opendevin-0fQgowZe/lib/python3.10/site-packages/starlette/middleware/errors.py", line 151, in __call__
    await self.app(scope, receive, send)
  File "/home/rbren/.local/share/virtualenvs/opendevin-0fQgowZe/lib/python3.10/site-packages/starlette/middleware/cors.py", line 75, in __call__
    await self.app(scope, receive, send)
  File "/home/rbren/.local/share/virtualenvs/opendevin-0fQgowZe/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/home/rbren/.local/share/virtualenvs/opendevin-0fQgowZe/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/home/rbren/.local/share/virtualenvs/opendevin-0fQgowZe/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/home/rbren/.local/share/virtualenvs/opendevin-0fQgowZe/lib/python3.10/site-packages/starlette/routing.py", line 758, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/home/rbren/.local/share/virtualenvs/opendevin-0fQgowZe/lib/python3.10/site-packages/starlette/routing.py", line 778, in app
    await route.handle(scope, receive, send)
  File "/home/rbren/.local/share/virtualenvs/opendevin-0fQgowZe/lib/python3.10/site-packages/starlette/routing.py", line 375, in handle
    await self.app(scope, receive, send)
  File "/home/rbren/.local/share/virtualenvs/opendevin-0fQgowZe/lib/python3.10/site-packages/starlette/routing.py", line 98, in app
    await wrap_app_handling_exceptions(app, session)(scope, receive, send)
  File "/home/rbren/.local/share/virtualenvs/opendevin-0fQgowZe/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/home/rbren/.local/share/virtualenvs/opendevin-0fQgowZe/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/home/rbren/.local/share/virtualenvs/opendevin-0fQgowZe/lib/python3.10/site-packages/starlette/routing.py", line 96, in app
    await func(session)
  File "/home/rbren/.local/share/virtualenvs/opendevin-0fQgowZe/lib/python3.10/site-packages/fastapi/routing.py", line 348, in app
    await dependant.call(**values)
  File "/home/rbren/git/opendevin/opendevin/server/listen.py", line 35, in websocket_endpoint
    await session.start_listening()
  File "/home/rbren/git/opendevin/opendevin/server/session.py", line 107, in start_listening
    data = await self.websocket.receive_json()
  File "/home/rbren/.local/share/virtualenvs/opendevin-0fQgowZe/lib/python3.10/site-packages/starlette/websockets.py", line 135, in receive_json
    raise RuntimeError(
RuntimeError: WebSocket is not connected. Need to call "accept" first.

rbren avatar Apr 02 '24 11:04 rbren

Here's the behavior I'd hope for:

  • If at any point, I refresh the page, all the state pops back into place. E.g. my message history is all there, the command-line state, etc. The current task is still running and outputting messages.
    • do we just have the server send the entire history as messages?
    • if we do, how do we stop them all from printing out slowly (e.g. do we override the typewriter functionality?)
  • if the websocket disconnects, e.g. due to a bad internet connection, it picks back up seamlessly
    • this might be hard--the server would need to keep track of which items in the history had been sent successfully
  • As a more near-term goal, losing the history but seeing the task still in-progress would be nice.

rbren avatar Apr 02 '24 11:04 rbren

okay, for these goals, i need to decouple WS connection management from the agent controller.

  1. every time the agent sends a message to the FE, it gets the latest WS connection and saves the message to the message stack; if the send succeeds, it marks the message as sent.
  2. the FE also saves messages to its own message stack. if the page is refreshed, the FE can send its latest message id to the BE to get all messages after that one. like this 1
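
The two steps above can be sketched roughly as follows. This is only a minimal illustration, not the code in this PR: `StoredMessage`, `MessageLog`, and their methods are hypothetical names, and the sketch assumes message ids are 1-based integers assigned by the server in send order.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class StoredMessage:
    id: int
    payload: dict[str, Any]
    sent: bool = False  # flipped once the WS send succeeded


@dataclass
class MessageLog:
    """Hypothetical per-session, append-only log of messages for the FE."""

    _messages: list[StoredMessage] = field(default_factory=list)

    def append(self, payload: dict[str, Any]) -> StoredMessage:
        # Ids are 1-based and strictly increasing, so 0 means "nothing seen yet".
        msg = StoredMessage(id=len(self._messages) + 1, payload=payload)
        self._messages.append(msg)
        return msg

    def mark_sent(self, msg_id: int) -> None:
        # Called only after the send over the current WS connection succeeded.
        self._messages[msg_id - 1].sent = True

    def since(self, last_seen_id: int) -> list[StoredMessage]:
        # Everything after the id the FE reports, in original order.
        return self._messages[last_seen_id:]
```

On reconnect, the FE would report the last id it holds and the BE would replay `log.since(last_id)` over the new connection.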

wdyt?

iFurySt avatar Apr 02 '24 13:04 iFurySt

btw, i think the Agent needs to be stopped if the client disconnects for a specified period of time. resume when the client reconnects?

iFurySt avatar Apr 02 '24 13:04 iFurySt

btw, i think the Agent needs to be stopped if the client disconnects for a specified period of time. resume when the client reconnects?

This seems like a good feature, but maybe a follow-on. We don't have the ability to pause and resume the agent controller loop just yet

Overall plan looks great to me though!

rbren avatar Apr 02 '24 22:04 rbren

https://github.com/OpenDevin/OpenDevin/assets/16201837/48b3bfc2-5b4f-4d55-986f-58de7b997758

The screen recording is above. the changes include:

  1. abstract a session manager and an agent manager out of session.py
  2. cache sessions and messages locally (./cache) when the server quits.
  3. the terminal in the FE gets its messages from the store.
  4. refine the socket module to auto-reconnect.
  5. warn the user and let them decide whether to load the previous session. (this can be improved in the future for multi-projects/panels)

there is an obvious problem with asyncio. i can't solve it because i'm not very familiar with asyncio. the error is below:

^CReceived signal 2, exiting...
ERROR:    Traceback (most recent call last):
  File "/opt/homebrew/Caskroom/miniconda/base/envs/OpenDevin/lib/python3.12/asyncio/runners.py", line 194, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/OpenDevin/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "uvloop/loop.pyx", line 1511, in uvloop.loop.Loop.run_until_complete
  File "uvloop/loop.pyx", line 1504, in uvloop.loop.Loop.run_until_complete
  File "uvloop/loop.pyx", line 1377, in uvloop.loop.Loop.run_forever
  File "uvloop/loop.pyx", line 555, in uvloop.loop.Loop._run
  File "uvloop/handles/poll.pyx", line 216, in uvloop.loop.__on_uvpoll_event
  File "uvloop/cbhandles.pyx", line 83, in uvloop.loop.Handle._run
  File "uvloop/cbhandles.pyx", line 66, in uvloop.loop.Handle._run
  File "uvloop/loop.pyx", line 397, in uvloop.loop.Loop._read_from_self
  File "uvloop/loop.pyx", line 402, in uvloop.loop.Loop._invoke_signals
  File "uvloop/loop.pyx", line 377, in uvloop.loop.Loop._ceval_process_signals
  File "/Users/ifuryst/projects/ai/OpenDevin/opendevin/server/session/manager.py", line 45, in handle_signal
    exit(0)
  File "<frozen _sitebuiltins>", line 26, in __call__
SystemExit: 0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/homebrew/Caskroom/miniconda/base/envs/OpenDevin/lib/python3.12/site-packages/starlette/routing.py", line 743, in lifespan
    await receive()
  File "/opt/homebrew/Caskroom/miniconda/base/envs/OpenDevin/lib/python3.12/site-packages/uvicorn/lifespan/on.py", line 137, in receive
    return await self.receive_queue.get()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/OpenDevin/lib/python3.12/asyncio/queues.py", line 158, in get
    await getter
asyncio.exceptions.CancelledError

just start the server and press Ctrl+C to trigger it. it doesn't seem fatal, just very annoying ..
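
The traceback suggests that `exit(0)` is called from inside a signal handler while the event loop is still running, so `SystemExit` propagates through the loop and the pending lifespan task is cancelled. One common alternative, sketched here with plain asyncio and under assumptions, is to have the handler only set an `asyncio.Event` and run the cleanup from a coroutine. `serve_until_stopped` and the `saved` list are hypothetical stand-ins for the real cache-writing step, and since uvicorn installs its own signal handlers, the actual server would more likely do this work in the application's shutdown hook.

```python
import asyncio
import signal


async def serve_until_stopped(stop: asyncio.Event, saved: list[str]) -> None:
    """Wait for a stop request, then run cleanup inside the event loop."""
    await stop.wait()
    # Hypothetical cleanup: stands in for writing sessions to ./cache.
    saved.append("sessions cached")


def install_signal_handlers(loop: asyncio.AbstractEventLoop, stop: asyncio.Event) -> None:
    # Unix only: add_signal_handler is not implemented on Windows event loops.
    for sig in (signal.SIGINT, signal.SIGTERM):
        # The handler only flags the request; it never calls exit() itself.
        loop.add_signal_handler(sig, stop.set)


async def main() -> list[str]:
    stop = asyncio.Event()
    saved: list[str] = []
    install_signal_handlers(asyncio.get_running_loop(), stop)
    await serve_until_stopped(stop, saved)
    return saved
```

Running `asyncio.run(main())` and pressing Ctrl+C would then let the coroutine finish normally instead of raising `SystemExit` from within the loop.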

iFurySt avatar Apr 03 '24 10:04 iFurySt

This is looking awesome!

Sorry for all the merge conflicts 😬 will try and get this one in once it's rebased

rbren avatar Apr 04 '24 02:04 rbren

This is looking awesome!

Sorry for all the merge conflicts 😬 will try and get this one in once it's rebased

okay, let me resolve the conflicts.

iFurySt avatar Apr 04 '24 04:04 iFurySt

On a fresh install, I'm seeing empty initialize events sent from the FE. This is causing the server to crash with

  File "/home/rbren/git/opendevin/opendevin/server/session/manager.py", line 37, in loop_recv
    await self._sessions[sid].loop_recv(dispatch)
  File "/home/rbren/git/opendevin/opendevin/server/session/session.py", line 33, in loop_recv
    await dispatch(action, data)
  File "/home/rbren/git/opendevin/opendevin/server/agent/manager.py", line 74, in dispatch
    await self.create_controller(data)
  File "/home/rbren/git/opendevin/opendevin/server/agent/manager.py", line 116, in create_controller
    os.makedirs(directory)
  File "<frozen os>", line 225, in makedirs
FileNotFoundError: [Errno 2] No such file or directory: ''

Are you able to repro? running localStorage.clear() might help
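
For reference, `os.makedirs('')` raises exactly this `FileNotFoundError`, so one possible guard is to substitute a fallback before creating the directory. This is a minimal sketch, not the fix that landed: `ensure_workspace` and the `./workspace` default are hypothetical.

```python
import os
from typing import Optional


def ensure_workspace(directory: Optional[str], default: str = "./workspace") -> str:
    """Create the workspace directory, falling back when none was given."""
    # An empty initialize event yields '' (or None) here; substitute the
    # hypothetical default instead of passing '' to os.makedirs.
    path = (directory or "").strip() or default
    os.makedirs(path, exist_ok=True)
    return path
```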

rbren avatar Apr 04 '24 14:04 rbren

i can repro it, i'm fixing it.

iFurySt avatar Apr 04 '24 15:04 iFurySt

fixed.

iFurySt avatar Apr 04 '24 15:04 iFurySt

Sorry looks like another rough merge 😬

rbren avatar Apr 05 '24 03:04 rbren

will solve it later 😶‍🌫️

iFurySt avatar Apr 05 '24 03:04 iFurySt

caught up with the main branch

iFurySt avatar Apr 05 '24 05:04 iFurySt

Testing now!

rbren avatar Apr 05 '24 14:04 rbren

Tested this out. There are some edge cases, but reconnection works perfectly if I turn my wifi off mid-session.

This is an awesome improvement. Let's get it in!

rbren avatar Apr 05 '24 17:04 rbren

This may have introduced a regression by falling back to the "gpt-3.5-turbo-1106" model (if one is not given in the UI). The regression would be that the backend no longer respects the parameters in config.toml that allow Ollama to work.

Error condensing thoughts: No healthy deployment available, passed model=gpt-3.5-turbo-1106
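
The precedence I'd expect looks something like the sketch below. It is only an illustration of the fallback order: `resolve_model` is a hypothetical helper, and `LLM_MODEL` is an assumed name for the config.toml key.

```python
from typing import Mapping, Optional

DEFAULT_MODEL = "gpt-3.5-turbo-1106"  # the fallback named in the error above


def resolve_model(ui_model: Optional[str], config: Mapping[str, str]) -> str:
    # Precedence: explicit UI choice, then config.toml, then the hard-coded default.
    # "LLM_MODEL" is an assumed key name, not confirmed by this thread.
    return ui_model or config.get("LLM_MODEL") or DEFAULT_MODEL
```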

lowlyocean avatar Apr 05 '24 22:04 lowlyocean

👍 thanks for filing an issue!

rbren avatar Apr 06 '24 00:04 rbren