Fix server lock up on session init

Open tofarr opened this issue 1 year ago • 1 comments

Short description

This fixes an issue where waiting for an agent to start blocks the main run loop, so no other HTTP requests can be served (even in other sessions) until the agent has started.


To test this, run OpenHands on your local computer and run the following script alongside it:

import requests
from datetime import datetime
from time import sleep

if __name__ == "__main__":
    while True:
        try:
            response = requests.get("http://localhost:3000")
            print(f"{datetime.now()} response: {response.status_code}")
        except Exception as err:
            print(err)
        sleep(5)

While your server is running, you would expect the script to print something like this:

2024-09-23 12:10:47.832113 response: 200
2024-09-23 12:10:52.839559 response: 200
2024-09-23 12:10:57.848722 response: 200
2024-09-23 12:11:02.859276 response: 200
2024-09-23 12:11:07.866603 response: 200
2024-09-23 12:11:12.870330 response: 200
...

Instead, you get long gaps between responses while an agent is starting up :( For example, my server log shows a big gap between GET requests to "/":

INFO:     127.0.0.1:62260 - "GET / HTTP/1.1" 200 OK
12:10:08 - openhands:INFO: runtime.py:191 - Waiting for sandbox to be alive...
12:10:47 - openhands:INFO: runtime.py:191 - Waiting for sandbox to be alive...
12:10:47 - openhands:INFO: runtime.py:245 - Executing action
INFO:     127.0.0.1:62272 - "GET /api/list-files HTTP/1.1" 200 OK
INFO:     127.0.0.1:62298 - "GET / HTTP/1.1" 200 OK
INFO:     127.0.0.1:62461 - "GET / HTTP/1.1" 200 OK
INFO:     127.0.0.1:62483 - "GET / HTTP/1.1" 200 OK

This is because "Waiting for sandbox to be alive" runs in the main run loop, so it blocks all other server activity until it completes.

My change offloads this work to a background thread so that the main run loop is not blocked. Once this is in place, the server log looks more like this (requests keep being serviced even though a build is in progress):

12:14:49 - openhands:INFO: remote.py:55 - Build initiated with ID: e522856d-0c07-47e6-975a-0de61b506464
12:14:49 - openhands:INFO: remote.py:80 - Build status: QUEUED
INFO:     127.0.0.1:63563 - "GET / HTTP/1.1" 200 OK
INFO:     127.0.0.1:63586 - "GET / HTTP/1.1" 200 OK
INFO:     127.0.0.1:63608 - "GET / HTTP/1.1" 200 OK
INFO:     127.0.0.1:63630 - "GET / HTTP/1.1" 200 OK
INFO:     127.0.0.1:63655 - "GET / HTTP/1.1" 200 OK
INFO:     127.0.0.1:63677 - "GET / HTTP/1.1" 200 OK
INFO:     127.0.0.1:63701 - "GET / HTTP/1.1" 200 OK
12:15:19 - openhands:INFO: remote.py:80 - Build status: WORKING
INFO:     127.0.0.1:63727 - "GET / HTTP/1.1" 200 OK
INFO:     127.0.0.1:63751 - "GET / HTTP/1.1" 200 OK
INFO:     127.0.0.1:63773 - "GET / HTTP/1.1" 200 OK
INFO:     127.0.0.1:63796 - "GET / HTTP/1.1" 200 OK
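
For readers who want to see the effect in isolation, here is a minimal sketch of the general technique, not the actual OpenHands change. It assumes an asyncio event loop and Python 3.9+ (for `asyncio.to_thread`); `wait_for_sandbox`, `heartbeat`, `run`, and `max_gap` are hypothetical names made up for this illustration. The heartbeat task stands in for other HTTP requests, and the script measures the largest gap between heartbeats when the blocking wait runs on the loop thread versus on a worker thread.

```python
import asyncio
import time


def wait_for_sandbox(seconds: float) -> str:
    """Hypothetical stand-in for a blocking 'wait until the sandbox is alive' call."""
    time.sleep(seconds)  # blocks whichever thread calls it
    return "alive"


async def heartbeat(ticks: list, interval: float, count: int) -> None:
    """Stand-in for other requests the event loop should keep serving."""
    for _ in range(count):
        ticks.append(time.monotonic())
        await asyncio.sleep(interval)


async def run(offload: bool) -> float:
    """Return the largest gap between heartbeats while the sandbox wait runs."""
    ticks: list = []
    beat = asyncio.create_task(heartbeat(ticks, 0.05, 8))
    await asyncio.sleep(0)  # let the heartbeat record its first tick
    if offload:
        # Run the blocking call on a worker thread; the loop stays free.
        await asyncio.to_thread(wait_for_sandbox, 0.3)
    else:
        # Call it directly on the loop thread; nothing else can run meanwhile.
        wait_for_sandbox(0.3)
    await beat
    return max(b - a for a, b in zip(ticks, ticks[1:]))


def max_gap(offload: bool) -> float:
    return asyncio.run(run(offload))


if __name__ == "__main__":
    print(f"blocking on the loop: max gap {max_gap(False):.2f}s")
    print(f"offloaded to thread:  max gap {max_gap(True):.2f}s")
```

When the wait runs directly on the loop, the largest heartbeat gap should be roughly the full wait time; when it is offloaded, the gap should stay close to the heartbeat interval, which mirrors the difference between the two server logs above.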

tofarr, Sep 23 '24 17:09

You could also test this by running the remote runtime in 2 different browsers at the same time (two screenshots attached).

tofarr, Sep 23 '24 21:09