Add Default Timeout to urllib.request.urlopen Calls to Prevent Potential Hanging

Open alessiodallapiazza opened this issue 1 year ago • 5 comments

The current HTTP request helper calls urllib.request.urlopen without a timeout. If the server does not respond or the network stalls, the call can block indefinitely and hang the application.

Code Snippet:

    # add the API Key header if an API key is provided
    if api_key is not None:
        headers["X-API-Key"] = api_key

    if stream:
        return requests.post(url, json=json, stream=True, headers=headers)
    else:
        req = urllib.request.Request(url, headers=headers)
        if json is None:
            data = None
        else:
            data = bytes(dumps(json), encoding="utf-8")
        resp = urllib.request.urlopen(req, data=data, cafile=verify)
        return HttpResponse(resp)

To mitigate this risk, I propose adding an optional timeout argument to the function(s) that wrap urllib.request.urlopen calls. This argument would allow developers to specify a custom timeout, with a sensible default set to ensure that no call hangs indefinitely.
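A minimal sketch of what this could look like, not a patch against the actual function: it covers only the non-streaming branch and leaves out the api_key, stream and verify handling from the snippet above. The name `DEFAULT_TIMEOUT_S` and its value of 60 seconds are assumptions for illustration.

```python
import json as jsonlib
import urllib.request

# Hypothetical default; the right value depends on how long a generation may take.
DEFAULT_TIMEOUT_S = 60.0


def http_request(url, json=None, headers=None, timeout=DEFAULT_TIMEOUT_S):
    """Send `json` as a POST body (GET when json is None) with a bounded wait."""
    data = None if json is None else jsonlib.dumps(json).encode("utf-8")
    req = urllib.request.Request(url, data=data, headers=dict(headers or {}))
    # `timeout` bounds each blocking socket operation (connect, each read),
    # not the total duration of the request.
    return urllib.request.urlopen(req, timeout=timeout)
```

With this shape, a caller that needs a longer wait passes `timeout=...` explicitly, and everyone else gets the default. A stalled server then surfaces as an exception (typically `TimeoutError`, sometimes wrapped in `urllib.error.URLError`) instead of a silent hang, so callers need to handle it.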

alessiodallapiazza avatar Mar 29 '24 13:03 alessiodallapiazza

@alessiodallapiazza You are welcome to submit a PR to add this feature.

hnyls2002 avatar Apr 07 '24 08:04 hnyls2002

I think this is a real problem. @hnyls2002 have you tried testing generation with a batch size of 100 or 1000 and multi-step structured generation against a remote endpoint? I am connecting to a remote LLM endpoint with batch size 57 and num_threads=10, and I get a "Connection reset by peer" error:

Exception in thread Thread-360 (_thread_worker_func):
Traceback (most recent call last):
  File "/usr/local/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
    self.run()
  File "/usr/local/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/threading.py", line 1010, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/gintas/Documents/PycharmProjects/sglang-baigiamasis/.venv/lib/python3.12/site-packages/sglang/lang/interpreter.py", line 303, in _thread_worker_func
    self._execute(expr)
  File "/Users/gintas/Documents/PycharmProjects/sglang-baigiamasis/.venv/lib/python3.12/site-packages/sglang/lang/interpreter.py", line 341, in _execute
    self._execute_commit_lazy_operations(other)
  File "/Users/gintas/Documents/PycharmProjects/sglang-baigiamasis/.venv/lib/python3.12/site-packages/sglang/lang/interpreter.py", line 530, in _execute_commit_lazy_operations
    self.backend.commit_lazy_operations(self)
  File "/Users/gintas/Documents/PycharmProjects/sglang-baigiamasis/.venv/lib/python3.12/site-packages/sglang/backend/runtime_endpoint.py", line 76, in commit_lazy_operations
    res = http_request(
          ^^^^^^^^^^^^^
  File "/Users/gintas/Documents/PycharmProjects/sglang-baigiamasis/.venv/lib/python3.12/site-packages/sglang/utils.py", line 113, in http_request
    resp = urllib.request.urlopen(req, data=data, cafile=verify)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/urllib/request.py", line 215, in urlopen
    return opener.open(url, data, timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/urllib/request.py", line 515, in open
    response = self._open(req, data)
               ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/urllib/request.py", line 532, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/urllib/request.py", line 492, in _call_chain
    result = func(*args)
             ^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/urllib/request.py", line 1373, in http_open
    return self.do_open(http.client.HTTPConnection, req)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/urllib/request.py", line 1348, in do_open
    r = h.getresponse()
        ^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/http/client.py", line 1423, in getresponse
    response.begin()
  File "/usr/local/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/http/client.py", line 331, in begin
    version, status, reason = self._read_status()
                              ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/http/client.py", line 292, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/socket.py", line 707, in readinto
    return self._sock.recv_into(b)
           ^^^^^^^^^^^^^^^^^^^^^^^
ConnectionResetError: [Errno 54] Connection reset by peer

This makes run_batch hang so that it never finishes (I have progress_bar=True and I see it stuck at 56/57). I have not looked at the code yet, but I suspect retry logic is also missing, and it is needed.

Maybe a run_batch call or an sglang backend instance could reuse a single socket connection to the remote endpoint?
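To make the retry idea concrete, here is a rough sketch of a generic wrapper. The helper name `call_with_retries` and its parameters are hypothetical, not part of sglang, and it assumes resending the same request is safe. By default it retries on `ConnectionError` (which includes the `ConnectionResetError` in the traceback above) and `TimeoutError`.

```python
import time


def call_with_retries(fn, max_attempts=3, base_delay_s=0.5,
                      retry_on=(ConnectionError, TimeoutError)):
    """Call fn(); on a transient network error, back off and try again."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller see the failure
            # Exponential backoff: base, 2*base, 4*base, ...
            time.sleep(base_delay_s * 2 ** (attempt - 1))
```

A call site would wrap the request in a zero-argument callable, for example `call_with_retries(lambda: http_request(url, json=payload))`. Failures that urllib wraps in `urllib.error.URLError` are not retried by default; they can be added through `retry_on` if that is wanted.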

Gintasz avatar May 09 '24 16:05 Gintasz

I guess what I was facing is similar to yours. I am currently running SGL on multiple machines to run inference on ~1 million prompts in a data-parallel manner. However, I've noticed that some SGL backends easily hang indefinitely. I was confused and thought there was a deadlock issue until I saw this post.

m0g1cian avatar Jul 21 '24 03:07 m0g1cian

@m0g1cian I solved it with this retry logic: https://github.com/sgl-project/sglang/pull/424

Gintasz avatar Jul 21 '24 13:07 Gintasz

Same problem with sglang 0.2.13

alanxmay avatar Aug 21 '24 03:08 alanxmay

This issue has been automatically closed due to inactivity. Please feel free to reopen it if needed.

github-actions[bot] avatar Oct 21 '24 01:10 github-actions[bot]

I am facing the same issue. I am running my SGLANG BE on 6xH100 GPUs, and during peak traffic it hangs, leading to high latency.

Has anyone solved this? Is there any workaround to set a timeout?

Rajansethi26 avatar Jan 06 '25 11:01 Rajansethi26
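One possible client-side workaround, sketched under assumptions: when urllib.request.urlopen is called without a timeout, it falls back to the process-wide socket default, which can be set from user code without patching the library. The 120-second value is an assumption. The setting affects every socket created afterwards in the process, not only sglang's. It also only bounds how long the client waits, so it does not address a backend that is itself stuck.

```python
import socket

# Assumed value; pick something longer than the slowest expected generation.
# Applies to all sockets created after this call, process-wide.
socket.setdefaulttimeout(120.0)
```

This would go at program startup, before any requests are sent. Calls that pass their own timeout are unaffected.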