sglang
Add Default Timeout to urllib.request.urlopen Calls to Prevent Potential Hanging
The HTTP request helper calls urllib.request.urlopen without a timeout argument. In that case urlopen falls back to the global socket default, which is no timeout unless socket.setdefaulttimeout has been called, so the application can hang indefinitely if the server does not respond or the network stalls.
Code Snippet:
# add the API Key header if an API key is provided
if api_key is not None:
    headers["X-API-Key"] = api_key
if stream:
    return requests.post(url, json=json, stream=True, headers=headers)
else:
    req = urllib.request.Request(url, headers=headers)
    if json is None:
        data = None
    else:
        data = bytes(dumps(json), encoding="utf-8")
    resp = urllib.request.urlopen(req, data=data, cafile=verify)
    return HttpResponse(resp)
To mitigate this risk, I propose adding an optional timeout argument to the function(s) that wrap urllib.request.urlopen calls. This argument would allow developers to specify a custom timeout, with a sensible default set to ensure that no call hangs indefinitely.
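A minimal sketch of the proposed change, not the actual sglang patch. The function name http_request_with_timeout, the constant DEFAULT_TIMEOUT_S, and the 60-second value are assumptions made for illustration. It covers only the non-streaming urllib branch and leaves out the cafile/verify handling from the snippet above. Note that the timeout bounds each blocking socket operation (the connection attempt, each read), not the total duration of the request.

```python
import json as json_lib
import urllib.request

# Hypothetical default; a real value would depend on expected generation time.
DEFAULT_TIMEOUT_S = 60.0


def http_request_with_timeout(url, payload=None, headers=None,
                              timeout=DEFAULT_TIMEOUT_S):
    """POST a JSON payload and return (status, body) with a bounded wait.

    Raises an OSError subclass (for example TimeoutError or
    urllib.error.URLError) instead of blocking forever when the
    server does not answer.
    """
    req = urllib.request.Request(url, headers=dict(headers or {}))
    data = None if payload is None else json_lib.dumps(payload).encode("utf-8")
    # Passing timeout explicitly overrides the global socket default.
    with urllib.request.urlopen(req, data=data, timeout=timeout) as resp:
        return resp.status, resp.read()
```

The streaming branch could be bounded in the same way, since requests.post also accepts a timeout argument.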
@alessiodallapiazza You are welcome to submit a PR to add this feature.
I think this is a real problem. @hnyls2002 have you tried testing generation with a batch size of 100 or 1000 and multi-step structured generation against a remote endpoint? I am connected to a remote LLM endpoint with batch size 57 and num_threads=10, and I get the error Connection reset by peer:
Exception in thread Thread-360 (_thread_worker_func):
Traceback (most recent call last):
  File "/usr/local/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
    self.run()
  File "/usr/local/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/threading.py", line 1010, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/gintas/Documents/PycharmProjects/sglang-baigiamasis/.venv/lib/python3.12/site-packages/sglang/lang/interpreter.py", line 303, in _thread_worker_func
    self._execute(expr)
  File "/Users/gintas/Documents/PycharmProjects/sglang-baigiamasis/.venv/lib/python3.12/site-packages/sglang/lang/interpreter.py", line 341, in _execute
    self._execute_commit_lazy_operations(other)
  File "/Users/gintas/Documents/PycharmProjects/sglang-baigiamasis/.venv/lib/python3.12/site-packages/sglang/lang/interpreter.py", line 530, in _execute_commit_lazy_operations
    self.backend.commit_lazy_operations(self)
  File "/Users/gintas/Documents/PycharmProjects/sglang-baigiamasis/.venv/lib/python3.12/site-packages/sglang/backend/runtime_endpoint.py", line 76, in commit_lazy_operations
    res = http_request(
          ^^^^^^^^^^^^^
  File "/Users/gintas/Documents/PycharmProjects/sglang-baigiamasis/.venv/lib/python3.12/site-packages/sglang/utils.py", line 113, in http_request
    resp = urllib.request.urlopen(req, data=data, cafile=verify)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/urllib/request.py", line 215, in urlopen
    return opener.open(url, data, timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/urllib/request.py", line 515, in open
    response = self._open(req, data)
               ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/urllib/request.py", line 532, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/urllib/request.py", line 492, in _call_chain
    result = func(*args)
             ^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/urllib/request.py", line 1373, in http_open
    return self.do_open(http.client.HTTPConnection, req)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/urllib/request.py", line 1348, in do_open
    r = h.getresponse()
        ^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/http/client.py", line 1423, in getresponse
    response.begin()
  File "/usr/local/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/http/client.py", line 331, in begin
    version, status, reason = self._read_status()
                              ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/http/client.py", line 292, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/socket.py", line 707, in readinto
    return self._sock.recv_into(b)
           ^^^^^^^^^^^^^^^^^^^^^^^
ConnectionResetError: [Errno 54] Connection reset by peer
This makes run_batch hang and never finish (I have progress_bar=True and it stays stuck at 56/57). I have not looked at the code yet, but I suspect retry logic is also missing, and it is needed.
Could a run_batch call or an sglang backend instance keep a single socket connection to the remote endpoint?
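The traceback above shows a ConnectionResetError escaping from http_request inside a worker thread. A rough sketch of the kind of retry wrapper that could absorb such transient errors follows. The helper name call_with_retry, its parameters, and the backoff values are all hypothetical, and this is not a description of what sglang does or of any particular PR.

```python
import socket
import time

# Transient network errors worth retrying. ConnectionResetError is a
# subclass of ConnectionError; socket.timeout is listed for Python < 3.10,
# where it is not yet an alias of TimeoutError.
RETRYABLE_ERRORS = (ConnectionError, TimeoutError, socket.timeout)


def call_with_retry(func, *args, max_attempts=3, base_delay=0.5, **kwargs):
    """Call func, retrying on transient errors with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return func(*args, **kwargs)
        except RETRYABLE_ERRORS:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            # Wait base_delay, 2 * base_delay, 4 * base_delay, ...
            time.sleep(base_delay * 2 ** (attempt - 1))
```

Retrying is only safe when repeating the request has no unwanted side effects, and it helps most when combined with a timeout so that each attempt is itself bounded.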
I guess what I was facing is similar to your problem. I am currently running SGL on multiple machines to run inference on ~1 million prompts in a data-parallel manner. However, I've noticed that some SGL backends easily hang indefinitely. I was confused and thought there was a deadlock until I saw this post.
@m0g1cian I solved this with the retry logic in https://github.com/sgl-project/sglang/pull/424
Same problem with sglang 0.2.13
This issue has been automatically closed due to inactivity. Please feel free to reopen it if needed.
I am facing the same issue. I run my SGLang backend on 6xH100 GPUs, and during peak traffic the backend hangs, leading to high latency.
Has anyone solved this? Is there any workaround to set a timeout?
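One possible client-side workaround that needs no change to sglang, offered as a sketch under assumptions: Python's socket.setdefaulttimeout sets a process-wide default that urllib.request.urlopen uses whenever no timeout argument is passed. The 120-second value and the helper name fetch are hypothetical. This only bounds how long the client waits; it does not address a backend that is itself overloaded.

```python
import socket
import urllib.request

# Process-wide default, in seconds, for sockets created after this call
# that do not set their own timeout. Hypothetical value.
socket.setdefaulttimeout(120.0)


def fetch(url):
    """Read a URL; urlopen gets no timeout argument here, so it falls
    back to the global default set above."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()
```

Because the setting is global, it also affects every other socket in the process that does not specify its own timeout, so an explicit timeout parameter on the request helper remains the cleaner fix.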