sglang
`RecursionError: maximum recursion depth exceeded while calling a Python object` when inferencing with long input
Hi, I ran into this issue during inference:
Exception in ModelRpcClient:
Traceback (most recent call last):
File "/User/jay/miniconda3/envs/sglang/lib/python3.11/site-packages/sglang/srt/managers/router/model_rpc.py", line 168, in exposed_step
self.forward_step()
File "/User/jay/miniconda3/envs/sglang/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/User/jay/miniconda3/envs/sglang/lib/python3.11/site-packages/sglang/srt/managers/router/model_rpc.py", line 195, in forward_step
self.forward_decode_batch(self.running_batch)
File "/User/jay/miniconda3/envs/sglang/lib/python3.11/site-packages/sglang/srt/managers/router/model_rpc.py", line 460, in forward_decode_batch
self.handle_finished_requests(batch)
File "/User/jay/miniconda3/envs/sglang/lib/python3.11/site-packages/sglang/srt/managers/router/model_rpc.py", line 528, in handle_finished_requests
prefix_len = self.tree_cache.insert(
^^^^^^^^^^^^^^^^^^^^^^^
File "/User/jay/miniconda3/envs/sglang/lib/python3.11/site-packages/sglang/srt/managers/router/radix_cache.py", line 61, in insert
return self._insert_helper(self.root_node, key, value)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/User/jay/miniconda3/envs/sglang/lib/python3.11/site-packages/sglang/srt/managers/router/radix_cache.py", line 157, in _insert_helper
return prefix_len + self._insert_helper(child, key, value)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/User/jay/miniconda3/envs/sglang/lib/python3.11/site-packages/sglang/srt/managers/router/radix_cache.py", line 157, in _insert_helper
return prefix_len + self._insert_helper(child, key, value)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/User/jay/miniconda3/envs/sglang/lib/python3.11/site-packages/sglang/srt/managers/router/radix_cache.py", line 157, in _insert_helper
return prefix_len + self._insert_helper(child, key, value)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Previous line repeated 958 more times]
File "/User/jay/miniconda3/envs/sglang/lib/python3.11/site-packages/sglang/srt/managers/router/radix_cache.py", line 166, in _insert_helper
new_node = TreeNode()
^^^^^^^^^^
File "/User/jay/miniconda3/envs/sglang/lib/python3.11/site-packages/sglang/srt/managers/router/radix_cache.py", line 12, in __init__
self.children = defaultdict(TreeNode)
^^^^^^^^^^^^^^^^^^^^^
RecursionError: maximum recursion depth exceeded while calling a Python object
Would it be possible to implement this logic without recursion? @merrymercy
@Ja1Zhou Of course, this logic can be implemented without recursion.
I am not sure a single path in the radix tree would contain that many nodes; recursing nearly 1,000 times is very strange. Could you please check whether this is an infinite-recursion bug, or provide more information on how to reproduce it?
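A minimal sketch of what a recursion-free insert could look like, written against a hypothetical simplified node (`Node` with a token-tuple `key` and a `children` dict keyed by each child's first token; these names are illustrative, not SGLang's `TreeNode`). It also shows one way a single path can reach ~1k nodes: if a sequence is inserted again every time it grows by one token, each insert hangs a one-token leaf under the previous one, so the path depth equals the sequence length. Whether this is what happens in the radix cache here is an assumption, not something confirmed in this thread.

```python
import sys
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical simplified radix-tree node: `key` holds the tokens on the edge into it.
    key: tuple = ()
    children: dict = field(default_factory=dict)  # first token of child key -> child


def insert(root: Node, key: tuple) -> int:
    """Insert `key` with a loop instead of recursion; return the length already cached."""
    node, i = root, 0
    while i < len(key):
        child = node.children.get(key[i])
        if child is None:
            # Nothing cached past this point: attach the remaining tokens as one new leaf.
            node.children[key[i]] = Node(key=tuple(key[i:]))
            return i
        # Count how many tokens on the child's edge match the rest of `key`.
        n = 0
        while n < len(child.key) and i + n < len(key) and child.key[n] == key[i + n]:
            n += 1
        if n < len(child.key):
            # Partial match: split the edge so the shared tokens form their own node.
            mid = Node(key=child.key[:n])
            child.key = child.key[n:]
            mid.children[child.key[0]] = child
            node.children[key[i]] = mid
            child = mid
        node, i = child, i + n
    return i


def depth(root: Node) -> int:
    """Longest root-to-leaf path (in edges), found with an explicit stack."""
    best, stack = 0, [(root, 0)]
    while stack:
        cur, d = stack.pop()
        best = max(best, d)
        stack.extend((c, d + 1) for c in cur.children.values())
    return best


# Re-insert a growing sequence one token at a time: each call adds a one-token leaf.
root = Node()
seq = tuple(range(1200))
for end in range(1, len(seq) + 1):
    insert(root, seq[:end])
chain_depth = depth(root)
print(chain_depth, sys.getrecursionlimit())  # 1200 vs. CPython's default limit of 1000
```

Because both functions keep their position in local variables (or an explicit list) rather than on the Python call stack, tree depth no longer interacts with `sys.setrecursionlimit` at all.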
Hi. I can't reproduce the same error consistently 😭. In fact, I randomly get one of three different errors:
Exception in ModelRpcClient:
Traceback (most recent call last):
File "/User/jay/miniconda3/envs/sglang/lib/python3.11/site-packages/sglang/srt/managers/router/model_rpc.py", line 168, in exposed_step
self.forward_step()
File "/User/jay/miniconda3/envs/sglang/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/User/jay/miniconda3/envs/sglang/lib/python3.11/site-packages/sglang/srt/managers/router/model_rpc.py", line 179, in forward_step
new_batch = self.get_new_fill_batch()
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/User/jay/miniconda3/envs/sglang/lib/python3.11/site-packages/sglang/srt/managers/router/model_rpc.py", line 293, in get_new_fill_batch
self.token_to_kv_pool.available_size() + self.tree_cache.evictable_size()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/User/jay/miniconda3/envs/sglang/lib/python3.11/site-packages/sglang/srt/memory_pool.py", line 92, in available_size
return torch.sum(self.mem_state == 0).item()
^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Exception in ModelRpcClient:
Traceback (most recent call last):
File "/User/jay/miniconda3/envs/sglang/lib/python3.11/site-packages/sglang/srt/managers/router/model_rpc.py", line 168, in exposed_step
self.forward_step()
File "/User/jay/miniconda3/envs/sglang/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/User/jay/miniconda3/envs/sglang/lib/python3.11/site-packages/sglang/srt/managers/router/model_rpc.py", line 179, in forward_step
new_batch = self.get_new_fill_batch()
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/User/jay/miniconda3/envs/sglang/lib/python3.11/site-packages/sglang/srt/managers/router/model_rpc.py", line 277, in get_new_fill_batch
prefix_indices, last_node = self.tree_cache.match_prefix(req.input_ids)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/User/jay/miniconda3/envs/sglang/lib/python3.11/site-packages/sglang/srt/managers/router/radix_cache.py", line 52, in match_prefix
value = torch.concat(value)
^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Exception in ModelRpcClient:
Traceback (most recent call last):
File "/User/jay/miniconda3/envs/sglang/lib/python3.11/site-packages/sglang/srt/managers/router/model_rpc.py", line 168, in exposed_step
self.forward_step()
File "/User/jay/miniconda3/envs/sglang/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/User/jay/miniconda3/envs/sglang/lib/python3.11/site-packages/sglang/srt/managers/router/model_rpc.py", line 195, in forward_step
self.forward_decode_batch(self.running_batch)
File "/User/jay/miniconda3/envs/sglang/lib/python3.11/site-packages/sglang/srt/managers/router/model_rpc.py", line 421, in forward_decode_batch
if not batch.check_decode_mem():
^^^^^^^^^^^^^^^^^^^^^^^^
File "/User/jay/miniconda3/envs/sglang/lib/python3.11/site-packages/sglang/srt/managers/router/infer_batch.py", line 284, in check_decode_mem
self.tree_cache.evict(bs, self.token_to_kv_pool.free)
File "/User/jay/miniconda3/envs/sglang/lib/python3.11/site-packages/sglang/srt/managers/router/radix_cache.py", line 74, in evict
leaves = self._collect_leaves()
^^^^^^^^^^^^^^^^^^^^^^
File "/User/jay/miniconda3/envs/sglang/lib/python3.11/site-packages/sglang/srt/managers/router/radix_cache.py", line 201, in _collect_leaves
dfs_(self.root_node)
File "/User/jay/miniconda3/envs/sglang/lib/python3.11/site-packages/sglang/srt/managers/router/radix_cache.py", line 199, in dfs_
dfs_(x)
File "/User/jay/miniconda3/envs/sglang/lib/python3.11/site-packages/sglang/srt/managers/router/radix_cache.py", line 199, in dfs_
dfs_(x)
File "/User/jay/miniconda3/envs/sglang/lib/python3.11/site-packages/sglang/srt/managers/router/radix_cache.py", line 199, in dfs_
dfs_(x)
[Previous line repeated 959 more times]
File "/User/jay/miniconda3/envs/sglang/lib/python3.11/site-packages/sglang/srt/managers/router/radix_cache.py", line 198, in dfs_
for x in cur_node.children.values():
^^^^^^^^^^^^^^^^^^^^^^^^^^
RecursionError: maximum recursion depth exceeded while calling a Python object
Also, I am using a proprietary model, so my error may be hard to reproduce.
I would really appreciate any insight into why these errors appear!
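The third trace above fails in the recursive `dfs_` inside `_collect_leaves`. A minimal sketch of the same leaf collection done with an explicit stack, on a hypothetical stand-in `TreeNode` that only has a `name` and a `children` dict (not SGLang's class). With this shape, the depth of the tree only grows a Python list; it never touches the recursion limit.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TreeNode:
    # Hypothetical stand-in for a radix-cache node; only `children` matters here.
    name: str
    children: Dict[str, "TreeNode"] = field(default_factory=dict)


def collect_leaves(root: TreeNode) -> List[TreeNode]:
    """Return every node without children, using an explicit stack instead of recursion."""
    leaves: List[TreeNode] = []
    stack = [root]
    while stack:
        cur = stack.pop()
        if not cur.children:
            leaves.append(cur)
        else:
            stack.extend(cur.children.values())
    return leaves


# A 5000-node chain plus one side branch: far deeper than the default limit of 1000.
root = TreeNode("root")
cur = root
for i in range(5000):
    nxt = TreeNode(f"n{i}")
    cur.children[nxt.name] = nxt
    cur = nxt
root.children["side"] = TreeNode("side")
names = sorted(leaf.name for leaf in collect_leaves(root))
print(names)  # ['n4999', 'side']
```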
@Ja1Zhou Any scripts to reproduce it would help us debug. Otherwise, it is very difficult to debug with only these error messages.
Some tips:
- Try to disable tensor parallelism
- Use this function to print the tree https://github.com/sgl-project/sglang/blob/c51020cf0c64498865538362aa34baaed13a3b50/python/sglang/srt/managers/router/radix_cache.py#L63 . Can you provide us with the status of the tree when you see the error message?
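When the tree is deep enough to trigger this error, a recursive printer may hit the same limit, so a few summary numbers can be easier to collect and share. A hedged sketch: an iterative walk that counts nodes, leaves, maximum depth, and one-token edges (a long run of one-token edges would point to a degenerate chain). It assumes each node exposes a `key` sequence and a `children` dict; `children` matches the tracebacks above, while `key` is an assumption about the node layout.

```python
from types import SimpleNamespace


def tree_stats(root) -> dict:
    """Summarize a tree whose nodes expose `.key` and `.children` (a dict), without recursion."""
    stats = {"nodes": 0, "leaves": 0, "max_depth": 0, "one_token_edges": 0}
    stack = [(root, 0)]
    while stack:
        node, d = stack.pop()
        stats["nodes"] += 1
        stats["max_depth"] = max(stats["max_depth"], d)
        if len(node.key) == 1:
            stats["one_token_edges"] += 1
        if not node.children:
            stats["leaves"] += 1
        stack.extend((c, d + 1) for c in node.children.values())
    return stats


def make(key):
    # Illustrative node built with SimpleNamespace; real cache nodes carry more fields.
    return SimpleNamespace(key=key, children={})


# A degenerate chain: every edge holds exactly one token.
root = make([])
cur = root
for tok in range(1200):
    child = make([tok])
    cur.children[tok] = child
    cur = child
stats = tree_stats(root)
print(stats)  # {'nodes': 1201, 'leaves': 1, 'max_depth': 1200, 'one_token_edges': 1200}
```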
I ran into the same error, but in a different place.
Exception in ModelRpcClient: Traceback (most recent call last):
File "/home/yangchunhao/miniconda3/envs/sglang/lib/python3.10/site-packages/sglang/srt/managers/router/model_rpc.py", line 176, in exposed_step self.forward_step()
File "/home/yangchunhao/miniconda3/envs/sglang/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context return func(*args, **kwargs)
File "/home/yangchunhao/miniconda3/envs/sglang/lib/python3.10/site-packages/sglang/srt/managers/router/model_rpc.py", line 187, in forward_step new_batch = self.get_new_fill_batch()
File "/home/yangchunhao/miniconda3/envs/sglang/lib/python3.10/site-packages/sglang/srt/managers/router/model_rpc.py", line 285, in get_new_fill_batch prefix_indices, last_node = self.tree_cache.match_prefix(req.input_ids)
File "/home/yangchunhao/miniconda3/envs/sglang/lib/python3.10/site-packages/sglang/srt/managers/router/radix_cache.py", line 50, in match_prefix self._match_prefix_helper(self.root_node, key, value, last_node)
File "/home/yangchunhao/miniconda3/envs/sglang/lib/python3.10/site-packages/sglang/srt/managers/router/radix_cache.py", line 129, in _match_prefix_helper self._match_prefix_helper(child, key[prefix_len:], value, last_node)
File "/home/yangchunhao/miniconda3/envs/sglang/lib/python3.10/site-packages/sglang/srt/managers/router/radix_cache.py", line 129, in _match_prefix_helper self._match_prefix_helper(child, key[prefix_len:], value, last_node)
File "/home/yangchunhao/miniconda3/envs/sglang/lib/python3.10/site-packages/sglang/srt/managers/router/radix_cache.py", line 129, in _match_prefix_helper self._match_prefix_helper(child, key[prefix_len:], value, last_node)
[Previous line repeated 979 more times]
File "/home/yangchunhao/miniconda3/envs/sglang/lib/python3.10/site-packages/sglang/srt/managers/router/radix_cache.py", line 120, in _match_prefix_helper prefix_len = match(c_key, key)
File "/home/yangchunhao/miniconda3/envs/sglang/lib/python3.10/site-packages/sglang/srt/managers/router/radix_cache.py", line 24, in match for k, w in zip(key, seq):
RecursionError: maximum recursion depth exceeded while calling a Python object
My environment: 2x RTX 4090, SGLang 0.1.12, vLLM 0.3.1, Qwen1.5-14B
My script:
"""
Usage:
python -m sglang.launch_server --model-path meta-llama/Llama-2-7b-chat-hf --port 30000
python json_decode.py
"""
from enum import Enum
from typing import List, Union
import sglang as sgl
from pydantic import BaseModel
from sglang.srt.constrained import build_regex_from_object
character_regex = r"""\[[\n ]*((\{[\n ]*"债券简称"[\n ]*:[\n ]*"(?:[^"\\\x00-\x1f\x7f-\x9f]|\\.)*"[\n ]*,[\n ]*"债券代码"[\n ]*:[\n ]*"(?:[^"\\\x00-\x1f\x7f-\x9f]|\\.)*"[\n ]*,[\n ]*"报价方向"[\n ]*:[\n ]*(\{[\n ]*"text"[\n ]*:[\n ]*"(?:[^"\\\x00-\x1f\x7f-\x9f]|\\.)*"[\n ]*,[\n ]*"choices"[\n ]*:[\n ]*("bid"|"ofr"|"double"|"unknown")[\n ]*\}|null)[\n ]*,[\n ]*"bid价格"[\n ]*:[\n ]*("(?:[^"\\\x00-\x1f\x7f-\x9f]|\\.)*"|null)[\n ]*,[\n ]*"bid数量"[\n ]*:[\n ]*("(?:[^"\\\x00-\x1f\x7f-\x9f]|\\.)*"|null)[\n ]*,[\n ]*"bid价格类型"[\n ]*:[\n ]*(\{[\n ]*"text"[\n ]*:[\n ]*"(?:[^"\\\x00-\x1f\x7f-\x9f]|\\.)*"[\n ]*,[\n ]*"choices"[\n ]*:[\n ]*("1\-净价"|"3\-收益率"|"4\-利差"|"5\-意向")[\n ]*\}|null)[\n ]*,[\n ]*"bid是否请示"[\n ]*:[\n ]*(\{[\n ]*"text"[\n ]*:[\n ]*("(?:[^"\\\x00-\x1f\x7f-\x9f]|\\.)*"|null)[\n ]*,[\n ]*"choices"[\n ]*:[\n ]*("是"|"否")[\n ]*\}|null)[\n ]*,[\n ]*"ofr价格"[\n ]*:[\n ]*("(?:[^"\\\x00-\x1f\x7f-\x9f]|\\.)*"|null)[\n ]*,[\n ]*"ofr数量"[\n ]*:[\n ]*("(?:[^"\\\x00-\x1f\x7f-\x9f]|\\.)*"|null)[\n ]*,[\n ]*"ofr价格类型"[\n ]*:[\n ]*(\{[\n ]*"text"[\n ]*:[\n ]*"(?:[^"\\\x00-\x1f\x7f-\x9f]|\\.)*"[\n ]*,[\n ]*"choices"[\n ]*:[\n ]*("1\-净价"|"3\-收益率"|"4\-利差"|"5\-意向")[\n ]*\}|null)[\n ]*,[\n ]*"ofr是否请示"[\n ]*:[\n ]*(\{[\n ]*"text"[\n ]*:[\n ]*("(?:[^"\\\x00-\x1f\x7f-\x9f]|\\.)*"|null)[\n ]*,[\n ]*"choices"[\n ]*:[\n ]*("是"|"否")[\n ]*\}|null)[\n ]*,[\n ]*"交易偏好描述"[\n ]*:[\n ]*("(?:[^"\\\x00-\x1f\x7f-\x9f]|\\.)*"|null)[\n ]*\})(,[\n ]*(\{[\n ]*"债券简称"[\n ]*:[\n ]*"(?:[^"\\\x00-\x1f\x7f-\x9f]|\\.)*"[\n ]*,[\n ]*"债券代码"[\n ]*:[\n ]*"(?:[^"\\\x00-\x1f\x7f-\x9f]|\\.)*"[\n ]*,[\n ]*"报价方向"[\n ]*:[\n ]*(\{[\n ]*"text"[\n ]*:[\n ]*"(?:[^"\\\x00-\x1f\x7f-\x9f]|\\.)*"[\n ]*,[\n ]*"choices"[\n ]*:[\n ]*("bid"|"ofr"|"double"|"unknown")[\n ]*\}|null)[\n ]*,[\n ]*"bid价格"[\n ]*:[\n ]*("(?:[^"\\\x00-\x1f\x7f-\x9f]|\\.)*"|null)[\n ]*,[\n ]*"bid数量"[\n ]*:[\n ]*("(?:[^"\\\x00-\x1f\x7f-\x9f]|\\.)*"|null)[\n ]*,[\n ]*"bid价格类型"[\n ]*:[\n ]*(\{[\n ]*"text"[\n ]*:[\n ]*"(?:[^"\\\x00-\x1f\x7f-\x9f]|\\.)*"[\n ]*,[\n ]*"choices"[\n 
]*:[\n ]*("1\-净价"|"3\-收益率"|"4\-利差"|"5\-意向")[\n ]*\}|null)[\n ]*,[\n ]*"bid是否请示"[\n ]*:[\n ]*(\{[\n ]*"text"[\n ]*:[\n ]*("(?:[^"\\\x00-\x1f\x7f-\x9f]|\\.)*"|null)[\n ]*,[\n ]*"choices"[\n ]*:[\n ]*("是"|"否")[\n ]*\}|null)[\n ]*,[\n ]*"ofr价格"[\n ]*:[\n ]*("(?:[^"\\\x00-\x1f\x7f-\x9f]|\\.)*"|null)[\n ]*,[\n ]*"ofr数量"[\n ]*:[\n ]*("(?:[^"\\\x00-\x1f\x7f-\x9f]|\\.)*"|null)[\n ]*,[\n ]*"ofr价格类型"[\n ]*:[\n ]*(\{[\n ]*"text"[\n ]*:[\n ]*"(?:[^"\\\x00-\x1f\x7f-\x9f]|\\.)*"[\n ]*,[\n ]*"choices"[\n ]*:[\n ]*("1\-净价"|"3\-收益率"|"4\-利差"|"5\-意向")[\n ]*\}|null)[\n ]*,[\n ]*"ofr是否请示"[\n ]*:[\n ]*(\{[\n ]*"text"[\n ]*:[\n ]*("(?:[^"\\\x00-\x1f\x7f-\x9f]|\\.)*"|null)[\n ]*,[\n ]*"choices"[\n ]*:[\n ]*("是"|"否")[\n ]*\}|null)[\n ]*,[\n ]*"交易偏好描述"[\n ]*:[\n ]*("(?:[^"\\\x00-\x1f\x7f-\x9f]|\\.)*"|null)[\n ]*\})){0,})?[\n ]*\]"""
def driver_character_gen():
    # Leftover from the json_decode.py example; `character_gen` is not defined in this script.
    state = character_gen.run(name="Hermione Granger")
    print(state.text())
class DirectionChoices(str, Enum):
    bid = "bid"
    ofr = "ofr"
    double = "double"
    unknown = "unknown"
class PriceChoices(str, Enum):
    one = "1-净价"  # clean price
    three = "3-收益率"  # yield
    four = "4-利差"  # spread
    five = "5-意向"  # intention (indicative)
class PriceType(BaseModel):
    text: str
    choices: PriceChoices
class TradeDirection(BaseModel):
    text: str
    choices: DirectionChoices
class RequestChoices(str, Enum):
    yes = "是"
    no = "否"
class RequestType(BaseModel):
    text: Union[str, None]
    choices: RequestChoices
class TradeFormat(BaseModel):
    债券简称: str  # bond short name
    债券代码: str  # bond code
    报价方向: Union[TradeDirection, None]  # quote direction
    bid价格: Union[str, None]  # bid price
    bid数量: Union[str, None]  # bid size
    bid价格类型: Union[PriceType, None]  # bid price type
    bid是否请示: Union[RequestType, None]  # whether the bid needs approval (请示)
    ofr价格: Union[str, None]  # offer price
    ofr数量: Union[str, None]  # offer size
    ofr价格类型: Union[PriceType, None]  # offer price type
    ofr是否请示: Union[RequestType, None]  # whether the offer needs approval (请示)
    交易偏好描述: Union[str, None]  # trading preference description
class TradeList(BaseModel):
    报价信息: List[TradeFormat]  # quote information
@sgl.function
def pydantic_wizard_gen(s, question):
    # Prompt (English): "Your task is to convert a passage of text about bond trades into a
    # specific JSON format. Each line is one bond trade record; extract the fields listed
    # below and output them as JSON. The input text follows:"
    ins = '你的任务是将一段有关债券交易的文本转换为特定的json格式。每行为一条债券交易记录,从中提取"债券简称", "债券代码", "报价方向", "bid价格", "bid数量", "bid价格类型", "bid是否请示", "ofr价格", "ofr数量", "ofr价格类型", "ofr是否请示", "交易偏好描述"并以json格式输出。下面是输入的文本:\n'
    s += ins + question
    s += sgl.gen(
        "json_output",
        max_tokens=4200,
        temperature=0,
        regex=character_regex,  # Requires pydantic >= 2.0
    )
def driver_pydantic_wizard_gen(question):
    state = pydantic_wizard_gen.run(question)
    print(state.text())
sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))
input_str = '\n成都银行ofr:\n1Y+2Y 178760.SH 21悦来02 --/Ofr* --/5000 AA+ 估值:2.9058|3.6472 \n1.21Y+2Y 102101718 21新都香城MTN001 --/Ofr* --/5000 AA/AA+ 有担保 估值:3.2628|3.9477 \n2.6Y+2Y 114663.SH 23广控01 --/Ofr* --/10000 AA+ 估值:3.8265|4.5575 \n1.8Y(休1) 166500.SH 20九联01 --/Ofr* --/20000 AA+ 估值:3.639 \n2.17Y+2Y 102281770 22江津华信MTN001 --/Ofr* --/3000 AA+ 估值:3.5448|4.1318 \n2.21Y+2Y(休1) 182542.SH 22高新02 --/Ofr* --/11000 AA+ 估值:3.9224|4.7603 \n2.49Y+2Y 102282652 22九联投资MTN001 --/Ofr* --/2000 AA+ 估值:3.6805|4.2197 \n1.52Y+2Y+1Y 102282745 22乐山国资MTN001 --/Ofr* --/4000 AA+ 估值:3.1601|4.0744 \n1.8Y 032000313 20涪陵新城PPN001 --/Ofr* --/20000 AA 估值:3.9839 \n2.22Y+2Y 182525.SH 22科建02 --/Ofr* --/1000 AA+ 有担保 估值:3.5236|4.3653 \n1.76Y+2Y 194068.SH 22三江01 --/Ofr* --/5000 AA+ 估值:3.4745|4.3257 \n2.39Y+2Y 114029.SH 22长经04 --/Ofr* --/4000 AA+ 估值:3.3985|3.9355 \n1.46Y+2Y 102103111 21重庆临空MTN002 --/Ofr* --/3000 AA+ 估值:3.1301|3.7854 \n1.51Y+3Y 2080402 20金牛环绿债01 --/Ofr* --/3000 AA+ 估值:2.8779|3.3195 \n2.07Y+2Y 194887.SH 22长经02 --/Ofr* --/3000 AA+ 估值:3.285|3.8521 \n1.59Y+2Y 102280064 22空港城发MTN001 --/Ofr* --/10000 AA+ 估值:3.1887|3.8335 \n2.59Y+2Y 102380056 23兴泸MTN001 --/Ofr* --/1000 AA+ 估值:3.231|3.5799 \n2.5Y+2Y(休1) 102282693 22金牛环境MTN002 --/Ofr* --/10000 AA+ 估值:3.2157|3.5678 \n1.6Y+1Y(休1) 102380111 23乐山国资MTN001 --/Ofr* --/10000 AA+ 估值:3.1961|3.5469 \n3.19Y+2Y(休2) 2180325 21空港债01 --/Ofr* --/3000 AA+ 估值:3.5979|3.9162 \n3Y 178754.SH 21悦来01 --/Ofr* --/7000 AA+ 估值:3.649 \n1.73Y+2Y 102280426 22香城投资MTN002 --/Ofr* --/2000 AA+ 估值:3.2022|3.8349 \n2.58Y+2Y(休1) 032380023 23湖北科投PPN001 --/Ofr* --/12000 AAA 估值:3.4609|3.9769 \n1.76Y(休2) 194100.SH 22通经01 --/Ofr* --/4000 AAA 估值:3.1615 \n2.03Y+2Y 032280566 22武侯资本PPN002 --/Ofr* --/5000 AA+ 估值:3.268|3.841 \n1.57Y+2Y 032191442 21西盛投资PPN001 --/Ofr* --/5000 AA+ 估值:3.8466|4.6768 \n2.08Y+2Y 032280626 22渝隆资产PPN001 --/Ofr* --/6000 AA+ 估值:3.5403|4.1044 \n1.91Y+2Y 102281079 22成华棚改MTN001 --/Ofr* --/3000 AA+ 估值:3.054|3.4763 \n1.43Y(休2) 032101056 21南京浦口PPN003 --/Ofr* --/2000 AA+ 估值:3.3397 \n\n'
driver_pydantic_wizard_gen(input_str)
# driver_pydantic_wizard_gen(input_str)
The error can be reproduced by running driver_pydantic_wizard_gen() twice (for example, by uncommenting the second call).
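The `_match_prefix_helper` trace above recurses once per matched node, so a second request that walks a long cached prefix is one place a deep path would be hit; that this explains the "run twice" behaviour is a guess, not a confirmed diagnosis. A minimal iterative sketch on a hypothetical node with `key`, `value` (illustrative KV-slot ids, one per token) and `children`. Unlike a full radix-cache implementation, this sketch does not split a node on a partial match; it keeps the matched slot ids and stops.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple


@dataclass
class Node:
    key: Tuple[int, ...] = ()
    value: List[int] = field(default_factory=list)  # hypothetical slot ids, one per token
    children: Dict[int, "Node"] = field(default_factory=dict)


def match_prefix(root: Node, key: Sequence[int]) -> Tuple[List[int], Node]:
    """Return (slot ids of the longest cached prefix of `key`, last fully matched node)."""
    matched: List[int] = []
    node, i = root, 0
    while i < len(key):
        child = node.children.get(key[i])
        if child is None:
            break
        # Compare the child's edge with the remaining query tokens.
        n = 0
        while n < len(child.key) and i + n < len(key) and child.key[n] == key[i + n]:
            n += 1
        matched.extend(child.value[:n])
        if n < len(child.key):
            break  # partial edge match: keep the matched slots but stop descending
        node, i = child, i + n
    return matched, node


# A 2000-node chain of one-token edges, queried with a 1500-token prefix plus one miss.
root = Node()
cur = root
for t in range(2000):
    nxt = Node(key=(t,), value=[100 + t])
    cur.children[t] = nxt
    cur = nxt
slots, last = match_prefix(root, list(range(1500)) + [-1])
print(len(slots), slots[:3], last.key)  # 1500 [100, 101, 102] (1499,)
```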
@Ja1Zhou Replacing the code in python/sglang/launch_server.py with the following seems to work for me. My env: sglang 0.1.12, torch 2.1.2+cu121, Docker image nvcr.io/nvidia/pytorch:23.10-py3
import argparse
import sys
from sglang.srt.server import ServerArgs, launch_server
if __name__ == "__main__":
    # Workaround: raise Python's default recursion limit (1000) so the recursive
    # radix-cache helpers can walk deeper trees.
    sys.setrecursionlimit(8000)
    parser = argparse.ArgumentParser()
    ServerArgs.add_cli_args(parser)
    args = parser.parse_args()
    server_args = ServerArgs.from_cli_args(args)
    launch_server(server_args, None)
Hey, I am having the same error. After changing launch_server.py as @DouHappy suggested, how do I launch the server from the local launch_server.py?
This issue has been automatically closed due to inactivity. Please feel free to reopen it if needed.
I get this with the latest sglang:
object address : 0x150df025af80
object refcount : 4
object type : 0x151167c6a320
object type name: RecursionError
object repr : RecursionError('maximum recursion depth exceeded')
lost sys.stderr
Error in sys.excepthook:
object address : 0x15290c652da0
object refcount : 1
object type : 0x152d74054320
object type name: RecursionError
object repr : RecursionError('maximum recursion depth exceeded')
lost sys.stderr
Original exception was:
object address : 0x152c28c30160
object refcount : 3
object type : 0x152d74054320
object type name: RecursionError
object repr : RecursionError('maximum recursion depth exceeded')
lost sys.stderr
Error in sys.excepthook:
object address : 0x149b78692e60
object refcount : 1
object type : 0x149fc5f6d320
object type name: RecursionError
object repr : RecursionError('maximum recursion depth exceeded')
lost sys.stderr
Original exception was:
object address : 0x149e7acac160
object refcount : 3
object type : 0x149fc5f6d320
object type name: RecursionError
object repr : RecursionError('maximum recursion depth exceeded')
lost sys.stderr
Error in sys.excepthook:
object address : 0x15383ed22d40
object refcount : 1
object type : 0x153cb2787320
object type name: RecursionError
object repr : RecursionError('maximum recursion depth exceeded')
lost sys.stderr
Original exception was:
object address : 0x153b67344160
object refcount : 3
object type : 0x153cb2787320
object type name: RecursionError
object repr : RecursionError('maximum recursion depth exceeded')
lost sys.stderr
2024-09-26 12:29:20 | ERROR | stderr | Traceback (most recent call last):
2024-09-26 12:29:20 | ERROR | stderr | File "/p/project/ccstao/cstao05/FastChat/fastchat/serve/sglang_worker.py", line 290, in <module>
2024-09-26 12:29:20 | ERROR | stderr | runtime = sgl.Runtime(
2024-09-26 12:29:20 | ERROR | stderr | ^^^^^^^^^^^^
2024-09-26 12:29:20 | ERROR | stderr | File "/p/project1/ccstao/cstao05/FastChat/sc_venv_jureca/venv/lib/python3.11/site-packages/sglang/api.py", line 40, in Runtime
2024-09-26 12:29:20 | ERROR | stderr | return Runtime(*args, **kwargs)
2024-09-26 12:29:20 | ERROR | stderr | ^^^^^^^^^^^^^^^^^^^^^^^^
2024-09-26 12:29:20 | ERROR | stderr | File "/p/project1/ccstao/cstao05/FastChat/sc_venv_jureca/venv/lib/python3.11/site-packages/sglang/srt/server.py", line 553, in __init__
2024-09-26 12:29:20 | ERROR | stderr | raise RuntimeError(
2024-09-26 12:29:20 | ERROR | stderr | RuntimeError: Initialization failed. Please see the error messages above.
[rank3]:[W926 12:29:21.712860341 CudaIPCTypes.cpp:16] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
[rank1]:[W926 12:29:21.712912191 CudaIPCTypes.cpp:16] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
[rank2]:[W926 12:29:21.965858408 CudaIPCTypes.cpp:16] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
/p/software/jurecadc/stages/2024/software/Python/3.11.3-GCCcore-12.3.0/lib/python3.11/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
srun: error: jrc0911: task 0: Exited with exit code 1
This happens with every model I tested: Mistral-8x22, Phi-3.5, and Mistral-Mamba (the last two do not work with FastChat's model_worker or vLLM worker, which is why I tried SGLang).
if __name__ == "__main__":
    sys.setrecursionlimit(8000)
    parser = argparse.ArgumentParser()
    ServerArgs.add_cli_args(parser)
    args = parser.parse_args()
    server_args = ServerArgs.from_cli_args(args)
    launch_server(server_args, None)
I tried this, but I still get the same recursion error, even when I set it to 100.
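One possible reason, not confirmed in this thread: `sys.setrecursionlimit` only changes the interpreter it runs in. A process started with multiprocessing's `spawn` method is a fresh interpreter and begins at CPython's default of 1000, whereas a `fork`ed child starts as a copy of the parent. The sketch below demonstrates the first half of that with a plain subprocess; whether SGLang's workers are spawned in your version is an assumption you would need to check.

```python
import subprocess
import sys


def fresh_interpreter_limit() -> int:
    """Start a brand-new Python interpreter and report its recursion limit."""
    out = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.getrecursionlimit())"],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(out.stdout.strip())


if __name__ == "__main__":
    sys.setrecursionlimit(8000)
    print("this process:", sys.getrecursionlimit())  # 8000
    # The new interpreter does not see the change made above.
    print("fresh interpreter:", fresh_interpreter_limit())
```

If that is the cause, the limit would have to be raised by code that runs inside each worker process (for example, at import time of the module that holds the radix cache) rather than only in the launcher script.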