[Feature] Will the turbomind backend support guided_decoding?
Motivation
Will the turbomind backend support guided_decoding?
Related resources
No response
Additional context
No response
Same question here.
We will evaluate the overall workload first and then sync with everyone.
The team is currently prioritizing some internal requirements, so support for this feature will be delayed.
When will this feature be supported?
Not until after the Spring Festival at the earliest...
When after the holiday will it be supported?
Sorry, things have changed quickly; the team currently has no one available to handle this request.
guided_decoding is a very useful feature; I suggest supporting it as a priority.
Sorry, the team's top priority right now is supporting internal requirements; guided decoding cannot be scheduled for the time being.
@CUHKSZzxy you may put it into your work list.
@lvhan028 Many agent frameworks (such as langchain) rely on guided_decoding as part of building agents, so I again suggest supporting guided_decoding as early as possible; otherwise, when using an agent framework, only vllm can be used as the inference backend.
The pytorch engine supports it; could you use that first? On the turbomind side, this feature has not been scheduled yet due to limited manpower.
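For anyone who wants to try it today, below is a minimal sketch of guided decoding with the PyTorch engine through the pipeline API. The response_format field on GenerationConfig and the model path are assumptions here, not confirmed by this thread; please check the current lmdeploy structured-output docs for the exact interface.
# Minimal sketch (assumption: GenerationConfig exposes a response_format dict
# for the PyTorch engine; the model path is a placeholder).
from lmdeploy import GenerationConfig, PytorchEngineConfig, pipeline

pipe = pipeline("Qwen/Qwen3-0.6B", backend_config=PytorchEngineConfig())

# Same OpenAI-style payload shape as the server examples later in this thread.
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "user_profile",
        "schema": {
            "type": "object",
            "properties": {"name": {"type": "string"}},
            "required": ["name"],
        },
    },
}

gen_config = GenerationConfig(response_format=response_format)
print(pipe(["Make a self introduction please."], gen_config=gen_config))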
Are there any recent plans? This feature looks easy to implement, and it is a very important one.
@lvhan028 Has this feature been scheduled yet? In my opinion, no matter how good the performance is, it matters less than certain features do for real business use. Besides, this feature looks easy to implement.
We have no bandwidth to handle community-requested features recently. Community contributors are very welcome to submit PRs to lmdeploy to support it.
Please support it!
@shell-nlp Sorry, the team has been fully saturated with work and simply cannot spare anyone. Excellent open-source libraries such as vllm and sglang already have this feature, so you can use them. If it is easy to implement, feel free to try implementing it yourself. If you can contribute this feature to lmdeploy via a PR, we would be very welcoming and honored.
Hi, I would like to try adding guided_decoding support to the turbomind backend. Could you offer some guidance? From my initial look, it seems hard to support this feature through pure Python code changes.
😁 Bumping this again.
I've settled it with @irexyc. It is really on the schedule this time, pinky promise.
Any WIP on this?
You can try PR #3965 and follow the progress there. Thanks.
Thank you for your work. Do you plan to implement response_format for the OpenAI API Server afterwards?
Thank you for your support and patience. #3965 is just the beginning of better support for guided decoding in LMDeploy. We will see if it has enough quality to be merged, and after that I think we still have some work to do to optimize the performance. For OpenAI-style guided decoding, frankly speaking, we have not discussed it yet.
If you are interested in implementing it, I believe we will all be happy to accept it!
After some digging, I think I have enabled response_format for the OpenAI API Server in the latest commit of the PR. Maybe you can give it a try?
2025-09-30 15:27:32,716 - lmdeploy - ERROR - async_engine.py:663 - [safe_run] exception caught: KeyError 'schema'
2025-09-30 15:27:32,716 - lmdeploy - ERROR - async_engine.py:648 - [model_inst] exception caught: 'schema'
INFO: 10.69.1.103:56928 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/opt/py3/lib/python3.10/site-packages/uvicorn/protocols/http/h11_impl.py", line 403, in run_asgi
result = await app( # type: ignore[func-returns-value]
File "/opt/py3/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 60, in __call__
return await self.app(scope, receive, send)
File "/opt/py3/lib/python3.10/site-packages/fastapi/applications.py", line 1133, in __call__
await super().__call__(scope, receive, send)
File "/opt/py3/lib/python3.10/site-packages/starlette/applications.py", line 113, in __call__
await self.middleware_stack(scope, receive, send)
File "/opt/py3/lib/python3.10/site-packages/starlette/middleware/errors.py", line 186, in __call__
raise exc
File "/opt/py3/lib/python3.10/site-packages/starlette/middleware/errors.py", line 164, in __call__
await self.app(scope, receive, _send)
File "/opt/py3/lib/python3.10/site-packages/starlette/middleware/cors.py", line 85, in __call__
await self.app(scope, receive, send)
File "/opt/py3/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 63, in __call__
await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
File "/opt/py3/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
raise exc
File "/opt/py3/lib/python3.10/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
await app(scope, receive, sender)
File "/opt/py3/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
await self.app(scope, receive, send)
File "/opt/py3/lib/python3.10/site-packages/starlette/routing.py", line 716, in __call__
await self.middleware_stack(scope, receive, send)
File "/opt/py3/lib/python3.10/site-packages/starlette/routing.py", line 736, in app
await route.handle(scope, receive, send)
File "/opt/py3/lib/python3.10/site-packages/starlette/routing.py", line 290, in handle
await self.app(scope, receive, send)
File "/opt/py3/lib/python3.10/site-packages/fastapi/routing.py", line 123, in app
await wrap_app_handling_exceptions(app, request)(scope, receive, send)
File "/opt/py3/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
raise exc
File "/opt/py3/lib/python3.10/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
await app(scope, receive, sender)
File "/opt/py3/lib/python3.10/site-packages/fastapi/routing.py", line 109, in app
response = await f(request)
File "/opt/py3/lib/python3.10/site-packages/fastapi/routing.py", line 387, in app
raw_response = await run_endpoint_function(
File "/opt/py3/lib/python3.10/site-packages/fastapi/routing.py", line 288, in run_endpoint_function
return await dependant.call(**values)
File "/opt/py3/lib/python3.10/site-packages/lmdeploy/serve/openai/api_server.py", line 586, in chat_completions_v1
if final_res.finish_reason == 'stop' and len(message.tool_calls) > 0:
AttributeError: 'NoneType' object has no attribute 'finish_reason'
Curl works, but when I try with the Python openai library, lmdeploy raises a 500 error. The logs are above.
Can I have your test code? I need some more info to debug. Thank you.
I have tested successfully using the following script:
from openai import OpenAI
client = OpenAI(api_key="YOUR_API_KEY", base_url="http://0.0.0.0:23333/v1")
model_name = client.models.list().data[0].id
response = client.chat.completions.create(
    model=model_name,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Make a self introduction please."},
    ],
    temperature=0.8,
    top_p=0.8,
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "user_profile",
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "skills": {
                        "type": "array",
                        "items": {"type": "string", "maxLength": 10},
                        "minItems": 3,
                        "maxItems": 10,
                    },
                    "work history": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "company": {"type": "string"},
                                "duration": {"type": "string"},
                            },
                            "required": ["company"],
                        },
                    },
                },
                "required": ["name", "skills", "work history"],
            },
        },
    },
)
print(response)
I get a response like this:
ChatCompletion(id='8', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='{"name": "Alice", "skills": ["HTML", "CSS", "JavaScript", "Python", "SQL", "Git", "Docker", "AWS", "Linux", "ReactJS"], "work history": [{"company": "Company A", "duration": "2020-2023"}, {"company": "Company B", "duration": "2023-2024"}] }', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=None, gen_tokens=None, reasoning_content=None))], created=1759227863, model='/home/windreamer/.cache/huggingface/hub/models--Qwen--Qwen3-0.6B/snapshots/c1899de289a04d12100db370d81485cdf75e47ca', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=89, prompt_tokens=25, total_tokens=114, completion_tokens_details=None, prompt_tokens_details=None))
import json
from typing import List
from openai import OpenAI
from pydantic import BaseModel
class StoryOutput(BaseModel):
    title: str
    characters: List[str]
    moral: str

client = OpenAI(
    base_url="",
    api_key="EMPTY",
)

schema = StoryOutput.model_json_schema()

prompt = (
    "Kể một câu chuyện ngắn vui nhộn về một con mèo và một con robot trong công viên."
)

resp = client.chat.completions.create(
    model="",
    messages=[
        {
            "role": "system",
            "content": "Return ONLY valid JSON that matches the JSON Schema.",
        },
        {"role": "user", "content": prompt},
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "StoryOutput",
            "schema": schema,
            "strict": True,
        },
    },
    temperature=0.7,
)
content = resp.choices[0].message.content
data = json.loads(content)
story = StoryOutput.model_validate(data)
Here is my code.
Thank you for the code to reproduce the issue. I have identified the bug and fixed it in the latest commit; you can check whether it is now fixed completely.
The problem comes from the Pydantic model used by LMDeploy: schema is reserved on Pydantic's BaseModel, so we rename the field to json_schema to avoid the name conflict and add an alias of schema so that incoming JSON is deserialized correctly. However, we also serialize the model to JSON internally, and that output used json_schema instead of schema as the field name, which is the root cause of the bug.
So in the latest commit of the PR, I set the model to use the alias during serialization as well. I believe this solves the current issue.
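To make the alias behaviour concrete, here is an illustrative Pydantic sketch (not the actual LMDeploy model; the class name is made up) showing why serialization must use the alias as well as validation:
import json

from pydantic import BaseModel, ConfigDict, Field

class JsonSchemaSpec(BaseModel):
    # "schema" would shadow an attribute on BaseModel, so the field is stored
    # as json_schema internally and exposed externally via the alias "schema".
    model_config = ConfigDict(populate_by_name=True)
    name: str
    json_schema: dict = Field(alias="schema")

spec = JsonSchemaSpec.model_validate({"name": "StoryOutput", "schema": {"type": "object"}})

# Default dump uses the internal field name, so downstream code that looks up
# "schema" fails with the KeyError seen in the log above.
print(json.dumps(spec.model_dump()))               # {"name": ..., "json_schema": {...}}

# Dumping with the alias restores the external field name.
print(json.dumps(spec.model_dump(by_alias=True)))  # {"name": ..., "schema": {...}}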