Align Handling of reasoning_content and content for DeepSeek R1 Models with Official API
Feature request
Current implementation:
In the existing code, reasoning tokens and final content are combined into a single `content` field:

```python
import html

response_content = ""
for chunk in response:
    delta = chunk["choices"][0]["delta"]
    if "content" not in delta:
        continue
    response_content += html.escape(delta["content"])
    yield response_content
```

This approach causes reasoning content (e.g., the `<think>...</think>` block emitted by DeepSeek R1) to be merged into the final output and HTML-escaped along with it.
Add logic to extract reasoning_content and content from the API response independently.
Stop combining them into a single field. For example:
```python
reasoning_content = ""
content = ""
for chunk in response:
    delta = chunk["choices"][0]["delta"]
    if "reasoning_content" in delta:
        reasoning_content += delta["reasoning_content"]
    elif "content" in delta:
        content += delta["content"]
    # Yield or process the separated content as needed
```
Motivation
- Consistency with the official API: ensures compatibility with DeepSeek R1's specification, simplifying integration for third-party tools and frontend interfaces.
- Improved frontend handling: separating reasoning_content (e.g., internal reasoning steps) from content (the final output) allows frontends to render each appropriately (e.g., hiding/showing reasoning steps or formatting them differently).
- Avoid workarounds: eliminates the need for manual HTML escaping and for parsing `<think>` tags out of the combined output.
Your contribution
https://api-docs.deepseek.com/zh-cn/guides/reasoning_model
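For reference, the guide linked above streams the two fields separately on the client side. A minimal sketch along those lines, using the OpenAI-compatible SDK as the guide does (the prompt is illustrative, and the `getattr` guard is a defensive addition of our own):

```python
from openai import OpenAI

client = OpenAI(api_key="<your_api_key>", base_url="https://api.deepseek.com")

stream = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": "9.11 and 9.8, which is greater?"}],
    stream=True,
)

reasoning_content, content = "", ""
for chunk in stream:
    delta = chunk.choices[0].delta
    # The official API sends the chain of thought in reasoning_content
    # and the final answer in content, in separate deltas.
    if getattr(delta, "reasoning_content", None):
        reasoning_content += delta.reasoning_content
    elif delta.content:
        content += delta.content
```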
If reasoning_content is added, some users' existing integrations may break due to this change.
Just wondering whether reasoning_content is the de facto standard now.
After discussing this online, we reached an agreement that reasoning_content can be enabled via

```bash
xinference launch xxx --reasoning_content True
```

and that it will be disabled by default so as not to have too much impact.
If anyone is interested in implementing this feature, please let us know.
vllm has supported both the `--enable-reasoning` and `--reasoning-parser` options since version 0.7.1. If xinference supports them as well, does it also need to distinguish between reasoning parsers?
I don't think we need to rely on vllm; the parser itself is quite simple.
I think we need to add reasoning_start_tag and reasoning_end_tag etc. to llm_family.json, and just parse according to them when the reasoning_content option is enabled. This would be general across all engines.
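A rough sketch of what that tag-based parsing could look like. The reasoning_start_tag and reasoning_end_tag names come from the comment above; the default tag values, function shape, and fallback behavior are assumptions for illustration, not Xinference's actual implementation:

```python
def split_reasoning(
    text: str,
    reasoning_start_tag: str = "<think>",   # hypothetical llm_family.json field
    reasoning_end_tag: str = "</think>",    # hypothetical llm_family.json field
) -> tuple[str, str]:
    """Split raw model output into (reasoning_content, content)."""
    start = text.find(reasoning_start_tag)
    end = text.find(reasoning_end_tag)
    if start == -1 or end == -1:
        # No complete reasoning block: treat everything as final content.
        return "", text
    reasoning = text[start + len(reasoning_start_tag):end]
    content = text[end + len(reasoning_end_tag):]
    return reasoning.strip(), content.strip()

# split_reasoning("<think>compare the decimals</think>9.8 is greater")
# -> ("compare the decimals", "9.8 is greater")
```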
When will the reasoning_content field be released? @qinxuye
vllm deployments already support returning reasoning_content. @qinxuye Would you consider upgrading the vllm version?
```bash
vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B \
    --enable-reasoning --reasoning-parser deepseek_r1
```
https://docs.vllm.ai/en/latest/features/reasoning_outputs.html
Same question: when will this be implemented? Otherwise clients have to extract the tags themselves, and once xinference updates they will have to remove that extraction again, which is quite a hassle.
It will ship in next week's release. This feature will be off by default and needs to be specified at launch time.
By the way, the latest version still seems to have some issues with this feature?
Is this available already? How do I use it?
`xinference launch xxx --reasoning_content True`
Is this available too?
The option in the UI is equivalent to this command.