Calling a qwen-next model launched with Xinference v1.11.0.post1 raises an error
System Info / 系統信息
Ubuntu 22.04, CUDA 12.8
Running Xinference with Docker? / 是否使用 Docker 运行 Xinference?
- [x] docker / docker
- [ ] pip install / 通过 pip install 安装
- [ ] installation from source / 从源码安装
Version info / 版本信息
v1.11.0.post1
The command used to start Xinference / 用以启动 xinference 的命令
Launched on a single GPU (H20).
Reproduction / 复现过程
As above.
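A rough sketch of the failing call, assuming the model has already been launched under the UID `qwen3` (taken from the log below) and Xinference is listening on the default local endpoint; the actual launch was done inside Docker on the H20:

```python
from xinference.client import Client

# Assumed defaults: local RESTful endpoint and the model UID "qwen3" seen in the log.
client = Client("http://127.0.0.1:9997")
model = client.get_model("qwen3")

# This is the request that produces the traceback below.
response = model.chat(
    messages=[{"role": "user", "content": "你知道地球和太阳哪个大吗"}],
    generate_config={"temperature": 0.1},
)
print(response)
```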
The error output is as follows:
2025-10-28 01:40:24,042 xinference.core.model 603 DEBUG [request 0462ab00-b35c-11f0-97b2-0242acfa0004] Enter chat, args: ModelActor(qwen3-0),[{'role': 'user', 'content': '你知道地球和太阳哪个大吗'}],{'temperature': 0.1, 'max_tokens': None}, kwargs: raw_params={'temperature': 0.1}
2025-10-28 01:40:24,045 xinference.model.llm.utils 603 DEBUG Prompt: <|im_start|>user
你知道地球和太阳哪个大吗<|im_end|>
<|im_start|>assistant
<think>
2025-10-28 01:40:26,556 xinference.model.llm.transformers.utils 603 DEBUG No max_tokens set, setting to: 262127
2025-10-28 01:40:26,557 xinference.model.llm.transformers.utils 603 ERROR Internal error for batch inference: 'NoneType' object has no attribute 'shape'.
Traceback (most recent call last):
File "/opt/inference/xinference/model/llm/transformers/utils.py", line 482, in batch_inference_one_step
_batch_inference_one_step_internal(
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/opt/inference/xinference/model/llm/transformers/utils.py", line 302, in _batch_inference_one_step_internal
batch_size, seq_len = get_batch_size_and_seq_len_from_kv_cache(
File "/opt/inference/xinference/model/llm/transformers/utils.py", line 201, in get_batch_size_and_seq_len_from_kv_cache
return kv[0][0].shape[bs_idx], kv[0][0].shape[seq_len_idx] + 1
AttributeError: 'NoneType' object has no attribute 'shape'
2025-10-28 01:40:26,560 xinference.core.model 603 ERROR [request 0462ab00-b35c-11f0-97b2-0242acfa0004] Leave chat, error: 'NoneType' object has no attribute 'shape', elapsed time: 2 s
Traceback (most recent call last):
File "/opt/inference/xinference/core/utils.py", line 93, in wrapped
ret = await func(*args, **kwargs)
File "/opt/inference/xinference/core/model.py", line 685, in chat
response = await self._call_wrapper_json(
File "/opt/inference/xinference/core/model.py", line 572, in _call_wrapper_json
return await self._call_wrapper("json", fn, *args, **kwargs)
File "/opt/inference/xinference/core/model.py", line 140, in _async_wrapper
return await fn(self, *args, **kwargs)
File "/opt/inference/xinference/core/model.py", line 582, in _call_wrapper
ret = await fn(*args, **kwargs)
File "/opt/inference/xinference/model/llm/transformers/core.py", line 1000, in chat
return await fut
ValueError: 'NoneType' object has no attribute 'shape'
2025-10-28 01:40:26,560 xinference.core.model 603 DEBUG After request chat, current serve request count: 0 for the model qwen3
2025-10-28 01:40:26,562 xinference.api.restful_api 1 ERROR [address=0.0.0.0:42879, pid=603] 'NoneType' object has no attribute 'shape'
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/xinference/api/restful_api.py", line 3066, in create_chat_completion
data = await model.chat(
File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 262, in send
return self._process_result_message(result)
File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 111, in _process_result_message
raise message.as_instanceof_cause()
File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 689, in send
result = await self._run_coro(message.message_id, coro)
File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 389, in _run_coro
return await coro
File "/usr/local/lib/python3.10/dist-packages/xoscar/api.py", line 418, in __on_receive__
return await super().__on_receive__(message) # type: ignore
File "xoscar/core.pyx", line 564, in __on_receive__
raise ex
File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
async with self._lock:
File "xoscar/core.pyx", line 527, in xoscar.core._BaseActor.__on_receive__
with debug_async_timeout('actor_lock_timeout',
File "xoscar/core.pyx", line 532, in xoscar.core._BaseActor.__on_receive__
result = await result
File "/opt/inference/xinference/core/model.py", line 105, in wrapped_func
ret = await fn(self, *args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/xoscar/api.py", line 496, in _wrapper
r = await func(self, *args, **kwargs)
File "/opt/inference/xinference/core/utils.py", line 93, in wrapped
ret = await func(*args, **kwargs)
File "/opt/inference/xinference/core/model.py", line 685, in chat
response = await self._call_wrapper_json(
File "/opt/inference/xinference/core/model.py", line 572, in _call_wrapper_json
return await self._call_wrapper("json", fn, *args, **kwargs)
File "/opt/inference/xinference/core/model.py", line 140, in _async_wrapper
return await fn(self, *args, **kwargs)
File "/opt/inference/xinference/core/model.py", line 582, in _call_wrapper
ret = await fn(*args, **kwargs)
File "/opt/inference/xinference/model/llm/transformers/core.py", line 1000, in chat
return await fut
ValueError: [address=0.0.0.0:42879, pid=603] 'NoneType' object has no attribute 'shape'
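For context, `get_batch_size_and_seq_len_from_kv_cache` reads `kv[0][0].shape` directly, and for this model the first layer's key tensor is `None`. Below is a minimal defensive sketch (a hypothetical helper, not Xinference's actual fix) that assumes the cache is a per-layer sequence of `(key, value)` pairs in the usual `(batch, heads, seq_len, head_dim)` layout and that hybrid-attention models such as qwen3-next may leave some layers' entries empty:

```python
from typing import Optional, Sequence, Tuple

import torch


def get_batch_size_and_seq_len(
    kv: Sequence[Tuple[Optional[torch.Tensor], Optional[torch.Tensor]]],
    bs_idx: int = 0,
    seq_len_idx: int = 2,
) -> Tuple[int, int]:
    # Skip layers whose key tensor is missing (e.g. non-standard attention layers)
    # instead of reading kv[0][0].shape unconditionally.
    for key, _value in kv:
        if key is not None:
            return key.shape[bs_idx], key.shape[seq_len_idx] + 1
    raise ValueError("no layer in the KV cache holds a standard attention tensor")
```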
Expected behavior / 期待表现
I expect the model to work normally, especially the fp8 variant.
@Jun-Howie please take a look.
@Tian14267 For now the transformers backend requires installing transformers from source:
pip install git+https://github.com/huggingface/transformers.git@main
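A quick way to confirm the source install picked up the new architecture (the `qwen3_next` model-type key is an assumption about how it is registered on transformers main):

```python
import transformers
from transformers.models.auto.configuration_auto import CONFIG_MAPPING_NAMES

print(transformers.__version__)              # should report a .dev0 build after the source install
print("qwen3_next" in CONFIG_MAPPING_NAMES)  # True if this build knows the qwen3-next architecture
```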
Hello, after updating transformers from source and relaunching the model, the chat responses are garbled. Also, is there a temporary workaround for the error when launching qwen3-next-fp8 with vLLM?
This issue is stale because it has been open for 7 days with no activity.
This issue was closed because it has been inactive for 5 days since being marked as stale.