Calling a qwen-next model launched with Xinference v1.11.0.post1 raises an error
System Info / 系統信息
Ubuntu 22.04, CUDA 12.8
Running Xinference with Docker? / 是否使用 Docker 运行 Xinference?
- [x] docker / docker
- [ ] pip install / 通过 pip install 安装
- [ ] installation from source / 从源码安装
Version info / 版本信息
v1.11.0.post1
The command used to start Xinference / 用以启动 xinference 的命令
Launched on a single GPU (H20).
Reproduction / 复现过程
As above.
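A rough sketch of the failing call, assuming the model has already been launched under the UID `qwen3` (taken from the log below) and Xinference is listening on the default local endpoint; the actual launch was done inside Docker on the H20:

```python
from xinference.client import Client

# Assumed defaults: local RESTful endpoint and the model UID "qwen3" seen in the log.
client = Client("http://127.0.0.1:9997")
model = client.get_model("qwen3")

# This is the request that produces the traceback below.
response = model.chat(
    messages=[{"role": "user", "content": "你知道地球和太阳哪个大吗"}],
    generate_config={"temperature": 0.1},
)
print(response)
```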
The error output is as follows:
2025-10-28 01:40:24,042 xinference.core.model 603 DEBUG [request 0462ab00-b35c-11f0-97b2-0242acfa0004] Enter chat, args: ModelActor(qwen3-0),[{'role': 'user', 'content': '你知道地球和太阳哪个大吗'}],{'temperature': 0.1, 'max_tokens': None}, kwargs: raw_params={'temperature': 0.1}
2025-10-28 01:40:24,045 xinference.model.llm.utils 603 DEBUG Prompt: <|im_start|>user
你知道地球和太阳哪个大吗<|im_end|>
<|im_start|>assistant
<think>
2025-10-28 01:40:26,556 xinference.model.llm.transformers.utils 603 DEBUG No max_tokens set, setting to: 262127
2025-10-28 01:40:26,557 xinference.model.llm.transformers.utils 603 ERROR Internal error for batch inference: 'NoneType' object has no attribute 'shape'.
Traceback (most recent call last):
File "/opt/inference/xinference/model/llm/transformers/utils.py", line 482, in batch_inference_one_step
_batch_inference_one_step_internal(
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/opt/inference/xinference/model/llm/transformers/utils.py", line 302, in _batch_inference_one_step_internal
batch_size, seq_len = get_batch_size_and_seq_len_from_kv_cache(
File "/opt/inference/xinference/model/llm/transformers/utils.py", line 201, in get_batch_size_and_seq_len_from_kv_cache
return kv[0][0].shape[bs_idx], kv[0][0].shape[seq_len_idx] + 1
AttributeError: 'NoneType' object has no attribute 'shape'
2025-10-28 01:40:26,560 xinference.core.model 603 ERROR [request 0462ab00-b35c-11f0-97b2-0242acfa0004] Leave chat, error: 'NoneType' object has no attribute 'shape', elapsed time: 2 s
Traceback (most recent call last):
File "/opt/inference/xinference/core/utils.py", line 93, in wrapped
ret = await func(*args, **kwargs)
File "/opt/inference/xinference/core/model.py", line 685, in chat
response = await self._call_wrapper_json(
File "/opt/inference/xinference/core/model.py", line 572, in _call_wrapper_json
return await self._call_wrapper("json", fn, *args, **kwargs)
File "/opt/inference/xinference/core/model.py", line 140, in _async_wrapper
return await fn(self, *args, **kwargs)
File "/opt/inference/xinference/core/model.py", line 582, in _call_wrapper
ret = await fn(*args, **kwargs)
File "/opt/inference/xinference/model/llm/transformers/core.py", line 1000, in chat
return await fut
ValueError: 'NoneType' object has no attribute 'shape'
2025-10-28 01:40:26,560 xinference.core.model 603 DEBUG After request chat, current serve request count: 0 for the model qwen3
2025-10-28 01:40:26,562 xinference.api.restful_api 1 ERROR [address=0.0.0.0:42879, pid=603] 'NoneType' object has no attribute 'shape'
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/xinference/api/restful_api.py", line 3066, in create_chat_completion
data = await model.chat(
File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 262, in send
return self._process_result_message(result)
File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 111, in _process_result_message
raise message.as_instanceof_cause()
File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 689, in send
result = await self._run_coro(message.message_id, coro)
File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 389, in _run_coro
return await coro
File "/usr/local/lib/python3.10/dist-packages/xoscar/api.py", line 418, in __on_receive__
return await super().__on_receive__(message) # type: ignore
File "xoscar/core.pyx", line 564, in __on_receive__
raise ex
File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
async with self._lock:
File "xoscar/core.pyx", line 527, in xoscar.core._BaseActor.__on_receive__
with debug_async_timeout('actor_lock_timeout',
File "xoscar/core.pyx", line 532, in xoscar.core._BaseActor.__on_receive__
result = await result
File "/opt/inference/xinference/core/model.py", line 105, in wrapped_func
ret = await fn(self, *args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/xoscar/api.py", line 496, in _wrapper
r = await func(self, *args, **kwargs)
File "/opt/inference/xinference/core/utils.py", line 93, in wrapped
ret = await func(*args, **kwargs)
File "/opt/inference/xinference/core/model.py", line 685, in chat
response = await self._call_wrapper_json(
File "/opt/inference/xinference/core/model.py", line 572, in _call_wrapper_json
return await self._call_wrapper("json", fn, *args, **kwargs)
File "/opt/inference/xinference/core/model.py", line 140, in _async_wrapper
return await fn(self, *args, **kwargs)
File "/opt/inference/xinference/core/model.py", line 582, in _call_wrapper
ret = await fn(*args, **kwargs)
File "/opt/inference/xinference/model/llm/transformers/core.py", line 1000, in chat
return await fut
ValueError: [address=0.0.0.0:42879, pid=603] 'NoneType' object has no attribute 'shape'
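For context, `get_batch_size_and_seq_len_from_kv_cache` reads `kv[0][0].shape` directly, and for this model the first layer's key tensor is `None`. Below is a minimal defensive sketch (a hypothetical helper, not Xinference's actual fix) that assumes the cache is a per-layer sequence of `(key, value)` pairs in the usual `(batch, heads, seq_len, head_dim)` layout and that hybrid-attention models such as qwen3-next may leave some layers' entries empty:

```python
from typing import Optional, Sequence, Tuple

import torch


def get_batch_size_and_seq_len(
    kv: Sequence[Tuple[Optional[torch.Tensor], Optional[torch.Tensor]]],
    bs_idx: int = 0,
    seq_len_idx: int = 2,
) -> Tuple[int, int]:
    # Skip layers whose key tensor is missing (e.g. non-standard attention layers)
    # instead of reading kv[0][0].shape unconditionally.
    for key, _value in kv:
        if key is not None:
            return key.shape[bs_idx], key.shape[seq_len_idx] + 1
    raise ValueError("no layer in the KV cache holds a standard attention tensor")
```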
Expected behavior / 期待表现
I expect the model to work normally, especially the fp8 variant.
@Jun-Howie please take a look.
@Tian14267 For now the transformers backend requires installing transformers from source:
pip install git+https://github.com/huggingface/transformers.git@main
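A quick way to confirm the source install picked up the new architecture (the `qwen3_next` model-type key is an assumption about how it is registered on transformers main):

```python
import transformers
from transformers.models.auto.configuration_auto import CONFIG_MAPPING_NAMES

print(transformers.__version__)              # should report a .dev0 build after the source install
print("qwen3_next" in CONFIG_MAPPING_NAMES)  # True if this build knows the qwen3-next architecture
```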
Hello, after updating transformers from source and relaunching the model, the chat responses are garbled. Also, is there a temporary workaround for the error when launching qwen3-next-fp8 with vLLM?
This issue is stale because it has been open for 7 days with no activity.
This issue was closed because it has been inactive for 5 days since being marked as stale.