Qwen2.5 Omni deployment issue
System Info
Launching qwen2.5-omni-7b directly with xinference errors out (same as https://github.com/xorbitsai/inference/issues/3295).
After replacing Qwen2_5OmniForConditionalGeneration with Qwen2_5OmniModel, the model can be launched normally, but any input then produces the following error:
Running Xinference with Docker?
- [ ] docker
- [ ] pip install
- [ ] installation from source
Version info
The command used to start Xinference
XINFERENCE_ENABLE_VIRTUAL_ENV=1 xinference-local --host 0.0.0.0 --port 9997
Reproduction
Reinstall transformers: pip install git+https://github.com/huggingface/transformers@f742a644ca32e65758c3adb36225aef1731bd2a8, then replace Qwen2_5OmniForConditionalGeneration with Qwen2_5OmniModel.
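For context, a minimal sketch of the substitution described above (assuming that pinned transformers commit exposes Qwen2_5OmniModel, as the early Qwen2.5-Omni previews did):

```python
# Hypothetical sketch of the class swap described above; the early
# Qwen2.5-Omni transformers previews exposed Qwen2_5OmniModel rather than
# Qwen2_5OmniForConditionalGeneration.
from transformers import Qwen2_5OmniModel, Qwen2_5OmniProcessor

model = Qwen2_5OmniModel.from_pretrained(
    "Qwen/Qwen2.5-Omni-7B", torch_dtype="auto", device_map="auto"
)
processor = Qwen2_5OmniProcessor.from_pretrained("Qwen/Qwen2.5-Omni-7B")
```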
Expected behavior
Runs normally.
Is there any error on the server side?
Yes, there are errors; I have re-edited the issue.
Run pip show uv and share the output.
Also check the server-side logs; see whether you have them in full.
2025-04-23 16:00:53,329 xinference.core.model 2446463 WARNING Currently for multimodal models, xinference only supports qwen-vl-chat, cogvlm2, glm-4v, MiniCPM-V-2.6 for batching. Your model qwen2.5-omni with model family None is disqualified.
WARNING:root:System prompt modified, audio output may not work as expected. Audio output mode only works when using default system prompt 'You are Qwen, a virtual human developed by the Qwen Team, Alibaba Group, capable of perceiving auditory and visual inputs, as well as generating text and speech.'
2025-04-23 16:00:53,334 transformers.processing_utils 2446463 WARNING Keyword argument audio is not a valid argument for this processor and will be ignored.
2025-04-23 16:00:53,350 xinference.core.model 2446463 ERROR [request 142da246-2019-11f0-b165-60cf848069ad] Leave chat, error: The following model_kwargs are not used by the model: ['speaker'] (note: typos in the generate arguments will also show up in this list), elapsed time: 0 s
Traceback (most recent call last):
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/xinference/core/utils.py", line 93, in wrapped
ret = await func(*args, **kwargs)
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/xinference/core/model.py", line 845, in chat
response = await self._call_wrapper_json(
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/xinference/core/model.py", line 662, in _call_wrapper_json
return await self._call_wrapper("json", fn, *args, **kwargs)
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/xinference/core/model.py", line 141, in _async_wrapper
return await fn(self, *args, **kwargs)
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/xinference/core/model.py", line 674, in _call_wrapper
ret = await asyncio.to_thread(fn, *args, **kwargs)
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/asyncio/threads.py", line 25, in to_thread
return await loop.run_in_executor(None, func_call)
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/xinference/model/llm/transformers/utils.py", line 541, in _wrapper
result = fn(self, *args, **kwargs)
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/xinference/model/llm/transformers/qwen-omni.py", line 119, in chat
c = self._generate(messages, generate_config)
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/xinference/model/llm/transformers/qwen-omni.py", line 170, in _generate
generated_ids, audio = self._model.generate(
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/transformers/models/qwen2_5_omni/modeling_qwen2_5_omni.py", line 4805, in generate
thinker_result = self.thinker.generate(
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/transformers/generation/utils.py", line 2075, in generate
self._validate_model_kwargs(model_kwargs.copy())
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/transformers/generation/utils.py", line 1436, in _validate_model_kwargs
raise ValueError(
ValueError: The following model_kwargs are not used by the model: ['speaker'] (note: typos in the generate arguments will also show up in this list)
2025-04-23 16:00:53,352 xinference.api.restful_api 2429748 ERROR [address=0.0.0.0:44087, pid=2446463] The following model_kwargs are not used by the model: ['speaker'] (note: typos in the generate arguments will also show up in this list)
Traceback (most recent call last):
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/xinference/api/restful_api.py", line 2128, in create_chat_completion
data = await model.chat(
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/xoscar/backends/context.py", line 262, in send
return self._process_result_message(result)
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/xoscar/backends/context.py", line 111, in _process_result_message
raise message.as_instanceof_cause()
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/xoscar/backends/pool.py", line 689, in send
result = await self._run_coro(message.message_id, coro)
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/xoscar/backends/pool.py", line 389, in _run_coro
return await coro
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/xoscar/api.py", line 384, in on_receive
return await super().on_receive(message) # type: ignore
File "xoscar/core.pyx", line 564, in on_receive
raise ex
File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.on_receive
async with self._lock:
File "xoscar/core.pyx", line 527, in xoscar.core._BaseActor.on_receive
with debug_async_timeout('actor_lock_timeout',
File "xoscar/core.pyx", line 532, in xoscar.core._BaseActor.on_receive
result = await result
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/xinference/core/model.py", line 106, in wrapped_func
ret = await fn(self, *args, **kwargs)
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/xoscar/api.py", line 462, in _wrapper
r = await func(self, *args, **kwargs)
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/xinference/core/utils.py", line 93, in wrapped
ret = await func(*args, **kwargs)
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/xinference/core/model.py", line 845, in chat
response = await self._call_wrapper_json(
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/xinference/core/model.py", line 662, in _call_wrapper_json
return await self._call_wrapper("json", fn, *args, **kwargs)
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/xinference/core/model.py", line 141, in _async_wrapper
return await fn(self, *args, **kwargs)
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/xinference/core/model.py", line 674, in _call_wrapper
ret = await asyncio.to_thread(fn, *args, **kwargs)
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/asyncio/threads.py", line 25, in to_thread
return await loop.run_in_executor(None, func_call)
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/xinference/model/llm/transformers/utils.py", line 541, in _wrapper
result = fn(self, *args, **kwargs)
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/xinference/model/llm/transformers/qwen-omni.py", line 119, in chat
c = self._generate(messages, generate_config)
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/xinference/model/llm/transformers/qwen-omni.py", line 170, in _generate
generated_ids, audio = self._model.generate(
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/transformers/models/qwen2_5_omni/modeling_qwen2_5_omni.py", line 4805, in generate
thinker_result = self.thinker.generate(
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/transformers/generation/utils.py", line 2075, in generate
self._validate_model_kwargs(model_kwargs.copy())
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/transformers/generation/utils.py", line 1436, in _validate_model_kwargs
raise ValueError(
ValueError: [address=0.0.0.0:44087, pid=2446463] The following model_kwargs are not used by the model: ['speaker'] (note: typos in the generate arguments will also show up in this list)
Traceback (most recent call last):
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/gradio/queueing.py", line 631, in process_events
response = await route_utils.call_process_api(
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/gradio/route_utils.py", line 322, in call_process_api
output = await app.get_blocks().process_api(
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/gradio/blocks.py", line 2145, in process_api
result = await self.call_function(
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/gradio/blocks.py", line 1683, in call_function
prediction = await utils.async_iteration(iterator)
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/gradio/utils.py", line 728, in async_iteration
return await anext(iterator)
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/gradio/utils.py", line 722, in anext
return await anyio.to_thread.run_sync(
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2471, in run_sync_in_worker_thread
return await future
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 968, in run
result = context.run(func, *args)
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/gradio/utils.py", line 705, in run_sync_iterator_async
return next(iterator)
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/gradio/utils.py", line 873, in gen_wrapper
response = next(iterator)
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/xinference/core/chat_interface.py", line 372, in predict
response = model.chat(
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/xinference/client/restful/restful_client.py", line 580, in chat
raise RuntimeError(
RuntimeError: Failed to generate chat completion, detail: [address=0.0.0.0:44087, pid=2446463] The following model_kwargs are not used by the model: ['speaker'] (note: typos in the generate arguments will also show up in this list)
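For what it's worth, both tracebacks point at the same mismatch: xinference forwards a speaker kwarg that this transformers revision's thinker.generate() rejects in _validate_model_kwargs. A hypothetical workaround sketch (not the project's actual fix) that drops the unsupported kwarg before generating:

```python
# Hypothetical workaround sketch: this transformers revision rejects the
# 'speaker' kwarg in _validate_model_kwargs, so strip it before forwarding.
def generate_without_speaker(model, model_inputs, generate_kwargs):
    kwargs = dict(generate_kwargs)
    kwargs.pop("speaker", None)  # unsupported by thinker.generate() here
    return model.generate(**model_inputs, **kwargs)
```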
Found a silly mistake in this release... the environment variable name was typo'd in the code. Try
XINFERENCE_EANBLE_VIRTUAL_ENV=1
It will be fixed in the next release.
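(Until the fix ships, setting both spellings on launch, e.g. XINFERENCE_EANBLE_VIRTUAL_ENV=1 XINFERENCE_ENABLE_VIRTUAL_ENV=1 xinference-local --host 0.0.0.0 --port 9997, should cover either code path.)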
It seems to be a uv problem: it keeps failing to install various dependencies, and I have to install them manually one by one. Is there a requirements.txt or something similar so I can install everything in one go? Right now I only discover which dependency is missing when it errors out.
After a long round of pip install, it still fails to run. It seems microphone input is required? The error is as follows:
2025-04-23 20:13:09,158 xinference.core.model 3048147 WARNING Currently for multimodal models, xinference only supports qwen-vl-chat, cogvlm2, glm-4v, MiniCPM-V-2.6 for batching. Your model qwen2.5-omni with model family None is disqualified.
2025-04-23 20:13:26,560 xinference.core.model 3048147 ERROR [request 51d5d410-203c-11f0-8730-60cf848069ad] Leave chat, error: not enough values to unpack (expected 2, got 1), elapsed time: 17 s
Traceback (most recent call last):
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/xinference/core/utils.py", line 93, in wrapped
ret = await func(*args, **kwargs)
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/xinference/core/model.py", line 845, in chat
response = await self._call_wrapper_json(
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/xinference/core/model.py", line 662, in _call_wrapper_json
return await self._call_wrapper("json", fn, *args, **kwargs)
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/xinference/core/model.py", line 141, in _async_wrapper
return await fn(self, *args, **kwargs)
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/xinference/core/model.py", line 674, in _call_wrapper
ret = await asyncio.to_thread(fn, *args, **kwargs)
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/asyncio/threads.py", line 25, in to_thread
return await loop.run_in_executor(None, func_call)
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/xinference/model/llm/transformers/utils.py", line 541, in _wrapper
result = fn(self, *args, **kwargs)
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/xinference/model/llm/transformers/qwen-omni.py", line 119, in chat
c = self._generate(messages, generate_config)
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/xinference/model/llm/transformers/qwen-omni.py", line 170, in _generate
generated_ids, audio = self._model.generate(
ValueError: not enough values to unpack (expected 2, got 1)
2025-04-23 20:13:26,562 xinference.api.restful_api 3024602 ERROR [address=0.0.0.0:39451, pid=3048147] not enough values to unpack (expected 2, got 1)
Traceback (most recent call last):
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/xinference/api/restful_api.py", line 2128, in create_chat_completion
data = await model.chat(
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/xoscar/backends/context.py", line 262, in send
return self._process_result_message(result)
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/xoscar/backends/context.py", line 111, in _process_result_message
raise message.as_instanceof_cause()
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/xoscar/backends/pool.py", line 689, in send
result = await self._run_coro(message.message_id, coro)
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/xoscar/backends/pool.py", line 389, in _run_coro
return await coro
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/xoscar/api.py", line 384, in on_receive
return await super().on_receive(message) # type: ignore
File "xoscar/core.pyx", line 564, in on_receive
raise ex
File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.on_receive
async with self._lock:
File "xoscar/core.pyx", line 527, in xoscar.core._BaseActor.on_receive
with debug_async_timeout('actor_lock_timeout',
File "xoscar/core.pyx", line 532, in xoscar.core._BaseActor.on_receive
result = await result
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/xinference/core/model.py", line 106, in wrapped_func
ret = await fn(self, *args, **kwargs)
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/xoscar/api.py", line 462, in _wrapper
r = await func(self, *args, **kwargs)
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/xinference/core/utils.py", line 93, in wrapped
ret = await func(*args, **kwargs)
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/xinference/core/model.py", line 845, in chat
response = await self._call_wrapper_json(
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/xinference/core/model.py", line 662, in _call_wrapper_json
return await self._call_wrapper("json", fn, *args, **kwargs)
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/xinference/core/model.py", line 141, in _async_wrapper
return await fn(self, *args, **kwargs)
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/xinference/core/model.py", line 674, in _call_wrapper
ret = await asyncio.to_thread(fn, *args, **kwargs)
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/asyncio/threads.py", line 25, in to_thread
return await loop.run_in_executor(None, func_call)
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/xinference/model/llm/transformers/utils.py", line 541, in _wrapper
result = fn(self, *args, **kwargs)
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/xinference/model/llm/transformers/qwen-omni.py", line 119, in chat
c = self._generate(messages, generate_config)
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/xinference/model/llm/transformers/qwen-omni.py", line 170, in _generate
generated_ids, audio = self._model.generate(
ValueError: [address=0.0.0.0:39451, pid=3048147] not enough values to unpack (expected 2, got 1)
Traceback (most recent call last):
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/gradio/queueing.py", line 625, in process_events
response = await route_utils.call_process_api(
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/gradio/route_utils.py", line 322, in call_process_api
output = await app.get_blocks().process_api(
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/gradio/blocks.py", line 2137, in process_api
result = await self.call_function(
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/gradio/blocks.py", line 1675, in call_function
prediction = await utils.async_iteration(iterator)
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/gradio/utils.py", line 735, in async_iteration
return await anext(iterator)
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/gradio/utils.py", line 729, in anext
return await anyio.to_thread.run_sync(
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2471, in run_sync_in_worker_thread
return await future
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 968, in run
result = context.run(func, *args)
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/gradio/utils.py", line 712, in run_sync_iterator_async
return next(iterator)
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/gradio/utils.py", line 873, in gen_wrapper
response = next(iterator)
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/xinference/core/chat_interface.py", line 372, in predict
response = model.chat(
File "/home/zykj01/miniconda3/envs/zsx_env/lib/python3.10/site-packages/xinference/client/restful/restful_client.py", line 580, in chat
raise RuntimeError(
RuntimeError: Failed to generate chat completion, detail: [address=0.0.0.0:39451, pid=3048147] not enough values to unpack (expected 2, got 1)
You can configure pip (pip config) to use a mirror, such as the Tsinghua or Aliyun mirror.
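For example, assuming the Tsinghua mirror: pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple. Since the virtualenv packages here are installed via uv, its index setting may need to point at the mirror as well.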
I'm using the 1.5.0.post2 image, and it still fails to run.
Scroll further up from the error; share the log between the virtualenv setup and the error.
I have the environment set up and the model loads successfully, but at inference time it keeps telling me meta tensors cannot be used. Is that because I don't have a microphone installed? If I only want plain text inference, what should I do? When I try the built-in disable talker, other bugs show up.
The code in question is in
.xinference/virtualenv/qwen2.5-omni/lib/python3.10/site-packages/transformers/models/qwen2_5_omni/modeling_qwen2_5_omni.py, around line 4630, at the talker_result = self.talker.generate() call.
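For reference, the Qwen2.5-Omni model card documents a text-only path that skips the talker entirely; a hedged sketch along those lines (standalone transformers usage per the card, not the xinference code path):

```python
# Hedged sketch of text-only inference, following the Qwen2.5-Omni model card:
# disable the talker so the speech head is never touched, and ask generate()
# not to return audio.
from transformers import Qwen2_5OmniForConditionalGeneration, Qwen2_5OmniProcessor

model = Qwen2_5OmniForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-Omni-7B", torch_dtype="auto", device_map="auto"
)
model.disable_talker()  # per the model card, skips/frees the speech head

processor = Qwen2_5OmniProcessor.from_pretrained("Qwen/Qwen2.5-Omni-7B")
messages = [{"role": "user", "content": [{"type": "text", "text": "Hello"}]}]
text = processor.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
inputs = processor(text=text, return_tensors="pt").to(model.device)

text_ids = model.generate(**inputs, return_audio=False)
print(processor.batch_decode(text_ids, skip_special_tokens=True)[0])
```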
Why a meta tensor? Have you modified the xinf code?
XINFERENCE_ENABLE_VIRTUAL_ENV=1 xinference-local --host 0.0.0.0 --port 9998 --log-level debug
INFO 04-23 23:57:14 [__init__.py:239] Automatically detected platform cuda.
2025-04-23 23:57:16,562 xinference.core.supervisor 1337 INFO Xinference supervisor 0.0.0.0:62968 started
2025-04-23 23:57:16,687 xinference.core.worker 1337 INFO Starting metrics export server at 0.0.0.0:None
2025-04-23 23:57:16,691 xinference.core.worker 1337 INFO Checking metrics export server...
2025-04-23 23:57:19,045 xinference.core.worker 1337 INFO Metrics server is started at: http://0.0.0.0:35491
2025-04-23 23:57:19,046 xinference.core.worker 1337 INFO Purge cache directory: /xinference_home/cache
2025-04-23 23:57:19,048 xinference.core.supervisor 1337 DEBUG [request 5d1a4f8e-20d9-11f0-81bd-00001049fe80] Enter add_worker, args: <xinference.core.supervisor.SupervisorActor object at 0x7fc10efa16c0>,0.0.0.0:62968, kwargs:
2025-04-23 23:57:19,048 xinference.core.supervisor 1337 DEBUG Worker 0.0.0.0:62968 has been added successfully
2025-04-23 23:57:19,048 xinference.core.supervisor 1337 DEBUG [request 5d1a4f8e-20d9-11f0-81bd-00001049fe80] Leave add_worker, elapsed time: 0 s
2025-04-23 23:57:19,048 xinference.core.worker 1337 INFO Connected to supervisor as a fresh worker
2025-04-23 23:57:19,069 xinference.core.worker 1337 INFO Xinference worker 0.0.0.0:62968 started
2025-04-23 23:57:19,086 xinference.core.supervisor 1337 DEBUG Worker 0.0.0.0:62968 resources: {'cpu': ResourceStatus(usage=0.0, total=192, memory_used=71887577088, memory_available=2055306240000, memory_total=2164123627520), 'gpu-0': GPUStatus(name='NVIDIA H100 80GB HBM3', mem_total=85520809984, mem_free=49832394752, mem_used=35688415232, mem_usage=0.417306796306968, gpu_util=0), 'gpu-1': GPUStatus(name='NVIDIA H100 80GB HBM3', mem_total=85520809984, mem_free=8265859072, mem_used=77254950912, mem_usage=0.9033468102845793, gpu_util=0), 'gpu-2': GPUStatus(name='NVIDIA H100 80GB HBM3', mem_total=85520809984, mem_free=13649575936, mem_used=71871234048, mem_usage=0.8403946836032811, gpu_util=0), 'gpu-3': GPUStatus(name='NVIDIA H100 80GB HBM3', mem_total=85520809984, mem_free=9950527488, mem_used=75570282496, mem_usage=0.8836478806753393, gpu_util=0), 'gpu-4': GPUStatus(name='NVIDIA H100 80GB HBM3', mem_total=85520809984, mem_free=14563934208, mem_used=70956875776, mem_usage=0.8297030370651921, gpu_util=0), 'gpu-5': GPUStatus(name='NVIDIA H100 80GB HBM3', mem_total=85520809984, mem_free=13584564224, mem_used=71936245760, mem_usage=0.8411548694809893, gpu_util=0), 'gpu-6': GPUStatus(name='NVIDIA H100 80GB HBM3', mem_total=85520809984, mem_free=85170192384, mem_used=350617600, mem_usage=0.004099792788043012, gpu_util=0), 'gpu-7': GPUStatus(name='NVIDIA H100 80GB HBM3', mem_total=85520809984, mem_free=85170192384, mem_used=350617600, mem_usage=0.004099792788043012, gpu_util=0)}
2025-04-23 23:57:21,581 xinference.core.supervisor 1337 DEBUG Enter get_status, args: <xinference.core.supervisor.SupervisorActor object at 0x7fc10efa16c0>, kwargs:
2025-04-23 23:57:21,582 xinference.core.supervisor 1337 DEBUG Leave get_status, elapsed time: 0 s
2025-04-23 23:57:23,261 xinference.api.restful_api 1202 INFO Starting Xinference at endpoint: http://0.0.0.0:9998
2025-04-23 23:57:23,368 uvicorn.error 1202 INFO Uvicorn running on http://0.0.0.0:9998 (Press CTRL+C to quit)
2025-04-23 23:58:27,166 xinference.core.supervisor 1337 DEBUG Enter launch_builtin_model, model_uid: qwen2.5-omni, model_name: qwen2.5-omni, model_size: 7, model_format: pytorch, quantization: none, replica: 1, enable_xavier: False, kwargs: {}
2025-04-23 23:58:27,168 xinference.core.worker 1337 DEBUG Enter get_model_count, args: <xinference.core.worker.WorkerActor object at 0x7fc10ebd08b0>, kwargs:
2025-04-23 23:58:27,168 xinference.core.worker 1337 DEBUG Leave get_model_count, elapsed time: 0 s
2025-04-23 23:58:27,170 xinference.core.worker 1337 INFO [request 85b4d72a-20d9-11f0-81bd-00001049fe80] Enter launch_builtin_model, args: <xinference.core.worker.WorkerActor object at 0x7fc10ebd08b0>, kwargs: model_uid=qwen2.5-omni-0,model_name=qwen2.5-omni,model_size_in_billions=7,model_format=pytorch,quantization=none,model_engine=Transformers,model_type=LLM,n_gpu=auto,request_limits=None,peft_model_config=None,gpu_idx=[7],download_hub=None,model_path=None,xavier_config=None
2025-04-23 23:58:27,170 xinference.core.worker 1337 INFO You specify to launch the model: qwen2.5-omni on GPU index: [7] of the worker: 0.0.0.0:62968, xinference will automatically ignore the n_gpu option.
2025-04-23 23:58:28,099 xinference.model.llm.core 1337 DEBUG Launching qwen2.5-omni-0 with Qwen2_5OmniChatModel
2025-04-23 23:58:28,100 xinference.core.progress_tracker 1337 DEBUG Setting progress, request id: launching-qwen2.5-omni-0, progress: 0.0
2025-04-23 23:58:28,101 xinference.model.llm.llm_family 1337 INFO Caching from Hugging Face: Qwen/Qwen2.5-Omni-7B
2025-04-23 23:58:28,102 xinference.model.llm.llm_family 1337 INFO Cache /xinference_home/cache/qwen2_5-omni-pytorch-7b exists
2025-04-23 23:58:28,104 xinference.core.progress_tracker 1337 DEBUG Setting progress, request id: launching-qwen2.5-omni-0, progress: 0.8
2025-04-23 23:58:28,107 xinference.core.progress_tracker 1337 DEBUG Setting progress, request id: launching-qwen2.5-omni-0, progress: 0.8
Using CPython 3.10.14 interpreter at: /usr/bin/python3
Creating virtual environment at: virtualenv/qwen2.5-omni
2025-04-23 23:58:28,412 xinference.core.worker 1337 INFO Installing packages ['git+https://github.com/huggingface/[email protected]', 'numpy==1.26.4', 'qwen_omni_utils', 'soundfile'] in virtual env /xinference_home/virtualenv/qwen2.5-omni, with settings(index_url=None)
Using Python 3.10.14 environment at: virtualenv/qwen2.5-omni
Updated https://github.com/huggingface/transformers (cb39f7dd5ba874ee1859b47283b08cd3a6ab5a0d)
Resolved 37 packages in 39.08s
Built transformers @ git+https://github.com/huggingface/transformers@cb39f7dd5ba874ee1859b47283b08cd3a6ab5a0d
Prepared 37 packages in 14.10s
Installed 37 packages in 112ms
- audioread==3.0.1
- av==14.3.0
- certifi==2025.1.31
- cffi==1.17.1
- charset-normalizer==3.4.1
- decorator==5.2.1
- filelock==3.18.0
- fsspec==2025.3.2
- huggingface-hub==0.30.2
- idna==3.10
- joblib==1.4.2
- lazy-loader==0.4
- librosa==0.11.0
- llvmlite==0.44.0
- msgpack==1.1.0
- numba==0.61.2
- numpy==1.26.4
- packaging==25.0
- pillow==11.2.1
- platformdirs==4.3.7
- pooch==1.8.2
- pycparser==2.22
- pyyaml==6.0.2
- qwen-omni-utils==0.0.4
- regex==2024.11.6
- requests==2.32.3
- safetensors==0.5.3
- scikit-learn==1.6.1
- scipy==1.15.2
- soundfile==0.13.1
- soxr==0.5.0.post1
- threadpoolctl==3.6.0
- tokenizers==0.21.1
- tqdm==4.67.1
- transformers==4.52.0.dev0 (from git+https://github.com/huggingface/transformers@cb39f7dd5ba874ee1859b47283b08cd3a6ab5a0d)
- typing-extensions==4.13.2
- urllib3==2.4.0
2025-04-23 23:59:23,998 transformers.utils.import_utils 1354 DEBUG Detected accelerate version: 0.34.0
2025-04-23 23:59:24,000 transformers.utils.import_utils 1354 DEBUG Detected bitsandbytes version: 0.45.5
2025-04-23 23:59:24,001 transformers.utils.import_utils 1354 DEBUG Detected coloredlogs version: 15.0.1
2025-04-23 23:59:24,002 transformers.utils.import_utils 1354 DEBUG Detected datasets version: 2.21.0
2025-04-23 23:59:24,003 transformers.utils.import_utils 1354 DEBUG Detected g2p_en version: 2.1.0
2025-04-23 23:59:24,004 transformers.utils.import_utils 1354 DEBUG Detected jieba version: 0.42.1
2025-04-23 23:59:24,004 transformers.utils.import_utils 1354 DEBUG Detected jinja2 version: 3.1.6
2025-04-23 23:59:24,005 transformers.utils.import_utils 1354 DEBUG Detected librosa version: 0.11.0
2025-04-23 23:59:24,006 transformers.utils.import_utils 1354 DEBUG Detected nltk version: 3.9.1
2025-04-23 23:59:24,007 transformers.utils.import_utils 1354 DEBUG Detected openai version: 1.75.0
2025-04-23 23:59:24,008 transformers.utils.import_utils 1354 DEBUG Detected optimum version: 1.24.0
2025-04-23 23:59:24,009 transformers.utils.import_utils 1354 DEBUG Detected pandas version: 2.2.2
2025-04-23 23:59:24,010 transformers.utils.import_utils 1354 DEBUG Detected peft version: 0.15.2
2025-04-23 23:59:24,010 transformers.utils.import_utils 1354 DEBUG Detected phonemizer version: N/A
2025-04-23 23:59:24,011 transformers.utils.import_utils 1354 DEBUG Detected psutil version: 7.0.0
2025-04-23 23:59:24,012 transformers.utils.import_utils 1354 DEBUG Detected pygments version: 2.19.1
2025-04-23 23:59:24,013 transformers.utils.import_utils 1354 DEBUG Detected sacremoses version: 0.1.1
2025-04-23 23:59:24,013 transformers.utils.import_utils 1354 DEBUG Detected safetensors version: 0.4.4
2025-04-23 23:59:24,016 transformers.utils.import_utils 1354 DEBUG Detected scipy version: 1.15.2
2025-04-23 23:59:24,017 transformers.utils.import_utils 1354 DEBUG Detected sentencepiece version: 0.2.0
2025-04-23 23:59:24,018 transformers.utils.import_utils 1354 DEBUG Detected gguf version: 0.16.2
2025-04-23 23:59:24,019 transformers.utils.import_utils 1354 DEBUG Detected soundfile version: 0.13.1
2025-04-23 23:59:24,020 transformers.utils.import_utils 1354 DEBUG Detected spacy version: 3.8.5
2025-04-23 23:59:24,022 transformers.utils.import_utils 1354 DEBUG Detected timm version: 1.0.15
2025-04-23 23:59:24,022 transformers.utils.import_utils 1354 DEBUG Detected tokenizers version: 0.21.1
2025-04-23 23:59:24,023 transformers.utils.import_utils 1354 DEBUG Detected torchaudio version: 2.6.0
2025-04-23 23:59:24,023 transformers.utils.import_utils 1354 DEBUG Detected torchvision version: 0.21.0
2025-04-23 23:59:24,024 transformers.utils.import_utils 1354 DEBUG Detected num2words version: 0.5.14
2025-04-23 23:59:24,025 transformers.utils.import_utils 1354 DEBUG Detected tiktoken version: 0.7.0
2025-04-23 23:59:24,025 transformers.utils.import_utils 1354 DEBUG Detected triton version: 3.2.0
2025-04-23 23:59:24,026 transformers.utils.import_utils 1354 DEBUG Detected rich version: 13.9.4
2025-04-23 23:59:24,027 transformers.utils.import_utils 1354 DEBUG Detected torch version: 2.6.0
2025-04-23 23:59:24,030 transformers.utils.import_utils 1354 DEBUG Detected PIL version 10.4.0
2025-04-23 23:59:24,053 transformers.utils.import_utils 1354 DEBUG Detected torch version: 2.6.0
2025-04-23 23:59:24,055 transformers.utils.import_utils 1354 DEBUG Detected torch version: 2.6.0
2025-04-23 23:59:24,056 transformers.utils.import_utils 1354 DEBUG Detected torch version: 2.6.0
2025-04-23 23:59:24,058 transformers.utils.import_utils 1354 DEBUG Detected torch version: 2.6.0
2025-04-23 23:59:24,059 transformers.utils.import_utils 1354 DEBUG Detected torch version: 2.6.0
2025-04-23 23:59:24,061 transformers.utils.import_utils 1354 DEBUG Detected torch version: 2.6.0
2025-04-23 23:59:24,062 transformers.utils.import_utils 1354 DEBUG Detected torch version: 2.6.0
2025-04-23 23:59:24,064 transformers.utils.import_utils 1354 DEBUG Detected torch version: 2.6.0
2025-04-23 23:59:24,065 transformers.utils.import_utils 1354 DEBUG Detected torch version: 2.6.0
2025-04-23 23:59:24,970 transformers.utils.import_utils 1354 DEBUG Detected torch version: 2.6.0
INFO 04-23 23:59:26 [__init__.py:239] Automatically detected platform cuda.
2025-04-23 23:59:27,931 xinference.core.model 1354 DEBUG Starting ModelActor at 0.0.0.0:45471, uid: b'qwen2.5-omni-0'
2025-04-23 23:59:27,931 xinference.core.model 1354 WARNING Currently for multimodal models, xinference only supports qwen-vl-chat, cogvlm2, glm-4v, MiniCPM-V-2.6 for batching. Your model qwen2.5-omni with model family None is disqualified.
2025-04-23 23:59:27,931 xinference.core.model 1354 INFO Start requests handler.
2025-04-23 23:59:27,940 xinference.core.worker 1337 ERROR Failed to load model qwen2.5-omni-0
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/xinference/core/worker.py", line 1135, in launch_builtin_model
await model_ref.load()
File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 262, in send
return self._process_result_message(result)
File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 111, in _process_result_message
raise message.as_instanceof_cause()
File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 689, in send
result = await self._run_coro(message.message_id, coro)
File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 389, in _run_coro
return await coro
File "/usr/local/lib/python3.10/dist-packages/xoscar/api.py", line 384, in on_receive
return await super().on_receive(message) # type: ignore
File "xoscar/core.pyx", line 564, in on_receive
raise ex
File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.on_receive
async with self._lock:
File "xoscar/core.pyx", line 527, in xoscar.core._BaseActor.on_receive
with debug_async_timeout('actor_lock_timeout',
File "xoscar/core.pyx", line 532, in xoscar.core._BaseActor.on_receive
result = await result
File "/usr/local/lib/python3.10/dist-packages/xinference/core/model.py", line 471, in load
await asyncio.to_thread(self._model.load)
File "/usr/lib/python3.10/asyncio/threads.py", line 25, in to_thread
return await loop.run_in_executor(None, func_call)
File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/usr/local/lib/python3.10/dist-packages/xinference/model/llm/transformers/qwen-omni.py", line 70, in load
from transformers import (
ImportError: [address=0.0.0.0:45471, pid=1354] cannot import name 'Qwen2_5OmniForConditionalGeneration' from 'transformers' (/usr/local/lib/python3.10/dist-packages/transformers/__init__.py)
2025-04-23 23:59:27,945 xinference.core.progress_tracker 1337 DEBUG Setting progress, request id: launching-qwen2.5-omni-0, progress: 1.0
2025-04-23 23:59:27,990 xinference.core.worker 1337 ERROR [request 85b4d72a-20d9-11f0-81bd-00001049fe80] Leave launch_builtin_model, error: [address=0.0.0.0:45471, pid=1354] cannot import name 'Qwen2_5OmniForConditionalGeneration' from 'transformers' (/usr/local/lib/python3.10/dist-packages/transformers/__init__.py), elapsed time: 60 s
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/xinference/core/utils.py", line 93, in wrapped
ret = await func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/xinference/core/worker.py", line 1135, in launch_builtin_model
await model_ref.load()
File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 262, in send
return self._process_result_message(result)
File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 111, in _process_result_message
raise message.as_instanceof_cause()
File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 689, in send
result = await self._run_coro(message.message_id, coro)
File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 389, in _run_coro
return await coro
File "/usr/local/lib/python3.10/dist-packages/xoscar/api.py", line 384, in on_receive
return await super().on_receive(message) # type: ignore
File "xoscar/core.pyx", line 564, in on_receive
raise ex
File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.on_receive
async with self._lock:
File "xoscar/core.pyx", line 527, in xoscar.core._BaseActor.on_receive
with debug_async_timeout('actor_lock_timeout',
File "xoscar/core.pyx", line 532, in xoscar.core._BaseActor.on_receive
result = await result
File "/usr/local/lib/python3.10/dist-packages/xinference/core/model.py", line 471, in load
await asyncio.to_thread(self._model.load)
File "/usr/lib/python3.10/asyncio/threads.py", line 25, in to_thread
return await loop.run_in_executor(None, func_call)
File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/usr/local/lib/python3.10/dist-packages/xinference/model/llm/transformers/qwen-omni.py", line 70, in load
from transformers import (
ImportError: [address=0.0.0.0:45471, pid=1354] cannot import name 'Qwen2_5OmniForConditionalGeneration' from 'transformers' (/usr/local/lib/python3.10/dist-packages/transformers/__init__.py)
2025-04-23 23:59:27,990 xinference.core.supervisor 1337 DEBUG [request a9f55e98-20d9-11f0-81bd-00001049fe80] Enter terminate_model, args: <xinference.core.supervisor.SupervisorActor object at 0x7fc10efa16c0>,qwen2.5-omni, kwargs: suppress_exception=True
2025-04-23 23:59:27,990 xinference.core.supervisor 1337 DEBUG [request a9f55e98-20d9-11f0-81bd-00001049fe80] Leave terminate_model, elapsed time: 0 s
2025-04-23 23:59:27,994 xinference.api.restful_api 1202 ERROR [address=0.0.0.0:45471, pid=1354] cannot import name 'Qwen2_5OmniForConditionalGeneration' from 'transformers' (/usr/local/lib/python3.10/dist-packages/transformers/__init__.py)
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/xinference/api/restful_api.py", line 1022, in launch_model
model_uid = await (await self._get_supervisor_ref()).launch_builtin_model(
File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 262, in send
return self._process_result_message(result)
File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 111, in _process_result_message
raise message.as_instanceof_cause()
File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 689, in send
result = await self._run_coro(message.message_id, coro)
File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 389, in _run_coro
return await coro
File "/usr/local/lib/python3.10/dist-packages/xoscar/api.py", line 384, in on_receive
return await super().on_receive(message) # type: ignore
File "xoscar/core.pyx", line 564, in on_receive
raise ex
File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.on_receive
async with self._lock:
File "xoscar/core.pyx", line 527, in xoscar.core._BaseActor.on_receive
with debug_async_timeout('actor_lock_timeout',
File "xoscar/core.pyx", line 532, in xoscar.core._BaseActor.on_receive
result = await result
File "/usr/local/lib/python3.10/dist-packages/xinference/core/supervisor.py", line 1199, in launch_builtin_model
await _launch_model()
File "/usr/local/lib/python3.10/dist-packages/xinference/core/supervisor.py", line 1134, in _launch_model
subpool_address = await _launch_one_model(
File "/usr/local/lib/python3.10/dist-packages/xinference/core/supervisor.py", line 1088, in _launch_one_model
subpool_address = await worker_ref.launch_builtin_model(
File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 262, in send
return self._process_result_message(result)
File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 111, in _process_result_message
raise message.as_instanceof_cause()
File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 689, in send
result = await self._run_coro(message.message_id, coro)
File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 389, in _run_coro
return await coro
File "/usr/local/lib/python3.10/dist-packages/xoscar/api.py", line 384, in on_receive
return await super().on_receive(message) # type: ignore
File "xoscar/core.pyx", line 564, in on_receive
raise ex
File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.on_receive
async with self._lock:
File "xoscar/core.pyx", line 527, in xoscar.core._BaseActor.on_receive
with debug_async_timeout('actor_lock_timeout',
File "xoscar/core.pyx", line 532, in xoscar.core._BaseActor.on_receive
result = await result
File "/usr/local/lib/python3.10/dist-packages/xinference/core/utils.py", line 93, in wrapped
ret = await func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/xinference/core/worker.py", line 1135, in launch_builtin_model
await model_ref.load()
File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 262, in send
return self._process_result_message(result)
File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 111, in _process_result_message
raise message.as_instanceof_cause()
File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 689, in send
result = await self._run_coro(message.message_id, coro)
File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 389, in _run_coro
return await coro
File "/usr/local/lib/python3.10/dist-packages/xoscar/api.py", line 384, in on_receive
return await super().on_receive(message) # type: ignore
File "xoscar/core.pyx", line 564, in on_receive
raise ex
File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.on_receive
async with self._lock:
File "xoscar/core.pyx", line 527, in xoscar.core._BaseActor.on_receive
with debug_async_timeout('actor_lock_timeout',
File "xoscar/core.pyx", line 532, in xoscar.core._BaseActor.on_receive
result = await result
File "/usr/local/lib/python3.10/dist-packages/xinference/core/model.py", line 471, in load
await asyncio.to_thread(self._model.load)
File "/usr/lib/python3.10/asyncio/threads.py", line 25, in to_thread
return await loop.run_in_executor(None, func_call)
File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/usr/local/lib/python3.10/dist-packages/xinference/model/llm/transformers/qwen-omni.py", line 70, in load
from transformers import (
ImportError: [address=0.0.0.0:45471, pid=1354] cannot import name 'Qwen2_5OmniForConditionalGeneration' from 'transformers' (/usr/local/lib/python3.10/dist-packages/transformers/__init__.py)
2025-04-24 00:05:16,571 xinference.core.progress_tracker 1337 DEBUG Remove requests ['launching-qwen2.5-omni-0'] due to it's finished for over 300.0 seconds
That's the complete log.
Check the Xoscar version:
pip show xoscar
@liuxingbo12138 Are you running from source? I see you used XINFERENCE_ENABLE_VIRTUAL_ENV=1 and the virtualenv environment was installed.
I docker pulled the 1.5.0.post2 image and started it directly with XINFERENCE_ENABLE_VIRTUAL_ENV=1 xinference-local --host 0.0.0.0 --port 9998 --log-level debug. The image below shows the environment variables and the startup command.
I modified code in the virtual environment earlier while debugging (/home/code/.xinference/virtualenv/qwen2.5-omni/lib/python3.10/site-packages/transformers/....). I tried deleting that folder, but after a restart all the modified code comes back. How can I re-pull the original code?
No modification is needed.
OK, you set both environment variables, so it took effect. Your environment looks fine to me; I don't know why transformers is still being resolved from the old location rather than from the virtual environment.
@liuxingbo12138
Try a code change: in the load function in xinference/model/llm/transformers/qwen-omni.py, add two lines:
def load(self):
    import transformers
    logger.info("Transformers version: %s, location %s", transformers.__version__, transformers.__file__)
Then check the logs.
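If the virtualenv is being used, the logged location should point under /xinference_home/virtualenv/qwen2.5-omni/...; a path under /usr/local/lib/python3.10/dist-packages/ would mean the base image's transformers won the import.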
Yes, I know no modification is needed; the question is how to roll back to the virtual environment's original code now. I tried deleting the virtual environment, but that didn't work.
Just rebuild the image.
I modified the file at this path: /usr/local/lib/python3.10/dist-packages/xinference/model/llm/transformers/qwen-omni.py
2025-04-24 18:15:23,396 xinference.model.llm.transformers.qwen-omni 2239 INFO Transformers version: 4.50.3, location /usr/local/lib/python3.10/dist-packages/transformers/__init__.py
I really don't know why it isn't using the one in the virtual environment. The complete log is below:
INFO 04-24 18:14:20 [__init__.py:239] Automatically detected platform cuda.
2025-04-24 18:14:22,467 xinference.core.supervisor 2222 INFO Xinference supervisor 0.0.0.0:41338 started
2025-04-24 18:14:22,594 xinference.core.worker 2222 INFO Starting metrics export server at 0.0.0.0:None
2025-04-24 18:14:22,597 xinference.core.worker 2222 INFO Checking metrics export server...
2025-04-24 18:14:24,837 xinference.core.worker 2222 INFO Metrics server is started at: http://0.0.0.0:44153
2025-04-24 18:14:24,838 xinference.core.worker 2222 INFO Purge cache directory: /xinference_home/cache
2025-04-24 18:14:24,840 xinference.core.supervisor 2222 DEBUG [request a05503cc-2172-11f0-a845-00001049fe80] Enter add_worker, args: <xinference.core.supervisor.SupervisorActor object at 0x7fac33389e90>,0.0.0.0:41338, kwargs:
2025-04-24 18:14:24,840 xinference.core.supervisor 2222 DEBUG Worker 0.0.0.0:41338 has been added successfully
2025-04-24 18:14:24,840 xinference.core.supervisor 2222 DEBUG [request a05503cc-2172-11f0-a845-00001049fe80] Leave add_worker, elapsed time: 0 s
2025-04-24 18:14:24,840 xinference.core.worker 2222 INFO Connected to supervisor as a fresh worker
2025-04-24 18:14:24,860 xinference.core.worker 2222 INFO Xinference worker 0.0.0.0:41338 started
2025-04-24 18:14:24,885 xinference.core.supervisor 2222 DEBUG Worker 0.0.0.0:41338 resources: {'cpu': ResourceStatus(usage=0.125, total=192, memory_used=72507211776, memory_available=2054686588928, memory_total=2164123627520), 'gpu-0': GPUStatus(name='NVIDIA H100 80GB HBM3', mem_total=85520809984, mem_free=49832394752, mem_used=35688415232, mem_usage=0.417306796306968, gpu_util=0), 'gpu-1': GPUStatus(name='NVIDIA H100 80GB HBM3', mem_total=85520809984, mem_free=8265859072, mem_used=77254950912, mem_usage=0.9033468102845793, gpu_util=0), 'gpu-2': GPUStatus(name='NVIDIA H100 80GB HBM3', mem_total=85520809984, mem_free=13649575936, mem_used=71871234048, mem_usage=0.8403946836032811, gpu_util=0), 'gpu-3': GPUStatus(name='NVIDIA H100 80GB HBM3', mem_total=85520809984, mem_free=9950527488, mem_used=75570282496, mem_usage=0.8836478806753393, gpu_util=0), 'gpu-4': GPUStatus(name='NVIDIA H100 80GB HBM3', mem_total=85520809984, mem_free=14563934208, mem_used=70956875776, mem_usage=0.8297030370651921, gpu_util=0), 'gpu-5': GPUStatus(name='NVIDIA H100 80GB HBM3', mem_total=85520809984, mem_free=13584564224, mem_used=71936245760, mem_usage=0.8411548694809893, gpu_util=0), 'gpu-6': GPUStatus(name='NVIDIA H100 80GB HBM3', mem_total=85520809984, mem_free=85170192384, mem_used=350617600, mem_usage=0.004099792788043012, gpu_util=0), 'gpu-7': GPUStatus(name='NVIDIA H100 80GB HBM3', mem_total=85520809984, mem_free=85170192384, mem_used=350617600, mem_usage=0.004099792788043012, gpu_util=0)}
2025-04-24 18:14:27,458 xinference.core.supervisor 2222 DEBUG Enter get_status, args: <xinference.core.supervisor.SupervisorActor object at 0x7fac33389e90>, kwargs:
2025-04-24 18:14:27,458 xinference.core.supervisor 2222 DEBUG Leave get_status, elapsed time: 0 s
2025-04-24 18:14:29,147 xinference.api.restful_api 2087 INFO Starting Xinference at endpoint: http://0.0.0.0:9998
2025-04-24 18:14:29,256 uvicorn.error 2087 INFO Uvicorn running on http://0.0.0.0:9998 (Press CTRL+C to quit)
2025-04-24 18:15:03,290 xinference.core.supervisor 2222 DEBUG Enter launch_builtin_model, model_uid: qwen2.5-omni, model_name: qwen2.5-omni, model_size: 7, model_format: pytorch, quantization: none, replica: 1, enable_xavier: False, kwargs: {}
2025-04-24 18:15:03,292 xinference.core.worker 2222 DEBUG Enter get_model_count, args: <xinference.core.worker.WorkerActor object at 0x7fac333c03b0>, kwargs:
2025-04-24 18:15:03,292 xinference.core.worker 2222 DEBUG Leave get_model_count, elapsed time: 0 s
2025-04-24 18:15:03,294 xinference.core.worker 2222 INFO [request b740a064-2172-11f0-a845-00001049fe80] Enter launch_builtin_model, args: <xinference.core.worker.WorkerActor object at 0x7fac333c03b0>, kwargs: model_uid=qwen2.5-omni-0,model_name=qwen2.5-omni,model_size_in_billions=7,model_format=pytorch,quantization=none,model_engine=Transformers,model_type=LLM,n_gpu=auto,request_limits=None,peft_model_config=None,gpu_idx=[7],download_hub=None,model_path=None,xavier_config=None
2025-04-24 18:15:03,295 xinference.core.worker 2222 INFO You specify to launch the model: qwen2.5-omni on GPU index: [7] of the worker: 0.0.0.0:41338, xinference will automatically ignore the n_gpu option.
2025-04-24 18:15:04,201 xinference.model.llm.core 2222 DEBUG Launching qwen2.5-omni-0 with Qwen2_5OmniChatModel
2025-04-24 18:15:04,202 xinference.model.llm.llm_family 2222 INFO Caching from Hugging Face: Qwen/Qwen2.5-Omni-7B
2025-04-24 18:15:04,203 xinference.core.progress_tracker 2222 DEBUG Setting progress, request id: launching-qwen2.5-omni-0, progress: 0.0
2025-04-24 18:15:04,203 xinference.model.llm.llm_family 2222 INFO Cache /xinference_home/cache/qwen2_5-omni-pytorch-7b exists
2025-04-24 18:15:04,206 xinference.core.progress_tracker 2222 DEBUG Setting progress, request id: launching-qwen2.5-omni-0, progress: 0.8
2025-04-24 18:15:04,208 xinference.core.progress_tracker 2222 DEBUG Setting progress, request id: launching-qwen2.5-omni-0, progress: 0.8
Using CPython 3.10.14 interpreter at: /usr/bin/python3
Creating virtual environment at: virtualenv/qwen2.5-omni
2025-04-24 18:15:04,596 xinference.core.worker 2222 INFO Installing packages ['git+https://github.com/huggingface/[email protected]', 'numpy==1.26.4', 'qwen_omni_utils', 'soundfile'] in virtual env /xinference_home/virtualenv/qwen2.5-omni, with settings(index_url=None)
Using Python 3.10.14 environment at: virtualenv/qwen2.5-omni
Updated https://github.com/huggingface/transformers (43bb4c0456ebab67ca6b11fa5fa4c099fb2e6a2c)
Resolved 37 packages in 8.42s
Built transformers @ git+https://github.com/huggingface/transformers@43bb4c0456ebab67ca6b11fa5fa4c099fb2e6a2c
Prepared 1 package in 4.18s
Installed 37 packages in 100ms
- audioread==3.0.1
- av==14.3.0
- certifi==2025.1.31
- cffi==1.17.1
- charset-normalizer==3.4.1
- decorator==5.2.1
- filelock==3.18.0
- fsspec==2025.3.2
- huggingface-hub==0.30.2
- idna==3.10
- joblib==1.4.2
- lazy-loader==0.4
- librosa==0.11.0
- llvmlite==0.44.0
- msgpack==1.1.0
- numba==0.61.2
- numpy==1.26.4
- packaging==25.0
- pillow==11.2.1
- platformdirs==4.3.7
- pooch==1.8.2
- pycparser==2.22
- pyyaml==6.0.2
- qwen-omni-utils==0.0.4
- regex==2024.11.6
- requests==2.32.3
- safetensors==0.5.3
- scikit-learn==1.6.1
- scipy==1.15.2
- soundfile==0.13.1
- soxr==0.5.0.post1
- threadpoolctl==3.6.0
- tokenizers==0.21.1
- tqdm==4.67.1
- transformers==4.52.0.dev0 (from git+https://github.com/huggingface/transformers@43bb4c0456ebab67ca6b11fa5fa4c099fb2e6a2c)
- typing-extensions==4.13.2
- urllib3==2.4.0
2025-04-24 18:15:19,474 transformers.utils.import_utils 2239 DEBUG Detected accelerate version: 0.34.0
2025-04-24 18:15:19,476 transformers.utils.import_utils 2239 DEBUG Detected bitsandbytes version: 0.45.5
2025-04-24 18:15:19,477 transformers.utils.import_utils 2239 DEBUG Detected coloredlogs version: 15.0.1
2025-04-24 18:15:19,478 transformers.utils.import_utils 2239 DEBUG Detected datasets version: 2.21.0
2025-04-24 18:15:19,479 transformers.utils.import_utils 2239 DEBUG Detected g2p_en version: 2.1.0
2025-04-24 18:15:19,480 transformers.utils.import_utils 2239 DEBUG Detected jieba version: 0.42.1
2025-04-24 18:15:19,480 transformers.utils.import_utils 2239 DEBUG Detected jinja2 version: 3.1.6
2025-04-24 18:15:19,481 transformers.utils.import_utils 2239 DEBUG Detected librosa version: 0.11.0
2025-04-24 18:15:19,482 transformers.utils.import_utils 2239 DEBUG Detected nltk version: 3.9.1
2025-04-24 18:15:19,483 transformers.utils.import_utils 2239 DEBUG Detected openai version: 1.75.0
2025-04-24 18:15:19,484 transformers.utils.import_utils 2239 DEBUG Detected optimum version: 1.24.0
2025-04-24 18:15:19,485 transformers.utils.import_utils 2239 DEBUG Detected pandas version: 2.2.2
2025-04-24 18:15:19,486 transformers.utils.import_utils 2239 DEBUG Detected peft version: 0.15.2
2025-04-24 18:15:19,486 transformers.utils.import_utils 2239 DEBUG Detected phonemizer version: N/A
2025-04-24 18:15:19,487 transformers.utils.import_utils 2239 DEBUG Detected psutil version: 7.0.0
2025-04-24 18:15:19,488 transformers.utils.import_utils 2239 DEBUG Detected pygments version: 2.19.1
2025-04-24 18:15:19,489 transformers.utils.import_utils 2239 DEBUG Detected sacremoses version: 0.1.1
2025-04-24 18:15:19,489 transformers.utils.import_utils 2239 DEBUG Detected safetensors version: 0.4.4
2025-04-24 18:15:19,492 transformers.utils.import_utils 2239 DEBUG Detected scipy version: 1.15.2
2025-04-24 18:15:19,493 transformers.utils.import_utils 2239 DEBUG Detected sentencepiece version: 0.2.0
2025-04-24 18:15:19,493 transformers.utils.import_utils 2239 DEBUG Detected gguf version: 0.16.2
2025-04-24 18:15:19,495 transformers.utils.import_utils 2239 DEBUG Detected soundfile version: 0.13.1
2025-04-24 18:15:19,496 transformers.utils.import_utils 2239 DEBUG Detected spacy version: 3.8.5
2025-04-24 18:15:19,497 transformers.utils.import_utils 2239 DEBUG Detected timm version: 1.0.15
2025-04-24 18:15:19,498 transformers.utils.import_utils 2239 DEBUG Detected tokenizers version: 0.21.1
2025-04-24 18:15:19,499 transformers.utils.import_utils 2239 DEBUG Detected torchaudio version: 2.6.0
2025-04-24 18:15:19,499 transformers.utils.import_utils 2239 DEBUG Detected torchvision version: 0.21.0
2025-04-24 18:15:19,500 transformers.utils.import_utils 2239 DEBUG Detected num2words version: 0.5.14
2025-04-24 18:15:19,501 transformers.utils.import_utils 2239 DEBUG Detected tiktoken version: 0.7.0
2025-04-24 18:15:19,501 transformers.utils.import_utils 2239 DEBUG Detected triton version: 3.2.0
2025-04-24 18:15:19,502 transformers.utils.import_utils 2239 DEBUG Detected rich version: 13.9.4
2025-04-24 18:15:19,503 transformers.utils.import_utils 2239 DEBUG Detected torch version: 2.6.0
2025-04-24 18:15:19,506 transformers.utils.import_utils 2239 DEBUG Detected PIL version 10.4.0
2025-04-24 18:15:19,529 transformers.utils.import_utils 2239 DEBUG Detected torch version: 2.6.0
2025-04-24 18:15:19,530 transformers.utils.import_utils 2239 DEBUG Detected torch version: 2.6.0
2025-04-24 18:15:19,532 transformers.utils.import_utils 2239 DEBUG Detected torch version: 2.6.0
2025-04-24 18:15:19,533 transformers.utils.import_utils 2239 DEBUG Detected torch version: 2.6.0
2025-04-24 18:15:19,535 transformers.utils.import_utils 2239 DEBUG Detected torch version: 2.6.0
2025-04-24 18:15:19,536 transformers.utils.import_utils 2239 DEBUG Detected torch version: 2.6.0
2025-04-24 18:15:19,538 transformers.utils.import_utils 2239 DEBUG Detected torch version: 2.6.0
2025-04-24 18:15:19,539 transformers.utils.import_utils 2239 DEBUG Detected torch version: 2.6.0
2025-04-24 18:15:19,541 transformers.utils.import_utils 2239 DEBUG Detected torch version: 2.6.0
2025-04-24 18:15:20,420 transformers.utils.import_utils 2239 DEBUG Detected torch version: 2.6.0
INFO 04-24 18:15:21 [__init__.py:239] Automatically detected platform cuda.
2025-04-24 18:15:23,390 xinference.core.model 2239 DEBUG Starting ModelActor at 0.0.0.0:42875, uid: b'qwen2.5-omni-0'
2025-04-24 18:15:23,390 xinference.core.model 2239 WARNING Currently for multimodal models, xinference only supports qwen-vl-chat, cogvlm2, glm-4v, MiniCPM-V-2.6 for batching. Your model qwen2.5-omni with model family None is disqualified.
2025-04-24 18:15:23,390 xinference.core.model 2239 INFO Start requests handler.
2025-04-24 18:15:23,396 xinference.model.llm.transformers.qwen-omni 2239 INFO Transformers version: 4.50.3, location /usr/local/lib/python3.10/dist-packages/transformers/__init__.py
2025-04-24 18:15:23,401 xinference.core.worker 2222 ERROR Failed to load model qwen2.5-omni-0
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/xinference/core/worker.py", line 1135, in launch_builtin_model
await model_ref.load()
File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 262, in send
return self._process_result_message(result)
File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 111, in _process_result_message
raise message.as_instanceof_cause()
File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 689, in send
result = await self._run_coro(message.message_id, coro)
File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 389, in _run_coro
return await coro
File "/usr/local/lib/python3.10/dist-packages/xoscar/api.py", line 384, in on_receive
return await super().on_receive(message) # type: ignore
File "xoscar/core.pyx", line 564, in on_receive
raise ex
File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.on_receive
async with self._lock:
File "xoscar/core.pyx", line 527, in xoscar.core._BaseActor.on_receive
with debug_async_timeout('actor_lock_timeout',
File "xoscar/core.pyx", line 532, in xoscar.core._BaseActor.on_receive
result = await result
File "/usr/local/lib/python3.10/dist-packages/xinference/core/model.py", line 471, in load
await asyncio.to_thread(self._model.load)
File "/usr/lib/python3.10/asyncio/threads.py", line 25, in to_thread
return await loop.run_in_executor(None, func_call)
File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/usr/local/lib/python3.10/dist-packages/xinference/model/llm/transformers/qwen-omni.py", line 74, in load
from transformers import (
ImportError: [address=0.0.0.0:42875, pid=2239] cannot import name 'Qwen2_5OmniForConditionalGeneration' from 'transformers' (/usr/local/lib/python3.10/dist-packages/transformers/__init__.py)
2025-04-24 18:15:23,405 xinference.core.progress_tracker 2222 DEBUG Setting progress, request id: launching-qwen2.5-omni-0, progress: 1.0
2025-04-24 18:15:23,451 xinference.core.worker 2222 ERROR [request b740a064-2172-11f0-a845-00001049fe80] Leave launch_builtin_model, error: [address=0.0.0.0:42875, pid=2239] cannot import name 'Qwen2_5OmniForConditionalGeneration' from 'transformers' (/usr/local/lib/python3.10/dist-packages/transformers/__init__.py), elapsed time: 20 s
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/xinference/core/utils.py", line 93, in wrapped
ret = await func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/xinference/core/worker.py", line 1135, in launch_builtin_model
await model_ref.load()
File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 262, in send
return self._process_result_message(result)
File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 111, in _process_result_message
raise message.as_instanceof_cause()
File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 689, in send
result = await self._run_coro(message.message_id, coro)
File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 389, in _run_coro
return await coro
File "/usr/local/lib/python3.10/dist-packages/xoscar/api.py",
line 384, in on_receive return await super().on_receive(message) # type: ignore File "xoscar/core.pyx", line 564, in on_receive raise ex File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.on_receive async with self._lock: File "xoscar/core.pyx", line 527, in xoscar.core._BaseActor.on_receive with debug_async_timeout('actor_lock_timeout', File "xoscar/core.pyx", line 532, in xoscar.core._BaseActor.on_receive result = await result File "/usr/local/lib/python3.10/dist-packages/xinference/core/model.py", line 471, in load await asyncio.to_thread(self._model.load) File "/usr/lib/python3.10/asyncio/threads.py", line 25, in to_thread return await loop.run_in_executor(None, func_call) File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run result = self.fn(*self.args, **self.kwargs) File "/usr/local/lib/python3.10/dist-packages/xinference/model/llm/transformers/qwen-omni.py", line 74, in load from transformers import ( ImportError: [address=0.0.0.0:42875, pid=2239] cannot import name 'Qwen2_5OmniForConditionalGeneration' from 'transformers' (/usr/local/lib/python3.10/dist-packages/transformers/init.py) 2025-04-24 18:15:23,452 xinference.core.supervisor 2222 DEBUG [request c3449708-2172-11f0-a845-00001049fe80] Enter terminate_model, args: <xinference.core.supervisor.SupervisorActor object at 0x7fac33389e90>,qwen2.5-omni, kwargs: suppress_exception=True 2025-04-24 18:15:23,452 xinference.core.supervisor 2222 DEBUG [request c3449708-2172-11f0-a845-00001049fe80] Leave terminate_model, elapsed time: 0 s 2025-04-24 18:15:23,456 xinference.api.restful_api 2087 ERROR [address=0.0.0.0:42875, pid=2239] cannot import name 'Qwen2_5OmniForConditionalGeneration' from 'transformers' (/usr/local/lib/python3.10/dist-packages/transformers/init.py) Traceback (most recent call last): File "/usr/local/lib/python3.10/dist-packages/xinference/api/restful_api.py", line 1022, in launch_model model_uid = await (await self._get_supervisor_ref()).launch_builtin_model( File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 262, in send return self._process_result_message(result) File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 111, in _process_result_message raise message.as_instanceof_cause() File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 689, in send result = await self._run_coro(message.message_id, coro) File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 389, in _run_coro return await coro File "/usr/local/lib/python3.10/dist-packages/xoscar/api.py", line 384, in on_receive return await super().on_receive(message) # type: ignore File "xoscar/core.pyx", line 564, in on_receive raise ex File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.on_receive async with self._lock: File "xoscar/core.pyx", line 527, in xoscar.core._BaseActor.on_receive with debug_async_timeout('actor_lock_timeout', File "xoscar/core.pyx", line 532, in xoscar.core._BaseActor.on_receive result = await result File "/usr/local/lib/python3.10/dist-packages/xinference/core/supervisor.py", line 1199, in launch_builtin_model await _launch_model() File "/usr/local/lib/python3.10/dist-packages/xinference/core/supervisor.py", line 1134, in _launch_model subpool_address = await _launch_one_model( File "/usr/local/lib/python3.10/dist-packages/xinference/core/supervisor.py", line 1088, in _launch_one_model subpool_address = await worker_ref.launch_builtin_model( File 
"/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 262, in send return self._process_result_message(result) File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 111, in _process_result_message raise message.as_instanceof_cause() File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 689, in send result = await self._run_coro(message.message_id, coro) File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 389, in _run_coro return await coro File "/usr/local/lib/python3.10/dist-packages/xoscar/api.py", line 384, in on_receive return await super().on_receive(message) # type: ignore File "xoscar/core.pyx", line 564, in on_receive raise ex File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.on_receive async with self._lock: File "xoscar/core.pyx", line 527, in xoscar.core._BaseActor.on_receive with debug_async_timeout('actor_lock_timeout', File "xoscar/core.pyx", line 532, in xoscar.core._BaseActor.on_receive result = await result File "/usr/local/lib/python3.10/dist-packages/xinference/core/utils.py", line 93, in wrapped ret = await func(*args, **kwargs) File "/usr/local/lib/python3.10/dist-packages/xinference/core/worker.py", line 1135, in launch_builtin_model await model_ref.load() File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 262, in send return self._process_result_message(result) File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 111, in _process_result_message raise message.as_instanceof_cause() File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 689, in send result = await self._run_coro(message.message_id, coro) File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 389, in _run_coro return await coro File "/usr/local/lib/python3.10/dist-packages/xoscar/api.py", line 384, in on_receive return await super().on_receive(message) # type: ignore File "xoscar/core.pyx", line 564, in on_receive raise ex File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.on_receive async with self._lock: File "xoscar/core.pyx", line 527, in xoscar.core._BaseActor.on_receive with debug_async_timeout('actor_lock_timeout', File "xoscar/core.pyx", line 532, in xoscar.core._BaseActor.on_receive result = await result File "/usr/local/lib/python3.10/dist-packages/xinference/core/model.py", line 471, in load await asyncio.to_thread(self._model.load) File "/usr/lib/python3.10/asyncio/threads.py", line 25, in to_thread return await loop.run_in_executor(None, func_call) File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run result = self.fn(*self.args, **self.kwargs) File "/usr/local/lib/python3.10/dist-packages/xinference/model/llm/transformers/qwen-omni.py", line 74, in load from transformers import ( ImportError: [address=0.0.0.0:42875, pid=2239] cannot import name 'Qwen2_5OmniForConditionalGeneration' from 'transformers' (/usr/local/lib/python3.10/dist-packages/transformers/init.py)
@liuxingbo12138 Could you change
https://github.com/xorbitsai/inference/blob/452dc9b21b1aaf818f2bee89119ee32647dd0cdb/xinference/core/worker.py#L614
to start_method = "spawn"
and then restart the service to check?
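For background, here is a minimal standalone sketch, using only the standard library and not taken from xinference, of why the start method can matter: a forked child inherits the parent's already-imported modules, so a transformers that the parent resolved from the system dist-packages stays loaded, while a spawned child boots a fresh interpreter and re-resolves every import from sys.path.

```python
# Hedged standalone demo of fork-vs-spawn import semantics (not xinference code).
import multiprocessing as mp
import sys


def report():
    # A forked child sees the parent's sys.modules snapshot;
    # a spawned child starts from a freshly initialized interpreter.
    print("decimal already imported:", "decimal" in sys.modules)


if __name__ == "__main__":
    import decimal  # deliberately imported in the parent only  # noqa: F401

    for method in ("fork", "spawn"):  # "fork" is only available on POSIX
        proc = mp.get_context(method).Process(target=report)
        proc.start()
        proc.join()  # typically prints True for "fork" and False for "spawn"
```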
@liuxingbo12138 Could you change
inference/xinference/core/worker.py
Line 614 in 452dc9b
start_method = "forkserver" to start_method = "spawn"
and then restart the service to check?
Still not working; it looks like it is still not picking up the dependencies from the virtual environment.
The log is below:
XINFERENCE_ENABLE_VIRTUAL_ENV=1 xinference-local --host 0.0.0.0 --port 9998 --log-level debug
INFO 04-24 22:23:17 [__init__.py:239] Automatically detected platform cuda.
2025-04-24 22:23:19,733 xinference.core.supervisor 3359 INFO Xinference supervisor 0.0.0.0:17751 started
2025-04-24 22:23:19,873 xinference.core.worker 3359 INFO Starting metrics export server at 0.0.0.0:None
2025-04-24 22:23:19,876 xinference.core.worker 3359 INFO Checking metrics export server...
2025-04-24 22:23:22,265 xinference.core.worker 3359 INFO Metrics server is started at: http://0.0.0.0:34923
2025-04-24 22:23:22,265 xinference.core.worker 3359 INFO Purge cache directory: /xinference_home/cache
2025-04-24 22:23:22,267 xinference.core.supervisor 3359 DEBUG [request 67bb9f12-2195-11f0-ae0b-00001049fe80] Enter add_worker, args: <xinference.core.supervisor.SupervisorActor object at 0x7fd76c39e660>,0.0.0.0:17751, kwargs:
2025-04-24 22:23:22,268 xinference.core.supervisor 3359 DEBUG Worker 0.0.0.0:17751 has been added successfully
2025-04-24 22:23:22,268 xinference.core.supervisor 3359 DEBUG [request 67bb9f12-2195-11f0-ae0b-00001049fe80] Leave add_worker, elapsed time: 0 s
2025-04-24 22:23:22,268 xinference.core.worker 3359 INFO Connected to supervisor as a fresh worker
2025-04-24 22:23:22,288 xinference.core.worker 3359 INFO Xinference worker 0.0.0.0:17751 started
2025-04-24 22:23:22,316 xinference.core.supervisor 3359 DEBUG Worker 0.0.0.0:17751 resources: {'cpu': ResourceStatus(usage=0.519, total=192, memory_used=73056165888, memory_available=2054137577472, memory_total=2164123627520), 'gpu-0': GPUStatus(name='NVIDIA H100 80GB HBM3', mem_total=85520809984, mem_free=34338635776, mem_used=51182174208, mem_usage=0.5984762564523841, gpu_util=0), 'gpu-1': GPUStatus(name='NVIDIA H100 80GB HBM3', mem_total=85520809984, mem_free=8265859072, mem_used=77254950912, mem_usage=0.9033468102845793, gpu_util=0), 'gpu-2': GPUStatus(name='NVIDIA H100 80GB HBM3', mem_total=85520809984, mem_free=13649575936, mem_used=71871234048, mem_usage=0.8403946836032811, gpu_util=0), 'gpu-3': GPUStatus(name='NVIDIA H100 80GB HBM3', mem_total=85520809984, mem_free=9950527488, mem_used=75570282496, mem_usage=0.8836478806753393, gpu_util=0), 'gpu-4': GPUStatus(name='NVIDIA H100 80GB HBM3', mem_total=85520809984, mem_free=14563934208, mem_used=70956875776, mem_usage=0.8297030370651921, gpu_util=0), 'gpu-5': GPUStatus(name='NVIDIA H100 80GB HBM3', mem_total=85520809984, mem_free=13584564224, mem_used=71936245760, mem_usage=0.8411548694809893, gpu_util=0), 'gpu-6': GPUStatus(name='NVIDIA H100 80GB HBM3', mem_total=85520809984, mem_free=85170192384, mem_used=350617600, mem_usage=0.004099792788043012, gpu_util=0), 'gpu-7': GPUStatus(name='NVIDIA H100 80GB HBM3', mem_total=85520809984, mem_free=85170192384, mem_used=350617600, mem_usage=0.004099792788043012, gpu_util=0)}
2025-04-24 22:23:24,753 xinference.core.supervisor 3359 DEBUG Enter get_status, args: <xinference.core.supervisor.SupervisorActor object at 0x7fd76c39e660>, kwargs:
2025-04-24 22:23:24,753 xinference.core.supervisor 3359 DEBUG Leave get_status, elapsed time: 0 s
2025-04-24 22:23:26,598 xinference.api.restful_api 3224 INFO Starting Xinference at endpoint: http://0.0.0.0:9998
2025-04-24 22:23:26,707 uvicorn.error 3224 INFO Uvicorn running on http://0.0.0.0:9998 (Press CTRL+C to quit)
2025-04-24 22:23:38,689 xinference.core.supervisor 3359 DEBUG Enter launch_builtin_model, model_uid: qwen2.5-omni, model_name: qwen2.5-omni, model_size: 7, model_format: pytorch, quantization: none, replica: 1, enable_xavier: False, kwargs: {}
2025-04-24 22:23:38,691 xinference.core.worker 3359 DEBUG Enter get_model_count, args: <xinference.core.worker.WorkerActor object at 0x7fd76c3d0950>, kwargs:
2025-04-24 22:23:38,691 xinference.core.worker 3359 DEBUG Leave get_model_count, elapsed time: 0 s
2025-04-24 22:23:38,692 xinference.core.worker 3359 INFO [request 7185d4cc-2195-11f0-ae0b-00001049fe80] Enter launch_builtin_model, args: <xinference.core.worker.WorkerActor object at 0x7fd76c3d0950>, kwargs: model_uid=qwen2.5-omni-0,model_name=qwen2.5-omni,model_size_in_billions=7,model_format=pytorch,quantization=none,model_engine=Transformers,model_type=LLM,n_gpu=auto,request_limits=None,peft_model_config=None,gpu_idx=[7],download_hub=None,model_path=None,xavier_config=None
2025-04-24 22:23:38,693 xinference.core.worker 3359 INFO You specify to launch the model: qwen2.5-omni on GPU index: [7] of the worker: 0.0.0.0:17751, xinference will automatically ignore the n_gpu option.
2025-04-24 22:23:39,575 xinference.model.llm.core 3359 DEBUG Launching qwen2.5-omni-0 with Qwen2_5OmniChatModel
2025-04-24 22:23:39,575 xinference.core.progress_tracker 3359 DEBUG Setting progress, request id: launching-qwen2.5-omni-0, progress: 0.0
2025-04-24 22:23:39,576 xinference.model.llm.llm_family 3359 INFO Caching from Hugging Face: Qwen/Qwen2.5-Omni-7B
2025-04-24 22:23:39,576 xinference.model.llm.llm_family 3359 INFO Cache /xinference_home/cache/qwen2_5-omni-pytorch-7b exists
2025-04-24 22:23:39,577 xinference.core.progress_tracker 3359 DEBUG Setting progress, request id: launching-qwen2.5-omni-0, progress: 0.8
2025-04-24 22:23:39,579 xinference.core.progress_tracker 3359 DEBUG Setting progress, request id: launching-qwen2.5-omni-0, progress: 0.8
Using CPython 3.10.14 interpreter at: /usr/bin/python3
Creating virtual environment at: virtualenv/qwen2.5-omni
2025-04-24 22:23:39,886 xinference.core.worker 3359 INFO Installing packages ['git+https://github.com/huggingface/[email protected]', 'numpy==1.26.4', 'qwen_omni_utils', 'soundfile'] in virtual env /xinference_home/virtualenv/qwen2.5-omni, with settings(index_url=None)
Using Python 3.10.14 environment at: virtualenv/qwen2.5-omni
Resolved 37 packages in 2.43s
Installed 37 packages in 108ms
- audioread==3.0.1
- av==14.3.0
- certifi==2025.1.31
- cffi==1.17.1
- charset-normalizer==3.4.1
- decorator==5.2.1
- filelock==3.18.0
- fsspec==2025.3.2
- huggingface-hub==0.30.2
- idna==3.10
- joblib==1.4.2
- lazy-loader==0.4
- librosa==0.11.0
- llvmlite==0.44.0
- msgpack==1.1.0
- numba==0.61.2
- numpy==1.26.4
- packaging==25.0
- pillow==11.2.1
- platformdirs==4.3.7
- pooch==1.8.2
- pycparser==2.22
- pyyaml==6.0.2
- qwen-omni-utils==0.0.4
- regex==2024.11.6
- requests==2.32.3
- safetensors==0.5.3
- scikit-learn==1.6.1
- scipy==1.15.2
- soundfile==0.13.1
- soxr==0.5.0.post1
- threadpoolctl==3.6.0
- tokenizers==0.21.1
- tqdm==4.67.1
- transformers==4.52.0.dev0 (from git+https://github.com/huggingface/transformers@43bb4c0456ebab67ca6b11fa5fa4c099fb2e6a2c)
- typing-extensions==4.13.2
- urllib3==2.4.0
2025-04-24 22:23:44,635 transformers.utils.import_utils 3375 DEBUG Detected accelerate version: 0.34.0
2025-04-24 22:23:44,636 transformers.utils.import_utils 3375 DEBUG Detected bitsandbytes version: 0.45.5
2025-04-24 22:23:44,637 transformers.utils.import_utils 3375 DEBUG Detected coloredlogs version: 15.0.1
2025-04-24 22:23:44,639 transformers.utils.import_utils 3375 DEBUG Detected datasets version: 2.21.0
2025-04-24 22:23:44,640 transformers.utils.import_utils 3375 DEBUG Detected g2p_en version: 2.1.0
2025-04-24 22:23:44,640 transformers.utils.import_utils 3375 DEBUG Detected jieba version: 0.42.1
2025-04-24 22:23:44,641 transformers.utils.import_utils 3375 DEBUG Detected jinja2 version: 3.1.6
2025-04-24 22:23:44,642 transformers.utils.import_utils 3375 DEBUG Detected librosa version: 0.11.0
2025-04-24 22:23:44,642 transformers.utils.import_utils 3375 DEBUG Detected nltk version: 3.9.1
2025-04-24 22:23:44,643 transformers.utils.import_utils 3375 DEBUG Detected openai version: 1.75.0
2025-04-24 22:23:44,644 transformers.utils.import_utils 3375 DEBUG Detected optimum version: 1.24.0
2025-04-24 22:23:44,646 transformers.utils.import_utils 3375 DEBUG Detected pandas version: 2.2.2
2025-04-24 22:23:44,646 transformers.utils.import_utils 3375 DEBUG Detected peft version: 0.15.2
2025-04-24 22:23:44,647 transformers.utils.import_utils 3375 DEBUG Detected phonemizer version: N/A
2025-04-24 22:23:44,648 transformers.utils.import_utils 3375 DEBUG Detected psutil version: 7.0.0
2025-04-24 22:23:44,648 transformers.utils.import_utils 3375 DEBUG Detected pygments version: 2.19.1
2025-04-24 22:23:44,649 transformers.utils.import_utils 3375 DEBUG Detected sacremoses version: 0.1.1
2025-04-24 22:23:44,650 transformers.utils.import_utils 3375 DEBUG Detected safetensors version: 0.4.4
2025-04-24 22:23:44,653 transformers.utils.import_utils 3375 DEBUG Detected scipy version: 1.15.2
2025-04-24 22:23:44,653 transformers.utils.import_utils 3375 DEBUG Detected sentencepiece version: 0.2.0
2025-04-24 22:23:44,654 transformers.utils.import_utils 3375 DEBUG Detected gguf version: 0.16.2
2025-04-24 22:23:44,656 transformers.utils.import_utils 3375 DEBUG Detected soundfile version: 0.13.1
2025-04-24 22:23:44,657 transformers.utils.import_utils 3375 DEBUG Detected spacy version: 3.8.5
2025-04-24 22:23:44,658 transformers.utils.import_utils 3375 DEBUG Detected timm version: 1.0.15
2025-04-24 22:23:44,658 transformers.utils.import_utils 3375 DEBUG Detected tokenizers version: 0.21.1
2025-04-24 22:23:44,659 transformers.utils.import_utils 3375 DEBUG Detected torchaudio version: 2.6.0
2025-04-24 22:23:44,660 transformers.utils.import_utils 3375 DEBUG Detected torchvision version: 0.21.0
2025-04-24 22:23:44,660 transformers.utils.import_utils 3375 DEBUG Detected num2words version: 0.5.14
2025-04-24 22:23:44,661 transformers.utils.import_utils 3375 DEBUG Detected tiktoken version: 0.7.0
2025-04-24 22:23:44,662 transformers.utils.import_utils 3375 DEBUG Detected triton version: 3.2.0
2025-04-24 22:23:44,662 transformers.utils.import_utils 3375 DEBUG Detected rich version: 13.9.4
2025-04-24 22:23:44,663 transformers.utils.import_utils 3375 DEBUG Detected torch version: 2.6.0
2025-04-24 22:23:44,666 transformers.utils.import_utils 3375 DEBUG Detected PIL version 10.4.0
2025-04-24 22:23:44,691 transformers.utils.import_utils 3375 DEBUG Detected torch version: 2.6.0
2025-04-24 22:23:44,693 transformers.utils.import_utils 3375 DEBUG Detected torch version: 2.6.0
2025-04-24 22:23:44,695 transformers.utils.import_utils 3375 DEBUG Detected torch version: 2.6.0
2025-04-24 22:23:44,696 transformers.utils.import_utils 3375 DEBUG Detected torch version: 2.6.0
2025-04-24 22:23:44,698 transformers.utils.import_utils 3375 DEBUG Detected torch version: 2.6.0
2025-04-24 22:23:44,699 transformers.utils.import_utils 3375 DEBUG Detected torch version: 2.6.0
2025-04-24 22:23:44,701 transformers.utils.import_utils 3375 DEBUG Detected torch version: 2.6.0
2025-04-24 22:23:44,702 transformers.utils.import_utils 3375 DEBUG Detected torch version: 2.6.0
2025-04-24 22:23:44,704 transformers.utils.import_utils 3375 DEBUG Detected torch version: 2.6.0
2025-04-24 22:23:45,633 transformers.utils.import_utils 3375 DEBUG Detected torch version: 2.6.0
INFO 04-24 22:23:46 [__init__.py:239] Automatically detected platform cuda.
2025-04-24 22:23:48,759 xinference.core.model 3375 DEBUG Starting ModelActor at 0.0.0.0:42591, uid: b'qwen2.5-omni-0'
2025-04-24 22:23:48,760 xinference.core.model 3375 WARNING Currently for multimodal models, xinference only supports qwen-vl-chat, cogvlm2, glm-4v, MiniCPM-V-2.6 for batching. Your model qwen2.5-omni with model family None is disqualified.
2025-04-24 22:23:48,760 xinference.core.model 3375 INFO Start requests handler.
2025-04-24 22:23:48,763 xinference.model.llm.transformers.qwen-omni 3375 INFO Transformers version: 4.50.3, location /usr/local/lib/python3.10/dist-packages/transformers/__init__.py
2025-04-24 22:23:48,765 xinference.core.worker 3359 ERROR Failed to load model qwen2.5-omni-0
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/worker.py", line 1136, in launch_builtin_model
    await model_ref.load()
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 262, in send
    return self._process_result_message(result)
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 111, in _process_result_message
    raise message.as_instanceof_cause()
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 689, in send
    result = await self._run_coro(message.message_id, coro)
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 389, in _run_coro
    return await coro
  File "/usr/local/lib/python3.10/dist-packages/xoscar/api.py", line 384, in on_receive
    return await super().on_receive(message)  # type: ignore
  File "xoscar/core.pyx", line 564, in on_receive
    raise ex
  File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.on_receive
    async with self._lock:
  File "xoscar/core.pyx", line 527, in xoscar.core._BaseActor.on_receive
    with debug_async_timeout('actor_lock_timeout',
  File "xoscar/core.pyx", line 532, in xoscar.core._BaseActor.on_receive
    result = await result
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/model.py", line 471, in load
    await asyncio.to_thread(self._model.load)
  File "/usr/lib/python3.10/asyncio/threads.py", line 25, in to_thread
    return await loop.run_in_executor(None, func_call)
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.10/dist-packages/xinference/model/llm/transformers/qwen-omni.py", line 74, in load
    from transformers import (
ImportError: [address=0.0.0.0:42591, pid=3375] cannot import name 'Qwen2_5OmniForConditionalGeneration' from 'transformers' (/usr/local/lib/python3.10/dist-packages/transformers/__init__.py)
2025-04-24 22:23:48,767 xinference.core.progress_tracker 3359 DEBUG Setting progress, request id: launching-qwen2.5-omni-0, progress: 1.0
2025-04-24 22:23:48,810 xinference.core.worker 3359 ERROR [request 7185d4cc-2195-11f0-ae0b-00001049fe80] Leave launch_builtin_model, error: [address=0.0.0.0:42591, pid=3375] cannot import name 'Qwen2_5OmniForConditionalGeneration' from 'transformers' (/usr/local/lib/python3.10/dist-packages/transformers/__init__.py), elapsed time: 10 s
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/utils.py", line 93, in wrapped
    ret = await func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/worker.py", line 1136, in launch_builtin_model
    await model_ref.load()
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 262, in send
    return self._process_result_message(result)
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 111, in _process_result_message
    raise message.as_instanceof_cause()
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 689, in send
    result = await self._run_coro(message.message_id, coro)
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 389, in _run_coro
    return await coro
  File "/usr/local/lib/python3.10/dist-packages/xoscar/api.py", line 384, in on_receive
    return await super().on_receive(message)  # type: ignore
  File "xoscar/core.pyx", line 564, in on_receive
    raise ex
  File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.on_receive
    async with self._lock:
  File "xoscar/core.pyx", line 527, in xoscar.core._BaseActor.on_receive
    with debug_async_timeout('actor_lock_timeout',
  File "xoscar/core.pyx", line 532, in xoscar.core._BaseActor.on_receive
    result = await result
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/model.py", line 471, in load
    await asyncio.to_thread(self._model.load)
  File "/usr/lib/python3.10/asyncio/threads.py", line 25, in to_thread
    return await loop.run_in_executor(None, func_call)
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.10/dist-packages/xinference/model/llm/transformers/qwen-omni.py", line 74, in load
    from transformers import (
ImportError: [address=0.0.0.0:42591, pid=3375] cannot import name 'Qwen2_5OmniForConditionalGeneration' from 'transformers' (/usr/local/lib/python3.10/dist-packages/transformers/__init__.py)
2025-04-24 22:23:48,811 xinference.core.supervisor 3359 DEBUG [request 778ddb62-2195-11f0-ae0b-00001049fe80] Enter terminate_model, args: <xinference.core.supervisor.SupervisorActor object at 0x7fd76c39e660>,qwen2.5-omni, kwargs: suppress_exception=True
2025-04-24 22:23:48,811 xinference.core.supervisor 3359 DEBUG [request 778ddb62-2195-11f0-ae0b-00001049fe80] Leave terminate_model, elapsed time: 0 s
2025-04-24 22:23:48,815 xinference.api.restful_api 3224 ERROR [address=0.0.0.0:42591, pid=3375] cannot import name 'Qwen2_5OmniForConditionalGeneration' from 'transformers' (/usr/local/lib/python3.10/dist-packages/transformers/__init__.py)
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/xinference/api/restful_api.py", line 1022, in launch_model
    model_uid = await (await self._get_supervisor_ref()).launch_builtin_model(
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 262, in send
    return self._process_result_message(result)
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 111, in _process_result_message
    raise message.as_instanceof_cause()
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 689, in send
    result = await self._run_coro(message.message_id, coro)
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 389, in _run_coro
    return await coro
  File "/usr/local/lib/python3.10/dist-packages/xoscar/api.py", line 384, in on_receive
    return await super().on_receive(message)  # type: ignore
  File "xoscar/core.pyx", line 564, in on_receive
    raise ex
  File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.on_receive
    async with self._lock:
  File "xoscar/core.pyx", line 527, in xoscar.core._BaseActor.on_receive
    with debug_async_timeout('actor_lock_timeout',
  File "xoscar/core.pyx", line 532, in xoscar.core._BaseActor.on_receive
    result = await result
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/supervisor.py", line 1199, in launch_builtin_model
    await _launch_model()
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/supervisor.py", line 1134, in _launch_model
    subpool_address = await _launch_one_model(
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/supervisor.py", line 1088, in _launch_one_model
    subpool_address = await worker_ref.launch_builtin_model(
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 262, in send
    return self._process_result_message(result)
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 111, in _process_result_message
    raise message.as_instanceof_cause()
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 689, in send
    result = await self._run_coro(message.message_id, coro)
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 389, in _run_coro
    return await coro
  File "/usr/local/lib/python3.10/dist-packages/xoscar/api.py", line 384, in on_receive
    return await super().on_receive(message)  # type: ignore
  File "xoscar/core.pyx", line 564, in on_receive
    raise ex
  File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.on_receive
    async with self._lock:
  File "xoscar/core.pyx", line 527, in xoscar.core._BaseActor.on_receive
    with debug_async_timeout('actor_lock_timeout',
  File "xoscar/core.pyx", line 532, in xoscar.core._BaseActor.on_receive
    result = await result
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/utils.py", line 93, in wrapped
    ret = await func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/worker.py", line 1136, in launch_builtin_model
    await model_ref.load()
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 262, in send
    return self._process_result_message(result)
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 111, in _process_result_message
    raise message.as_instanceof_cause()
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 689, in send
    result = await self._run_coro(message.message_id, coro)
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 389, in _run_coro
    return await coro
  File "/usr/local/lib/python3.10/dist-packages/xoscar/api.py", line 384, in on_receive
    return await super().on_receive(message)  # type: ignore
  File "xoscar/core.pyx", line 564, in on_receive
    raise ex
  File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.on_receive
    async with self._lock:
  File "xoscar/core.pyx", line 527, in xoscar.core._BaseActor.on_receive
    with debug_async_timeout('actor_lock_timeout',
  File "xoscar/core.pyx", line 532, in xoscar.core._BaseActor.on_receive
    result = await result
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/model.py", line 471, in load
    await asyncio.to_thread(self._model.load)
  File "/usr/lib/python3.10/asyncio/threads.py", line 25, in to_thread
    return await loop.run_in_executor(None, func_call)
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.10/dist-packages/xinference/model/llm/transformers/qwen-omni.py", line 74, in load
    from transformers import (
ImportError: [address=0.0.0.0:42591, pid=3375] cannot import name 'Qwen2_5OmniForConditionalGeneration' from 'transformers' (/usr/local/lib/python3.10/dist-packages/transformers/__init__.py)
Could you add a print(sys.path) at the place where we already log the imported transformers info?
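A sketch of what that diagnostic could look like; placing it in xinference/model/llm/transformers/qwen-omni.py next to the existing version log is an assumption, and only the logged fields mirror the output above.

```python
# Sketch of the requested diagnostic (exact placement inside
# xinference/model/llm/transformers/qwen-omni.py is assumed, not confirmed).
import sys

import transformers

# Mirrors the "Transformers version: ..., location ..." line in the log above.
print(f"Transformers version: {transformers.__version__}, "
      f"location: {transformers.__file__}")
# If a system dist-packages entry precedes the virtualenv's site-packages here,
# the system transformers (4.50.3) shadows the 4.52.0.dev0 copy installed into
# /xinference_home/virtualenv/qwen2.5-omni.
print("sys.path:", sys.path)
```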
Added it, Mr. Qin.
There is a path that, unexpectedly, comes before ours in sys.path; that is probably what is causing the problem.
We will dig in and work out how to fix it.
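Not the project's confirmed fix, but a hedged sketch of the usual workaround for this kind of ordering problem while it is being investigated: push the model virtualenv's site-packages to the front of sys.path before transformers is imported. The lib/python3.10/site-packages suffix below is the standard virtualenv layout, assumed rather than taken from the log.

```python
# Hedged workaround sketch, not the confirmed fix: make the model virtualenv's
# site-packages win import resolution before transformers is loaded.
import sys

# /xinference_home/virtualenv/qwen2.5-omni comes from the install log above;
# the lib/python3.10/site-packages suffix is the standard virtualenv layout
# (an assumption here, not read from the log).
VENV_SITE = "/xinference_home/virtualenv/qwen2.5-omni/lib/python3.10/site-packages"

if VENV_SITE in sys.path:
    sys.path.remove(VENV_SITE)
sys.path.insert(0, VENV_SITE)

import transformers  # should now resolve to 4.52.0.dev0 rather than 4.50.3

print(transformers.__version__, transformers.__file__)
```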