
[BUG] The fetch request of the rerank model cannot be handled correctly

Open wongdi opened this issue 9 months ago • 6 comments

Describe the bug

A clear and concise description of what the bug is.

To Reproduce

To help us to reproduce this bug, please provide information below:

  1. Your Python version.
  2. The version of xinference you use.
  3. Versions of crucial packages.
  4. Full stack of the error.
  5. Minimized code to reproduce the error.

Expected behavior

A clear and concise description of what you expected to happen.

Additional context

Add any other context about the problem here.


Python 3.10.14
inference 0.10.3+11.gda1b62c

Description: I need to run the rerank model on the CPU, so I added the `device` parameter to the fetch request, but it does not work, even though the same approach works for the embedding model.

Error:

```
2024-05-07 10:24:07,813 xinference.core.worker 69017 ERROR Failed to load model bge-reranker-large-1-0
Traceback (most recent call last):
  File "/root/Xinference/custom_packages/inference/xinference/core/worker.py", line 707, in launch_builtin_model
    await model_ref.load()
  File "/root/Xinference/venv/lib/python3.10/site-packages/xoscar/backends/context.py", line 227, in send
    return self._process_result_message(result)
  File "/root/Xinference/venv/lib/python3.10/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
    raise message.as_instanceof_cause()
  File "/root/Xinference/venv/lib/python3.10/site-packages/xoscar/backends/pool.py", line 659, in send
    result = await self._run_coro(message.message_id, coro)
  File "/root/Xinference/venv/lib/python3.10/site-packages/xoscar/backends/pool.py", line 370, in _run_coro
    return await coro
  File "/root/Xinference/venv/lib/python3.10/site-packages/xoscar/api.py", line 384, in on_receive
    return await super().on_receive(message)  # type: ignore
  File "xoscar/core.pyx", line 558, in on_receive
    raise ex
  File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.on_receive
    async with self._lock:
  File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.on_receive
    with debug_async_timeout('actor_lock_timeout',
  File "xoscar/core.pyx", line 524, in xoscar.core._BaseActor.on_receive
    result = func(*args, **kwargs)
  File "/root/Xinference/custom_packages/inference/xinference/core/model.py", line 239, in load
    self._model.load()
  File "/root/Xinference/custom_packages/inference/xinference/model/rerank/core.py", line 134, in load
    self._model = CrossEncoder(
TypeError: [address=0.0.0.0:45176, pid=70382] sentence_transformers.cross_encoder.CrossEncoder.CrossEncoder() got multiple values for keyword argument 'device'
2024-05-07 10:24:07,888 xinference.api.restful_api 68848 ERROR [address=0.0.0.0:45176, pid=70382] sentence_transformers.cross_encoder.CrossEncoder.CrossEncoder() got multiple values for keyword argument 'device'
Traceback (most recent call last):
  File "/root/Xinference/custom_packages/inference/xinference/api/restful_api.py", line 741, in launch_model
    model_uid = await (await self._get_supervisor_ref()).launch_builtin_model(
  File "/root/Xinference/venv/lib/python3.10/site-packages/xoscar/backends/context.py", line 227, in send
    return self._process_result_message(result)
  File "/root/Xinference/venv/lib/python3.10/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
    raise message.as_instanceof_cause()
  File "/root/Xinference/venv/lib/python3.10/site-packages/xoscar/backends/pool.py", line 659, in send
    result = await self._run_coro(message.message_id, coro)
  File "/root/Xinference/venv/lib/python3.10/site-packages/xoscar/backends/pool.py", line 370, in _run_coro
    return await coro
  File "/root/Xinference/venv/lib/python3.10/site-packages/xoscar/api.py", line 384, in on_receive
    return await super().on_receive(message)  # type: ignore
  File "xoscar/core.pyx", line 558, in on_receive
    raise ex
  File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.on_receive
    async with self._lock:
  File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.on_receive
    with debug_async_timeout('actor_lock_timeout',
  File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.on_receive
    result = await result
  File "/root/Xinference/custom_packages/inference/xinference/core/supervisor.py", line 892, in launch_builtin_model
    await _launch_model()
  File "/root/Xinference/custom_packages/inference/xinference/core/supervisor.py", line 856, in _launch_model
    await _launch_one_model(rep_model_uid)
  File "/root/Xinference/custom_packages/inference/xinference/core/supervisor.py", line 838, in _launch_one_model
    await worker_ref.launch_builtin_model(
  File "xoscar/core.pyx", line 284, in __pyx_actor_method_wrapper
    async with lock:
  File "xoscar/core.pyx", line 287, in xoscar.core.__pyx_actor_method_wrapper
    result = await result
  File "/root/Xinference/custom_packages/inference/xinference/core/utils.py", line 45, in wrapped
    ret = await func(*args, **kwargs)
  File "/root/Xinference/custom_packages/inference/xinference/core/worker.py", line 707, in launch_builtin_model
    await model_ref.load()
  File "/root/Xinference/venv/lib/python3.10/site-packages/xoscar/backends/context.py", line 227, in send
    return self._process_result_message(result)
  File "/root/Xinference/venv/lib/python3.10/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
    raise message.as_instanceof_cause()
  File "/root/Xinference/venv/lib/python3.10/site-packages/xoscar/backends/pool.py", line 659, in send
    result = await self._run_coro(message.message_id, coro)
  File "/root/Xinference/venv/lib/python3.10/site-packages/xoscar/backends/pool.py", line 370, in _run_coro
    return await coro
  File "/root/Xinference/venv/lib/python3.10/site-packages/xoscar/api.py", line 384, in on_receive
    return await super().on_receive(message)  # type: ignore
  File "xoscar/core.pyx", line 558, in on_receive
    raise ex
  File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.on_receive
    async with self._lock:
  File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.on_receive
    with debug_async_timeout('actor_lock_timeout',
  File "xoscar/core.pyx", line 524, in xoscar.core._BaseActor.on_receive
    result = func(*args, **kwargs)
  File "/root/Xinference/custom_packages/inference/xinference/core/model.py", line 239, in load
    self._model.load()
  File "/root/Xinference/custom_packages/inference/xinference/model/rerank/core.py", line 134, in load
    self._model = CrossEncoder(
TypeError: [address=0.0.0.0:45176, pid=70382] sentence_transformers.cross_encoder.CrossEncoder.CrossEncoder() got multiple values for keyword argument 'device'
```

Fetch code:

```javascript
fetch("http://192.168.100.172:12009/v1/models", {
  "headers": {
    "accept": "*/*",
    "accept-language": "en-US,en;q=0.9",
    "content-type": "application/json"
  },
  "referrer": "http://192.168.100.172:12009/ui/",
  "referrerPolicy": "strict-origin-when-cross-origin",
  "body": "{\"model_uid\":null,\"model_name\":\"bge-reranker-large\",\"model_type\":\"rerank\",\"device\":\"cpu\",\"replica\":1}",
  "method": "POST",
  "mode": "cors",
  "credentials": "include"
});
```

wongdi avatar May 07 '24 02:05 wongdi


Replace `"device":"cpu"` with `"n_gpu": null`.

ChengjieLi28 avatar May 07 '24 02:05 ChengjieLi28


> Replace `"device":"cpu"` with `"n_gpu": null`.

```
ValueError: [address=0.0.0.0:36170, pid=79661] Currently n_gpu only supports auto.
```

wongdi avatar May 07 '24 03:05 wongdi


> Replace `"device":"cpu"` with `"n_gpu": null`.
>
> ValueError: [address=0.0.0.0:36170, pid=79661] Currently n_gpu only supports auto.

The request body needs to be valid JSON, and the value of `n_gpu` should not be a string: a JSON `null` corresponds to a `None` object in Python.
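A minimal sketch of a corrected request, based on the suggestion above (the URL, model name, and parameter names are taken from this thread). Building the body with `JSON.stringify` guarantees that `n_gpu` is serialized as a JSON `null` rather than the string `"null"`:

```javascript
// Launch payload: n_gpu is a real null, not the string "null".
const payload = {
  model_uid: null,
  model_name: "bge-reranker-large",
  model_type: "rerank",
  n_gpu: null, // JSON null -> Python None
  replica: 1,
};

// JSON.stringify serializes the JavaScript null as a JSON null,
// so the server-side JSON parser receives None instead of "null".
const body = JSON.stringify(payload);

// Hypothetical helper illustrating how the body would be sent
// (requires a running Xinference server at baseUrl).
function launchRerankModel(baseUrl) {
  return fetch(`${baseUrl}/v1/models`, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body,
  });
}
```

Hand-writing the body string (as in the original fetch snippet) makes it easy to accidentally send `"n_gpu":"null"`; serializing an object avoids that class of mistake.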

ChengjieLi28 avatar May 07 '24 03:05 ChengjieLi28

Describe the bug

A clear and concise description of what the bug is.

To Reproduce

To help us to reproduce this bug, please provide information below:

  1. Your Python version.
  2. The version of xinference you use.
  3. Versions of crucial packages.
  4. Full stack of the error.
  5. Minimized code to reproduce the error.

Expected behavior

A clear and concise description of what you expected to happen.

Additional context

Python 3.10.14
inference 0.10.3+11.gda1b62c

Description: I need to run the rerank model on the CPU, so I added a `device` parameter to the fetch request, but unlike with the embedding model it does not work.

Error:

```
2024-05-07 10:24:07,813 xinference.core.worker 69017 ERROR    Failed to load model bge-reranker-large-1-0
Traceback (most recent call last):
  File "/root/Xinference/custom_packages/inference/xinference/core/worker.py", line 707, in launch_builtin_model
    await model_ref.load()
  File "/root/Xinference/venv/lib/python3.10/site-packages/xoscar/backends/context.py", line 227, in send
    return self._process_result_message(result)
  File "/root/Xinference/venv/lib/python3.10/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
    raise message.as_instanceof_cause()
  File "/root/Xinference/venv/lib/python3.10/site-packages/xoscar/backends/pool.py", line 659, in send
    result = await self._run_coro(message.message_id, coro)
  File "/root/Xinference/venv/lib/python3.10/site-packages/xoscar/backends/pool.py", line 370, in _run_coro
    return await coro
  File "/root/Xinference/venv/lib/python3.10/site-packages/xoscar/api.py", line 384, in __on_receive__
    return await super().__on_receive__(message)  # type: ignore
  File "xoscar/core.pyx", line 558, in __on_receive__
    raise ex
  File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
    async with self._lock:
  File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
    with debug_async_timeout('actor_lock_timeout',
  File "xoscar/core.pyx", line 524, in xoscar.core._BaseActor.__on_receive__
    result = func(*args, **kwargs)
  File "/root/Xinference/custom_packages/inference/xinference/core/model.py", line 239, in load
    self._model.load()
  File "/root/Xinference/custom_packages/inference/xinference/model/rerank/core.py", line 134, in load
    self._model = CrossEncoder(
TypeError: [address=0.0.0.0:45176, pid=70382] sentence_transformers.cross_encoder.CrossEncoder.CrossEncoder() got multiple values for keyword argument 'device'
2024-05-07 10:24:07,888 xinference.api.restful_api 68848 ERROR [address=0.0.0.0:45176, pid=70382] sentence_transformers.cross_encoder.CrossEncoder.CrossEncoder() got multiple values for keyword argument 'device'
Traceback (most recent call last):
  File "/root/Xinference/custom_packages/inference/xinference/api/restful_api.py", line 741, in launch_model
    model_uid = await (await self._get_supervisor_ref()).launch_builtin_model(
  File "/root/Xinference/venv/lib/python3.10/site-packages/xoscar/backends/context.py", line 227, in send
    return self._process_result_message(result)
  File "/root/Xinference/venv/lib/python3.10/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
    raise message.as_instanceof_cause()
  File "/root/Xinference/venv/lib/python3.10/site-packages/xoscar/backends/pool.py", line 659, in send
    result = await self._run_coro(message.message_id, coro)
  File "/root/Xinference/venv/lib/python3.10/site-packages/xoscar/backends/pool.py", line 370, in _run_coro
    return await coro
  File "/root/Xinference/venv/lib/python3.10/site-packages/xoscar/api.py", line 384, in __on_receive__
    return await super().__on_receive__(message)  # type: ignore
  File "xoscar/core.pyx", line 558, in __on_receive__
    raise ex
  File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
    async with self._lock:
  File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
    with debug_async_timeout('actor_lock_timeout',
  File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
    result = await result
  File "/root/Xinference/custom_packages/inference/xinference/core/supervisor.py", line 892, in launch_builtin_model
    await _launch_model()
  File "/root/Xinference/custom_packages/inference/xinference/core/supervisor.py", line 856, in _launch_model
    await _launch_one_model(rep_model_uid)
  File "/root/Xinference/custom_packages/inference/xinference/core/supervisor.py", line 838, in _launch_one_model
    await worker_ref.launch_builtin_model(
  File "xoscar/core.pyx", line 284, in __pyx_actor_method_wrapper
    async with lock:
  File "xoscar/core.pyx", line 287, in xoscar.core.__pyx_actor_method_wrapper
    result = await result
  File "/root/Xinference/custom_packages/inference/xinference/core/utils.py", line 45, in wrapped
    ret = await func(*args, **kwargs)
  File "/root/Xinference/custom_packages/inference/xinference/core/worker.py", line 707, in launch_builtin_model
    await model_ref.load()
  File "/root/Xinference/venv/lib/python3.10/site-packages/xoscar/backends/context.py", line 227, in send
    return self._process_result_message(result)
  File "/root/Xinference/venv/lib/python3.10/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
    raise message.as_instanceof_cause()
  File "/root/Xinference/venv/lib/python3.10/site-packages/xoscar/backends/pool.py", line 659, in send
    result = await self._run_coro(message.message_id, coro)
  File "/root/Xinference/venv/lib/python3.10/site-packages/xoscar/backends/pool.py", line 370, in _run_coro
    return await coro
  File "/root/Xinference/venv/lib/python3.10/site-packages/xoscar/api.py", line 384, in __on_receive__
    return await super().__on_receive__(message)  # type: ignore
  File "xoscar/core.pyx", line 558, in __on_receive__
    raise ex
  File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
    async with self._lock:
  File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
    with debug_async_timeout('actor_lock_timeout',
  File "xoscar/core.pyx", line 524, in xoscar.core._BaseActor.__on_receive__
    result = func(*args, **kwargs)
  File "/root/Xinference/custom_packages/inference/xinference/core/model.py", line 239, in load
    self._model.load()
  File "/root/Xinference/custom_packages/inference/xinference/model/rerank/core.py", line 134, in load
    self._model = CrossEncoder(
TypeError: [address=0.0.0.0:45176, pid=70382] sentence_transformers.cross_encoder.CrossEncoder.CrossEncoder() got multiple values for keyword argument 'device'
```

Fetch code:

```javascript
fetch("http://192.168.100.172:12009/v1/models", {
  "headers": {
    "accept": "*/*",
    "accept-language": "en-US,en;q=0.9",
    "content-type": "application/json"
  },
  "referrer": "http://192.168.100.172:12009/ui/",
  "referrerPolicy": "strict-origin-when-cross-origin",
  "body": "{\"model_uid\":null,\"model_name\":\"bge-reranker-large\",\"model_type\":\"rerank\",\"device\":\"cpu\",\"replica\":1}",
  "method": "POST",
  "mode": "cors",
  "credentials": "include"
});
```
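For context, the `TypeError` above is Python's standard complaint when the same keyword reaches a callable twice, once explicitly and once through `**kwargs`. A minimal sketch of that failure mode (`load_reranker` is a hypothetical stand-in for `CrossEncoder`, not the real loader):

```python
# Hypothetical stand-in for CrossEncoder: accepts an explicit `device`
# argument and forwards any extra user-supplied kwargs.
def load_reranker(model_name, device=None, **extra_kwargs):
    return {"model": model_name, "device": device, **extra_kwargs}

# The loader chooses a device itself, while the kwargs coming from the
# launch request also carry `device`: the same keyword arrives twice.
user_kwargs = {"device": "cpu"}

try:
    load_reranker("bge-reranker-large", device="cuda", **user_kwargs)
except TypeError as exc:
    print(exc)  # ... got multiple values for keyword argument 'device'
```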

Replacing `"device": "cpu"` with `"n_gpu": null` in the request body then raised:

ValueError: [address=0.0.0.0:36170, pid=79661] Currently n_gpu only supports auto.

The request body must be valid JSON, and the value of `n_gpu` must not be a string: a JSON `null` corresponds to `None` in Python.


The parameter `n_gpu` must be greater than 0 and not greater than the number of GPUs on the machine (1 here).

I need to run on the CPU, not on a specific GPU.
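The JSON point can be made concrete: building the body with a real JSON serializer guarantees that Python's `None` comes out as a JSON `null`, never as the string `"null"`. A sketch (payload fields taken from the fetch request above; this only builds the body, it does not call the server):

```python
import json

# Launch payload for the rerank model; None serializes to JSON null.
payload = {
    "model_uid": None,
    "model_name": "bge-reranker-large",
    "model_type": "rerank",
    "n_gpu": None,  # JSON null, NOT the string "null"
    "replica": 1,
}

body = json.dumps(payload)
print(body)  # '"n_gpu": null' appears in the serialized body
```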

wongdi avatar May 07 '24 03:05 wongdi


I have no idea what you are talking about. Last reply:

```shell
curl --header "Content-Type: application/json" \
     --request POST \
     --data '{"model_name":"bge-reranker-base","model_type":"rerank","n_gpu":null}' \
     http://<endpoint>/v1/models
```

This will launch the model to CPU.
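For anyone scripting this instead of using curl, the same request can be built with the standard library alone. The endpoint below is a placeholder (9997 is only a common default port, an assumption here), and the sketch constructs the request without sending it:

```python
import json
import urllib.request

def build_launch_request(endpoint: str) -> urllib.request.Request:
    """Build (but do not send) the POST that launches the rerank model on CPU."""
    payload = {
        "model_name": "bge-reranker-base",
        "model_type": "rerank",
        "n_gpu": None,  # Python None -> JSON null -> CPU
    }
    return urllib.request.Request(
        f"{endpoint}/v1/models",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_launch_request("http://127.0.0.1:9997")
# urllib.request.urlopen(req) would actually send it against a live server.
```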

ChengjieLi28 avatar May 07 '24 03:05 ChengjieLi28


Thanks bro, the command works, but the model still runs on the GPU and takes up 1749 MB of video memory.
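One generic workaround when a CUDA-aware stack keeps grabbing the GPU anyway (not specific to xinference and not verified against this version) is to hide the GPUs from the serving process before any CUDA library initializes:

```python
import os

# Must be set before torch / sentence-transformers are imported in the
# serving process; with no visible devices, CUDA libraries fall back to CPU.
os.environ["CUDA_VISIBLE_DEVICES"] = ""

# Equivalent shell form when launching the server, e.g.:
#   CUDA_VISIBLE_DEVICES="" xinference-local ...
print(repr(os.environ["CUDA_VISIBLE_DEVICES"]))  # ''
```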

wongdi avatar May 07 '24 03:05 wongdi

This issue is stale because it has been open for 7 days with no activity.

github-actions[bot] avatar Aug 06 '24 19:08 github-actions[bot]

This issue was closed because it has been inactive for 5 days since being marked as stale.

github-actions[bot] avatar Aug 12 '24 03:08 github-actions[bot]