
ERROR - engine.py:781 - Task <EngineMainLoop> failed

Open KDD2018 opened this issue 8 months ago • 2 comments

I get an error when running inference with the InternVL3-78B-AWQ model. How can I solve it?

lmdeploy==0.7.3

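```python
# Imports were omitted in the original report; these are the usual lmdeploy entry points.
import os

from lmdeploy import ChatTemplateConfig, PytorchEngineConfig, pipeline
from lmdeploy.vl import load_image

model_path = "/home/ai-admin/llm-models/InternVL3-78B-AWQ"
image_path = "./downloads"

# Build (text, image) prompts from the first two files in the directory.
prompts = []
for file in os.listdir(image_path)[:2]:
    image = load_image(os.path.join(image_path, file))
    # Prompt text: "Please describe the image content in detail."
    prompts.append(('请详细描述图片内容。', image))

# PyTorch engine with tensor parallelism across 2 GPUs.
pipe = pipeline(
    model_path,
    backend_config=PytorchEngineConfig(session_len=16384, tp=2),
    chat_template_config=ChatTemplateConfig(model_name='internvl2_5'),
)
response = pipe(prompts)
```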

ERROR:

```
2025-04-21 10:43:58,013 - lmdeploy - ERROR - engine.py:781 - Task <EngineMainLoop> failed
Traceback (most recent call last):
  File "/home/ai-admin/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/pytorch/engine/engine.py", line 827, in async_loop
    await self._async_loop_main(resp_que=resp_que, has_runable_event=has_runable_event)
  File "/home/ai-admin/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/pytorch/engine/engine.py", line 762, in _async_loop_main
    out = await self.executor.get_output_async()
  File "/home/ai-admin/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/pytorch/engine/executor/ray_executor.py", line 339, in get_output_async
    return await self.remote_outs.get()
  File "/home/ai-admin/.conda/envs/lmdeploy/lib/python3.10/asyncio/queues.py", line 159, in get
    await getter
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ai-admin/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/pytorch/engine/engine.py", line 776, in __task_callback
    task.result()
  File "/home/ai-admin/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/pytorch/engine/engine.py", line 829, in async_loop
    self._loop_finally()
  File "/home/ai-admin/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/pytorch/engine/engine.py", line 793, in _loop_finally
    self.executor.release()
  File "/home/ai-admin/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/pytorch/engine/executor/ray_executor.py", line 308, in release
    self.collective_rpc('exit')
  File "/home/ai-admin/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/pytorch/engine/executor/ray_executor.py", line 243, in collective_rpc
    return ray.get([getattr(worker, method).remote(*args, **kwargs) for worker in self.workers], timeout=timeout)
  File "/home/ai-admin/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/pytorch/engine/executor/ray_executor.py", line 243, in <listcomp>
    return ray.get([getattr(worker, method).remote(*args, **kwargs) for worker in self.workers], timeout=timeout)
  File "/home/ai-admin/.conda/envs/lmdeploy/lib/python3.10/site-packages/ray/actor.py", line 1549, in __getattr__
    raise AttributeError(
AttributeError: 'ActorHandle' object has no attribute 'exit'
unhandled exception during worker thread shutdown
task: <Task finished name='EngineMainLoop' coro=<Engine.async_loop() done, defined at /home/ai-admin/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/pytorch/engine/engine.py:795> exception=AttributeError("'ActorHandle' object has no attribute 'exit'")>
Traceback (most recent call last):
  File "/home/ai-admin/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/pytorch/engine/engine.py", line 827, in async_loop
    await self._async_loop_main(resp_que=resp_que, has_runable_event=has_runable_event)
  File "/home/ai-admin/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/pytorch/engine/engine.py", line 762, in _async_loop_main
    out = await self.executor.get_output_async()
  File "/home/ai-admin/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/pytorch/engine/executor/ray_executor.py", line 339, in get_output_async
    return await self.remote_outs.get()
  File "/home/ai-admin/.conda/envs/lmdeploy/lib/python3.10/asyncio/queues.py", line 159, in get
    await getter
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ai-admin/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/pytorch/engine/engine.py", line 776, in __task_callback
    task.result()
  File "/home/ai-admin/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/pytorch/engine/engine.py", line 829, in async_loop
    self._loop_finally()
  File "/home/ai-admin/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/pytorch/engine/engine.py", line 793, in _loop_finally
    self.executor.release()
  File "/home/ai-admin/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/pytorch/engine/executor/ray_executor.py", line 308, in release
    self.collective_rpc('exit')
  File "/home/ai-admin/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/pytorch/engine/executor/ray_executor.py", line 243, in collective_rpc
    return ray.get([getattr(worker, method).remote(*args, **kwargs) for worker in self.workers], timeout=timeout)
  File "/home/ai-admin/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/pytorch/engine/executor/ray_executor.py", line 243, in <listcomp>
    return ray.get([getattr(worker, method).remote(*args, **kwargs) for worker in self.workers], timeout=timeout)
  File "/home/ai-admin/.conda/envs/lmdeploy/lib/python3.10/site-packages/ray/actor.py", line 1549, in __getattr__
    raise AttributeError(
AttributeError: 'ActorHandle' object has no attribute 'exit'
(RayWorkerWrapper pid=451466) loc("/home/ai-admin/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/pytorch/kernels/cuda/pagedattention.py":207:11): error: operation scheduled before its operands
Future exception was never retrieved
future: <Future finished exception=AttributeError("'ActorHandle' object has no attribute 'exit'")>
Traceback (most recent call last):
  File "/home/ai-admin/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/pytorch/engine/engine.py", line 827, in async_loop
    await self._async_loop_main(resp_que=resp_que, has_runable_event=has_runable_event)
  File "/home/ai-admin/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/pytorch/engine/engine.py", line 762, in _async_loop_main
    out = await self.executor.get_output_async()
  File "/home/ai-admin/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/pytorch/engine/executor/ray_executor.py", line 339, in get_output_async
    return await self.remote_outs.get()
  File "/home/ai-admin/.conda/envs/lmdeploy/lib/python3.10/asyncio/queues.py", line 159, in get
    await getter
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ai-admin/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/pytorch/engine/engine.py", line 776, in __task_callback
    task.result()
  File "/home/ai-admin/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/pytorch/engine/engine.py", line 829, in async_loop
    self._loop_finally()
  File "/home/ai-admin/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/pytorch/engine/engine.py", line 793, in _loop_finally
    self.executor.release()
  File "/home/ai-admin/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/pytorch/engine/executor/ray_executor.py", line 308, in release
    self.collective_rpc('exit')
  File "/home/ai-admin/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/pytorch/engine/executor/ray_executor.py", line 243, in collective_rpc
    return ray.get([getattr(worker, method).remote(*args, **kwargs) for worker in self.workers], timeout=timeout)
  File "/home/ai-admin/.conda/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/pytorch/engine/executor/ray_executor.py", line 243, in <listcomp>
    return ray.get([getattr(worker, method).remote(*args, **kwargs) for worker in self.workers], timeout=timeout)
  File "/home/ai-admin/.conda/envs/lmdeploy/lib/python3.10/site-packages/ray/actor.py", line 1549, in __getattr__
    raise AttributeError(
AttributeError: 'ActorHandle' object has no attribute 'exit'
```

KDD2018 · Apr 21 '25 02:04

I encountered a similar problem

    lmdeploy - ERROR - engine.py:950 - Task <EngineMainLoop> failed
    Traceback (most recent call last):
      File "llm_CoT/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 945, in __task_callback
        task.result()
      File "llm_CoT/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 998, in async_loop
        await self._async_loop_main(resp_que=resp_que,
      File "llm_CoT/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 922, in _async_loop_main
        forward_inputs, next_running = await inputs_maker.send_next_inputs()

ccccwb · Apr 22 '25 11:04

I get the same error:

    Traceback (most recent call last):
      File "/usr/local/lib/python3.10/site-packages/lmdeploy/pytorch/engine/engine.py", line 776, in __task_callback
        task.result()
      File "/usr/local/lib/python3.10/site-packages/lmdeploy/pytorch/engine/engine.py", line 829, in async_loop
        self._loop_finally()
      File "/usr/local/lib/python3.10/site-packages/lmdeploy/pytorch/engine/engine.py", line 793, in _loop_finally
        self.executor.release()
      File "/usr/local/lib/python3.10/site-packages/lmdeploy/pytorch/engine/executor/ray_executor.py", line 308, in release
        self.collective_rpc('exit')
      File "/usr/local/lib/python3.10/site-packages/lmdeploy/pytorch/engine/executor/ray_executor.py", line 243, in collective_rpc
        return ray.get([getattr(worker, method).remote(*args, **kwargs) for worker in self.workers], timeout=timeout)
      File "/usr/local/lib/python3.10/site-packages/lmdeploy/pytorch/engine/executor/ray_executor.py", line 243, in <listcomp>
        return ray.get([getattr(worker, method).remote(*args, **kwargs) for worker in self.workers], timeout=timeout)
      File "/usr/local/lib/python3.10/site-packages/ray/actor.py", line 1549, in __getattr__
        raise AttributeError(
    AttributeError: 'ActorHandle' object has no attribute 'exit'

hufangjian · Apr 28 '25 06:04
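
A note on reading these tracebacks: in the reports that end with `AttributeError: 'ActorHandle' object has no attribute 'exit'`, that error is raised from `_loop_finally`, that is, from cleanup code running while an earlier `asyncio.exceptions.CancelledError` is being handled. It is therefore a secondary failure layered on top of the cancellation, and not necessarily the first thing that went wrong. The sketch below reproduces that pattern with the standard library only. `FakeExecutor`, `main_loop`, and `run` are hypothetical names for illustration, not lmdeploy code, and the sketch makes no claim about the actual root cause.

```python
import asyncio


class FakeExecutor:
    """Hypothetical stand-in for an executor whose cleanup step fails."""

    def release(self):
        # Mimics a cleanup call that raises on its own.
        raise AttributeError("'FakeExecutor' object has no attribute 'exit'")


async def main_loop(executor):
    try:
        # Block on an empty queue, like a loop waiting for worker output.
        await asyncio.Queue().get()
    finally:
        # Runs when the task is cancelled; its failure replaces CancelledError
        # as the exception that propagates out of the task.
        executor.release()


async def run():
    task = asyncio.create_task(main_loop(FakeExecutor()))
    await asyncio.sleep(0)  # let the task start and block on the queue
    task.cancel()
    try:
        await task
    except AttributeError as err:
        # The earlier exception is still reachable through __context__.
        return type(err).__name__, type(err.__context__).__name__


result = asyncio.run(run())
print(result)
```

When triaging a log shaped like this, it may help to look at what triggered the cancellation. In the first report, for instance, the `RayWorkerWrapper` line about `pagedattention.py` (`operation scheduled before its operands`) appears alongside the shutdown error and could be worth checking.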