The actor is dead because its worker process has died. Worker exit type: SYSTEM_ERROR Worker exit detail: Worker unexpectedly exits with a connection error code 2. End of file. There are some potential root causes. (1) The process is killed by SIGKILL by OOM killer due to high memory usage. (2) ray stop --force is called. (3) The worker is crashed unexpectedly due to SIGSEGV or other unexpected errors.
I am unable to solve this issue. I am using Distributed-compose.yml.
@frostyplanet @qinxuye @bufferoverflow Any help?
I've encountered the same problem and don't have any clue for now.
What's your environment, and which versions of the related packages are you using (vllm, pytorch, ray, nvidia-drivers, cuda)?
vLLM prints errors to stdout (but does not log them to xinference.log). Could you look for errors in the screen output (or docker logs) before the exception? Is there anything similar to this:
This is a Ray error. The Ray actor has crashed. Possible root causes:
- The worker ran out of memory (Ray OOM monitor killed the actor when the free memory was low).
- Some packages in the worker crashed during inference.
Please provide details for investigating, including the xinference log and the Ray log (located at /tmp/ray/session_latest/logs). If possible, please export RAY_BACKEND_LOG_LEVEL=debug before launching xinference.
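If the Ray log directory is large, a small script can help narrow down which lines to attach. The following is only a rough sketch, not an official Ray or xinference tool: the function name `scan_ray_logs` and the keyword list are hypothetical, and the strings Ray actually writes on an OOM kill or a segfault may differ in your version, so adjust the keywords to what you see in your own logs.

```python
from pathlib import Path

# Hypothetical keyword list (an assumption, not taken from Ray's source);
# extend it with whatever your logs actually contain.
KEYWORDS = ("out of memory", "oom-kill", "sigkill", "sigsegv", "segmentation fault")


def scan_ray_logs(log_dir="/tmp/ray/session_latest/logs", keywords=KEYWORDS, max_hits=50):
    """Return (file name, line number, line) tuples for lines containing any keyword."""
    hits = []
    root = Path(log_dir)
    if not root.is_dir():
        # Nothing to scan, e.g. Ray has not been started on this machine.
        return hits
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        try:
            # errors="replace" keeps the scan going if a log has odd bytes.
            with path.open("r", errors="replace") as fh:
                for lineno, line in enumerate(fh, start=1):
                    lowered = line.lower()
                    if any(k in lowered for k in keywords):
                        hits.append((path.name, lineno, line.rstrip()))
                        if len(hits) >= max_hits:
                            return hits
        except OSError:
            # Skip files that cannot be read (permissions, races with rotation).
            continue
    return hits


if __name__ == "__main__":
    for name, lineno, line in scan_ray_logs():
        print(f"{name}:{lineno}: {line}")
```

This is a coarse case-insensitive substring filter, so treat the hits as pointers to the relevant log files rather than as a diagnosis.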
You can also work around this issue by not using vLLM: export XINFERENCE_DISABLE_VLLM=1.
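If you start xinference from a script rather than an interactive shell, the two environment variables mentioned in this thread can be set programmatically. A minimal sketch, assuming you pass the resulting environment to your own launch command; the helper name `build_debug_env` is hypothetical and the launch command itself is deliberately left out because it depends on your deployment.

```python
import os


def build_debug_env(disable_vllm=False, base_env=None):
    """Return a copy of the environment with the debugging variables from this thread set."""
    env = dict(os.environ if base_env is None else base_env)
    # Ask Ray for verbose backend logs, as requested above.
    env["RAY_BACKEND_LOG_LEVEL"] = "debug"
    if disable_vllm:
        # Workaround from this thread: do not use vLLM.
        env["XINFERENCE_DISABLE_VLLM"] = "1"
    return env


# Usage (assumption: `cmd` is whatever command you normally use to start xinference):
#   import subprocess
#   subprocess.run(cmd, env=build_debug_env(disable_vllm=True))
```

Copying the environment instead of mutating `os.environ` keeps the change limited to the launched process.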
This issue is stale because it has been open for 7 days with no activity.
This issue was closed because it has been inactive for 5 days since being marked as stale.