The actor is dead because its worker process has died. Worker exit type: SYSTEM_ERROR Worker exit detail: Worker unexpectedly exits with a connection error code 2. End of file. There are some potential root causes. (1) The process is killed by SIGKILL by OOM killer due to high memory usage. (2) ray stop --force is called. (3) The worker is crashed unexpectedly due to SIGSEGV or other unexpected errors.

Open insistence-essenn opened this issue 1 year ago • 2 comments

Note that the issue tracker is NOT the place for general support. I have been unable to solve this issue. I am deploying with Distributed-compose.yml.

insistence-essenn avatar May 09 '24 11:05 insistence-essenn

@frostyplanet @qinxuye @bufferoverflow Any help?

insistence-essenn avatar May 14 '24 11:05 insistence-essenn

I've encountered the same problem and don't have any clue yet. What is your environment, and which versions of the related packages are you using (vllm, pytorch, ray, nvidia-drivers, cuda)? vllm prints errors to stdout but does not log them to xinference.log. Could you look for errors in the screen output (or docker logs) before the exception? Is there anything similar to this: ray_error_libgomp_20240514-144338
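To make the version question easier to answer, here is a minimal sketch that prints the installed versions of the Python packages mentioned above. The distribution names in `PACKAGES` are assumptions (PyTorch, for example, is published on PyPI as `torch`), and the helper names are only illustrative.

```python
from importlib import metadata

# Packages asked about above. The PyPI distribution names are assumptions.
PACKAGES = ("xinference", "vllm", "torch", "ray")


def collect_versions(packages=PACKAGES):
    """Return {package name: version string, or 'not installed'}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


def format_report(versions):
    """Render one 'name: version' line per package."""
    return "\n".join(f"{name}: {version}" for name, version in versions.items())


print(format_report(collect_versions()))
```

Run it inside the same container or virtual environment as the worker. The NVIDIA driver and CUDA versions are not Python packages; `nvidia-smi` reports those.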

frostyplanet avatar May 15 '24 08:05 frostyplanet

This is a Ray error. The Ray actor has crashed. Possible root causes:

  1. The worker ran out of memory (Ray OOM monitor killed the actor when the free memory was low).
  2. One of the packages in the worker crashed during inference.

Please provide details so we can investigate, including the xinference log and the Ray logs (located at /tmp/ray/session_latest/logs). If possible, please export RAY_BACKEND_LOG_LEVEL=debug before launching xinference.
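Before attaching the logs, it may help to pull out the lines that point at a crash. The following is a rough sketch that scans the Ray log directory named above for a few keywords. The keyword list and the function name are assumptions chosen for illustration, not something Ray defines.

```python
from pathlib import Path

# Log location given above. The keyword list is an illustrative assumption.
RAY_LOG_DIR = Path("/tmp/ray/session_latest/logs")
KEYWORDS = ("SIGSEGV", "SIGKILL", "OOM", "out of memory", "Traceback")


def find_suspicious_lines(log_dir=RAY_LOG_DIR, keywords=KEYWORDS):
    """Return (file name, line number, text) for each line containing a keyword."""
    hits = []
    lowered = [k.lower() for k in keywords]
    log_dir = Path(log_dir)
    if not log_dir.is_dir():
        return hits  # no Ray session on this machine
    for path in sorted(log_dir.iterdir()):
        if not path.is_file():
            continue  # skip subdirectories
        with path.open(errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if any(k in line.lower() for k in lowered):
                    hits.append((path.name, number, line.rstrip()))
    return hits


for name, number, text in find_suspicious_lines():
    print(f"{name}:{number}: {text}")
```

The match is case-insensitive and deliberately broad, so expect some false positives. The full logs are still the better thing to attach.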

You can also work around this issue by not using vLLM: export XINFERENCE_DISABLE_VLLM=1.
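If you launch from a script, the two environment variables mentioned in this comment can be set as in the sketch below. `build_env` is a hypothetical helper, and the launch command in the usage comment is an assumption; substitute whatever command you actually use to start xinference.

```python
import os


def build_env(debug_ray_logs=True, disable_vllm=False, base=None):
    """Return a copy of the environment with the variables named above set."""
    env = dict(os.environ if base is None else base)
    if debug_ray_logs:
        env["RAY_BACKEND_LOG_LEVEL"] = "debug"
    if disable_vllm:
        env["XINFERENCE_DISABLE_VLLM"] = "1"
    return env


# Usage (the launch command is an assumption; replace it with your own):
#   import subprocess
#   subprocess.run(["xinference-local"], env=build_env(disable_vllm=True))
```

In a Docker Compose deployment, the same variables would presumably go into the service's environment settings instead of a wrapper script.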

codingl2k1 avatar May 22 '24 19:05 codingl2k1

This issue is stale because it has been open for 7 days with no activity.

github-actions[bot] avatar Aug 06 '24 19:08 github-actions[bot]

This issue was closed because it has been inactive for 5 days since being marked as stale.

github-actions[bot] avatar Aug 12 '24 03:08 github-actions[bot]