Python backend stub is a zombie, but Tritonserver `v2/health` still returns 200 OK
Description
I am using the Triton vLLM backend. I ran into a problem where the Python backend stub process is a zombie, but Tritonserver `v2/health` still returns 200 OK. Shouldn't the parent process also exit?
```
[root@llmserver-f-llmcs4-9 log]# ps -ef|grep triton
root 657 645 0 May11 pts/0 00:03:25 /opt/tritonserver/bin/tritonserver --model-repository=/usr/local/xxx/bin//triton0/model_repo --http-port=8000 --grpc-port=8001 --log-file=/usr/local/xxx/log/
root 1059 657 0 May11 pts/0 00:19:02 [triton_python_b] <defunct>
root 488571 456879 0 19:24 pts/3 00:00:00 grep --color=auto triton
[root@llmserver-f-llmcs4-9 log]# head -n 5 /proc/1059/status
Name: triton_python_b
State: Z (zombie)
Tgid: 1059
Ngid: 0
Pid: 1059
[root@llmserver-f-llmcs4-9 log]# curl http://127.0.0.1:8002/v2/health -v
* Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 8002 (#0)
> GET /v2/health HTTP/1.1
> Host: 127.0.0.1:8002
> User-Agent: curl/7.61.1
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Type: text/plain; charset=utf-8
< Content-Length: 0
<
* Connection #0 to host 127.0.0.1 left intact
```
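The manual check in the session above (reading the `State:` line of `/proc/<pid>/status`) can be scripted. The following is a minimal sketch that assumes Linux procfs; `parse_proc_status`, `is_zombie`, and `pid_is_zombie` are hypothetical helper names, not part of Triton or its Python backend.

```python
from __future__ import annotations

from pathlib import Path


def parse_proc_status(text: str) -> dict[str, str]:
    """Parse the 'Key:<tab>Value' lines of /proc/<pid>/status into a dict."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # ignore lines without a colon
            fields[key.strip()] = value.strip()
    return fields


def is_zombie(status_text: str) -> bool:
    """True if the State field starts with 'Z', e.g. 'Z (zombie)'."""
    return parse_proc_status(status_text).get("State", "").startswith("Z")


def pid_is_zombie(pid: int) -> bool:
    """Read /proc/<pid>/status (Linux only) and report zombie state."""
    try:
        text = Path(f"/proc/{pid}/status").read_text()
    except FileNotFoundError:
        return False  # process already reaped, never existed, or no procfs
    return is_zombie(text)


if __name__ == "__main__":
    # Same fields as the `head -n 5 /proc/1059/status` output in the report.
    sample = "Name:\ttriton_python_b\nState:\tZ (zombie)\nTgid:\t1059\n"
    print(is_zombie(sample))  # prints True
```

A zombie keeps its `/proc/<pid>` entry until the parent reaps it, which is why the check on PID 1059 still succeeds even though the stub has exited.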
Triton Information
TRITON_SERVER_VERSION="v2.42.0"
TRITON_DEPS_VERSION="r24.01"
Are you using the Triton container or did you build it yourself?
I built the container myself.
Expected behavior
Triton server should detect that the Python backend stub process has become a zombie and then either exit the Triton process or report an unhealthy state.
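Until the server itself reports this, the expected behavior can be approximated from the outside with a probe that treats the server as unhealthy whenever any direct child of `tritonserver` is a zombie, even if the HTTP health endpoint answers 200. This is a hypothetical watchdog sketch, not a Triton feature. It assumes Linux procfs, the HTTP port 8000 from `--http-port=8000`, and the KServe v2 readiness route `/v2/health/ready`; `parse_stat`, `zombie_children`, `http_ready`, and `overall_healthy` are illustrative names.

```python
from __future__ import annotations

import http.client
import os
import urllib.request


def parse_stat(line: str) -> tuple[str, int]:
    """Return (state, ppid) from the contents of /proc/<pid>/stat.

    Field 2 (comm) is wrapped in parentheses and may itself contain
    spaces or ')', so split after the LAST ')' instead of on whitespace.
    """
    rest = line.rpartition(")")[2].split()
    return rest[0], int(rest[1])


def zombie_children(ppid: int, proc: str = "/proc") -> list[int]:
    """Return sorted PIDs of direct children of `ppid` whose state is 'Z'."""
    if not os.path.isdir(proc):
        return []  # no procfs (non-Linux): nothing can be inspected
    found: list[int] = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                state, parent = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited mid-scan or stat was unreadable
        if parent == ppid and state == "Z":
            found.append(int(entry))
    return sorted(found)


def http_ready(url: str, timeout: float = 2.0) -> bool:
    """Return True only if `url` answers with HTTP 200 within `timeout`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # URLError/HTTPError (non-2xx), connection refused and timeouts
        # are OSError subclasses; malformed responses raise HTTPException.
        return False


def overall_healthy(http_ok: bool, zombies: list[int]) -> bool:
    """Healthy only if the HTTP probe passed AND no child is a zombie."""
    return http_ok and not zombies
```

For the process tree in the report, `overall_healthy(http_ready("http://127.0.0.1:8000/v2/health/ready"), zombie_children(657))` would be expected to return `False`, because PID 1059 (`triton_python_b`) is a zombie child of PID 657, regardless of what the HTTP endpoint says. Parsing `/proc/<pid>/stat` rather than `status` gives the state and parent PID on a single line.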
Hi @burling, thanks for filing the issue. Could you please provide the repro steps and the model files so that we can investigate further? I think the Python backend stub shouldn't remain in a zombie state and should be cleaned up if the stub is not healthy. Also, could you try the latest Triton version and see whether the issue still happens?