fluent-operator process zombie
If the fluent-operator process gets into a zombie state, it cannot recover by itself. Could a liveness probe or some kind of heartbeat be added?
This doesn't usually happen. Can you show me the logs?
Hello, this is just our test case for DFX.

1. Find the fluent-operator Docker container:

    [root@k8s-4 ~]# docker ps |grep fluentbit-operator
    892bf193483f   rvm:5100/kubesphere/fluentbit-operator   "/manager"   2 days ago   Up 2 days   k8s_fluentbit-operator_fluentbit-operator-85855568c6-6ng9f_kubesphere-logging-system_89dfe4d6-a84a-463e-980d-48c88801fe37_0
    5c72dc88ab71   rvm:5100/fitcontainer/pause:3.2          "/pause"     2 days ago   Up 2 days   k8s_POD_fluentbit-operator-85855568c6-6ng9f_kubesphere-logging-system_89dfe4d6-a84a-463e-980d-48c88801fe37_0

2. Check the container's process:

    [root@k8s-4 ~]# docker top 892bf193483f
    UID     PID     PPID    C   STIME   TTY   TIME       CMD
    65532   14995   14978   0   Dec13   ?     00:04:25   /manager
    [root@k8s-4 ~]# ps aux |grep 14995
    65532    14995  0.1  0.0 743776 52756 ?      Ssl  Dec13   4:25 /manager
    root     36638  0.0  0.0 112716   960 pts/0  S+   01:31   0:00 grep --color=auto 14995

3. Simulate the zombie scenario:

    [root@k8s-4 ~]# kill -STOP 14995
    [root@k8s-4 ~]# ps aux |grep 14995
    65532    14995  0.1  0.0 743776 52756 ?      Tsl  Dec13   4:25 /manager
    root     38195  0.0  0.0 112716   960 pts/0  S+   01:32   0:00 grep --color=auto 14995
4. Our recovery benchmark is under 10 minutes, but after 10 minutes the process is still in the Tsl state.
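A side note on the `ps` output above: the first letter of the STAT column comes from the state field in `/proc/<pid>/stat` (the trailing `s` marks a session leader, `l` a multi-threaded process). `kill -STOP` puts the process in state `T` (stopped), which is different from `Z` (zombie: exited but not yet reaped by its parent). A stopped process stays in `T` until it receives `SIGCONT` or is killed, which is why it never recovers on its own. The Linux-only sketch below reads that field for a given PID; `procState` and `describe` are hypothetical helper names.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// procState extracts the one-letter state field from the contents of
// /proc/<pid>/stat. The second field (comm) is wrapped in parentheses and
// may itself contain spaces or ')', so we locate the LAST ')' instead of
// splitting on whitespace.
func procState(stat string) (byte, error) {
	end := strings.LastIndexByte(stat, ')')
	if end < 0 || end+2 >= len(stat) {
		return 0, fmt.Errorf("malformed stat line: %q", stat)
	}
	// Layout after comm is ") <state> <ppid> ...", so the state is at end+2.
	return stat[end+2], nil
}

// describe maps the common state letters to a short explanation.
func describe(state byte) string {
	switch state {
	case 'R':
		return "running"
	case 'S':
		return "sleeping (interruptible)"
	case 'D':
		return "disk sleep (uninterruptible)"
	case 'T':
		return "stopped (e.g. by SIGSTOP)"
	case 'Z':
		return "zombie (exited, not yet reaped)"
	default:
		return "other"
	}
}

func main() {
	pid := "self"
	if len(os.Args) > 1 {
		pid = os.Args[1] // e.g. 14995 on the node from the steps above
	}
	data, err := os.ReadFile("/proc/" + pid + "/stat")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	state, err := procState(string(data))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("pid %s: %c (%s)\n", pid, state, describe(state))
}
```

Run on the node with the PID as the argument; for the stopped `/manager` above it should report `T` rather than `Z`.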
@519859716 Currently, no liveness probe is defined in the Deployment's YAML.
Are you interested in collaborating on this?
It's a pleasure to be involved in our project. We have put this in our development plan; if it works well, I will add it to the project. @wenchajun