api-umbrella
Zombie processes in api-umbrella docker container
Hello,
My company is using api-umbrella on Docker (nrel/api-umbrella:0.15.1) on top of Ubuntu 16.04.5 in multiple environments.
I noticed strange behavior in those environments: approximately every 2 minutes, something in the api-umbrella Docker container creates a zombie process.
Below is the api-umbrella docker container
docker ps | grep api-umbrella
70197abf50c4 nrel/api-umbrella:0.15.1 "api-umbrella run" 2 months ago Up 25 hours 0.0.0.0:8082->80/tcp, 0.0.0.0:4432->443/tcp frontend-apiumbrella
Below is an extract of the processes that are currently running on one of these virtual machines.
ps -ajfx
PPID PID PGID SID TTY TPGID STAT UID TIME COMMAND
0 2 0 0 ? -1 S 0 0:00 [kthreadd]
# Many unrelated lines omitted for brevity
1 1445 1445 1445 ? -1 Ssl 0 2:47 /usr/bin/dockerd -H fd://
1445 1860 1860 1860 ? -1 Ssl 0 0:53 \_ docker-containerd -l unix:///var/run/docker/libcontainerd/docker-containerd.sock --metrics-interval=0 --start-timeout 2m --state-dir /var/ru
1860 2431 2431 1860 ? -1 Sl 0 0:00 | \_ docker-containerd-shim 70197abf50c4b5ef189bd91b55306dd2a19ead9e4f969b69c2969b3687abadb3 /var/run/docker/libcontainerd/70197abf50c4b5ef18
2431 2523 2523 2523 ? -1 Ss 0 0:00 | | \_ perl /opt/api-umbrella/embedded/bin/resty /opt/api-umbrella/embedded/apps/core/current/bin/api-umbrella-cli run
2523 2752 2523 2523 ? -1 S 0 0:00 | | \_ api-umbrella /opt/api-umbrella/etc/perp
2752 2803 2523 2523 ? -1 S 999 0:03 | | | \_ svlogd -ttt /opt/api-umbrella/var/log/perpd
2752 2804 2523 2523 ? -1 S 0 0:37 | | | \_ perpd /opt/api-umbrella/etc/perp
2804 2890 2890 2890 ? -1 Ss 999 0:00 | | | \_ svlogd -ttt /opt/api-umbrella/var/log/trafficserver
2804 2891 2891 2891 ? -1 Ssl 999 0:36 | | | \_ traffic_manager --nosyslog
2891 3172 2891 2891 ? -1 Sl 999 39:21 | | | | \_ /opt/api-umbrella/embedded/bin/traffic_server -M --httpport 14009:fd=7
2804 2892 2892 2892 ? -1 Ss 999 0:00 | | | \_ svlogd -ttt /opt/api-umbrella/var/log/nginx
2804 2893 2893 2893 ? -1 Ss 0 0:00 | | | \_ nginx: master process nginx -p /opt/api-umbrella/embedded/apps/core/current/ -c /opt/api-umbrella/etc/nginx/router.conf
2893 3160 2893 2893 ? -1 S 999 3:17 | | | | \_ nginx: worker process
2893 3161 2893 2893 ? -1 S 999 2:52 | | | | \_ nginx: worker process
2804 2894 2894 2894 ? -1 Ss 999 0:00 | | | \_ svlogd -ttt /opt/api-umbrella/var/log/web-delayed-job
2804 2895 2895 2895 ? -1 Ssl 999 0:56 | | | \_ ./bin/delayed_job --pid-dir=/opt/api-umbrella/var/run run
2804 2896 2896 2896 ? -1 Ss 999 0:00 | | | \_ svlogd -ttt /opt/api-umbrella/var/log/web-puma
2804 2897 2897 2897 ? -1 Ssl 999 0:03 | | | \_ puma 3.12.1 (unix:///opt/api-umbrella/var/run/web-puma.sock) [web-app]
2897 3373 2897 2897 ? -1 Sl 999 0:14 | | | | \_ puma: cluster worker 0: 89 [web-app]
2897 3375 2897 2897 ? -1 Sl 999 0:14 | | | | \_ puma: cluster worker 1: 89 [web-app]
2804 2898 2898 2898 ? -1 Ss 999 0:02 | | | \_ svlogd -ttt /opt/api-umbrella/var/log/geoip-auto-updater
2804 2900 2900 2900 ? -1 Ss 999 0:00 | | | \_ svlogd -ttt /opt/api-umbrella/var/log/mongod
2804 2901 2901 2901 ? -1 Ssl 999 8:41 | | | \_ mongod --config /opt/api-umbrella/etc/mongod.conf
2804 2902 2902 2902 ? -1 Ss 999 0:00 | | | \_ svlogd -ttt /opt/api-umbrella/var/log/elasticsearch
2804 2903 2903 2903 ? -1 Ssl 999 6:08 | | | \_ /usr/bin/java -Xms512m -Xmx512m -Djava.awt.headless=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccup
2804 2904 2904 2904 ? -1 Ss 999 0:10 | | | \_ svlogd -ttt /opt/api-umbrella/var/log/rsyslog
2804 2906 2906 2906 ? -1 Ss 999 0:00 | | | \_ svlogd -ttt /opt/api-umbrella/var/log/mora
2804 2907 2907 2907 ? -1 Ssl 999 9:18 | | | \_ mora -config /opt/api-umbrella/etc/mora.properties
2804 18307 18307 18307 ? -1 Ss 0 0:00 | | | \_ perpd /opt/api-umbrella/etc/perp
2804 18312 18312 18312 ? -1 Ss 0 0:00 | | | \_ bash /opt/api-umbrella/embedded/apps/core/current/bin/api-umbrella-geoip-auto-updater
8312 18315 18312 18312 ? -1 S 0 0:00 | | | \_ curl --silent --show-error --fail --location --retry 3 --output /tmp/api-umbrella-geoip-auto-updater.mDomZek8N3.gz h
2523 3523 2899 2899 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 3719 3594 3594 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 5258 5202 5202 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 6506 6460 6460 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 6836 6793 6793 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 7069 7026 7026 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 8998 8954 8954 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 9696 9653 9653 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 9939 9892 9892 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 10854 10811 10811 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 11042 10999 10999 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 11649 11606 11606 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 11838 11795 11795 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 13014 12970 12970 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 13291 13245 13245 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 13621 13578 13578 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 14561 14510 14510 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 14606 14563 14563 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 14886 14838 14838 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 15169 15123 15123 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 15259 15216 15216 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 15680 15634 15634 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 15964 15919 15919 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 16717 16672 16672 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 17002 16955 16955 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 17048 17005 17005 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 17650 17607 17607 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 18263 18219 18219 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 18452 18409 18409 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 19018 18975 18975 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 19486 19443 19443 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 19914 19861 19861 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 20663 20617 20617 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 20993 20950 20950 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 21645 21602 21602 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 22442 22399 22399 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 22676 22633 22633 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 22865 22817 22817 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 23521 23475 23475 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 24414 24370 24370 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 24791 24748 24748 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 24936 24885 24885 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 26575 26532 26532 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 27138 27095 27095 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 27463 27418 27418 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 27556 27513 27513 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 27648 27605 27605 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 27981 27937 27937 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 28026 27983 27983 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 28638 28595 28595 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 29063 29018 29018 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 29110 29065 29065 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 29996 29953 29953 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 31687 31636 31636 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 884 838 838 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 1764 1718 1718 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 2857 2814 2814 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 2919 2859 2859 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 4127 4082 4082 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 4269 4226 4226 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 5065 5021 5021 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 5303 5257 5257 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 5865 5822 5822 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 5919 5867 5867 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 6055 6012 6012 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 7830 7784 7784 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 7976 7925 7925 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 8672 8629 8629 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 8717 8674 8674 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 8771 8719 8719 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 9285 9242 9242 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 9515 9472 9472 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 9571 9517 9517 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 9997 9954 9954 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 10134 10089 10089 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 10886 10842 10842 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 13053 13009 13009 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 14045 13999 13999 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 14608 14564 14564 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
2523 15078 15035 15035 ? -1 Z 0 0:00 | | \_ [bash] <defunct>
# Many more lines with defunct processes as above
# Many unrelated lines omitted for brevity
I checked the 3 environments where I have this setup: the perl process in the api-umbrella container has 987, 1328 and 15465 child processes respectively, and counting.
Eventually the Linux VM can no longer create new processes and has to be rebooted.
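To see at a glance which parent is accumulating the defunct children, the process list can be grouped by parent PID. The following is only a minimal sketch: the function name `count_zombies_by_parent` is made up for illustration, and it assumes a procps-style `ps` that accepts `-eo ppid=,stat=`.

```shell
#!/usr/bin/env bash
# Sketch: count zombie (state Z) processes per parent PID.
# Reads "PPID STAT" pairs on stdin and prints "PPID COUNT",
# highest count first. The function name is illustrative only.
count_zombies_by_parent() {
  awk '$2 ~ /^Z/ { n[$1]++ } END { for (p in n) print p, n[p] }' |
    sort -k2,2nr
}

# Usage on the Docker host (assumes procps ps):
#   ps -eo ppid=,stat= | count_zombies_by_parent
```

In the listing above, this would be expected to show PID 2523 (the perl process running `api-umbrella-cli run`) as the parent of all the defunct bash processes.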
Any help or clue about this issue would be much appreciated.
I investigated further and found that the issue does not occur in the few environments where we don't use Docker and api-umbrella is installed directly on the VM.
In case somebody else has this issue and is interested in a solution: for now, I apply a workaround, which is to
- monitor the number of defunct children of the api-umbrella container every hour to keep track of things
- restart api-umbrella in the Docker container every day to keep things under control
I am using the following commands to monitor the number of defunct child processes:
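# PID of the perl process that runs "api-umbrella run" (the parent of the zombies)
apiUmbrellaPid=$(ps -aux | grep perl | grep api-umbrella | awk '{print $2}')
# Count its direct children whose state starts with Z (zombie)
defunctChildCount=$(ps -o stat= --ppid "$apiUmbrellaPid" | grep -c '^Z')
echo "$defunctChildCount"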
I am using the following command to restart api-umbrella:
docker exec my-apiumbrella-docker-container api-umbrella restart
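The monitoring and restart steps could also be combined, so that the restart only happens once the zombie count passes a limit. This is a rough sketch under stated assumptions, not a tested setup: the function names, the `MAX_ZOMBIES` limit of 500 and the log line format are all made up for illustration, the container name is the placeholder from the command above, and `ps --ppid` assumes procps.

```shell
#!/usr/bin/env bash
# Sketch: restart api-umbrella only when its zombie count passes a limit.
# CONTAINER and MAX_ZOMBIES are placeholders; adjust them for your setup.
CONTAINER="${CONTAINER:-my-apiumbrella-docker-container}"
MAX_ZOMBIES="${MAX_ZOMBIES:-500}"

# Succeeds (exit 0) when the count in $1 is above the limit in $2.
needs_restart() {
  [ "$1" -gt "$2" ]
}

# Count the zombie children of the parent PID given in $1 (procps ps).
count_zombie_children() {
  ps -o stat= --ppid "$1" | grep -c '^Z'
}

# $1 is the PID of the perl "api-umbrella run" process on the host.
restart_if_needed() {
  local count
  count=$(count_zombie_children "$1")
  echo "$(date '+%F %T') zombies=$count limit=$MAX_ZOMBIES"
  if needs_restart "$count" "$MAX_ZOMBIES"; then
    docker exec "$CONTAINER" api-umbrella restart
  fi
}

# Example: restart_if_needed "$apiUmbrellaPid"   (e.g. hourly from cron)
```

The script only defines functions, so sourcing it has no side effects; `restart_if_needed` has to be called explicitly with the parent PID found by the monitoring commands.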