wget zombie processes
Current Behavior
Today I saw this
=> There are 2 zombie processes.
after logging in to my system. This is new; I normally don't have any zombie processes on my system.
ps shows
$ ps aux | grep 'Z'
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
xxxxxx 6717 0.0 0.0 0 0 ? Z 20:17 0:00 [wget] <defunct>
xxxxxx 7857 0.0 0.0 0 0 ? Z 20:18 0:00 [wget] <defunct>
The next step was to look at the process tree (pstree):
├─containerd-shim(3136)─┬─java(3940)─┬─wget(6717)
│ │ ├─wget(7857)
│ │ ├─{java}(4781)
│ │ ├─{java}(4812)
│ │ ├─{java}(4884)
│ │ ├─{java}(4893)
And
$ ps aux | grep 3940
xxxxxx 3940 19.9 2.6 15333688 849660 ? Ssl 20:16 1:49 java -XX:+UseParallelGC -XX:MaxRAMPercentage=90.0 --add-opens java.base/java.util.concurrent=ALL-UNNAMED -Dlogback.configurationFile=logback.xml -DdependencyTrack.logging.level=DEBUG -jar dependency-track-apiserver.jar -context /
xxxxxx 16571 0.0 0.0 6608 2432 pts/0 S+ 20:26 0:00 grep --color=auto 3940
After I stopped the DT container, the zombie processes were gone, and DT had been idle the whole time. After rebooting my system I saw the same thing.
What is happening here? Why the zombie processes?
My docker-compose:
version: '3.7'
services:
  dtrack-apiserver:
    image: dependencytrack/apiserver
    environment:
      - TZ=Europe/Berlin
      # Database Properties
      - ALPINE_DATABASE_MODE=external
      - ALPINE_DATABASE_URL=jdbc:postgresql://xx.xx.xx.xx:5432/dtrack
      - ALPINE_DATABASE_DRIVER=org.postgresql.Driver
      - ALPINE_DATABASE_USERNAME=dtrack
      - ALPINE_DATABASE_PASSWORD=xxx
      - ALPINE_DATABASE_POOL_ENABLED=true
      - ALPINE_DATABASE_POOL_MAX_SIZE=10
      - ALPINE_DATABASE_POOL_MIN_IDLE=2
      - ALPINE_DATABASE_POOL_IDLE_TIMEOUT=300000
      - ALPINE_DATABASE_POOL_MAX_LIFETIME=600000
      - LOGGING_LEVEL=DEBUG
    deploy:
      resources:
        limits:
          memory: 12288m
        reservations:
          memory: 8192m
      restart_policy:
        condition: on-failure
    ports:
      - '7071:8080'
    volumes:
      - "/data-files/data/docker/dependency-track:/data"
      - "/etc/timezone:/etc/timezone:ro"
      - "/etc/localtime:/etc/localtime:ro"
    restart: unless-stopped
  dtrack-frontend:
    image: dependencytrack/frontend
    depends_on:
      - dtrack-apiserver
    environment:
      - TZ=Europe/Berlin
      - API_BASE_URL=http://xx.xx.xx.xx:7071
    volumes:
      - "/etc/timezone:/etc/timezone:ro"
      - "/etc/localtime:/etc/localtime:ro"
    ports:
      - "7070:8080"
    restart: unless-stopped
Steps to Reproduce
Just start DT in docker and check the system.
Expected Behavior
No zombie processes.
Dependency-Track Version
4.9.1
Dependency-Track Distribution
Container Image
Database Server
PostgreSQL
Database Server Version
16
Browser
Google Chrome
Checklist
- [X] I have read and understand the contributing guidelines
- [X] I have checked the existing issues for whether this defect was already reported
The first zombie process was 10 hours old. After a reboot I had two zombie processes, and now only one after starting the Docker DT container again.
$ ps aux | grep 'Z'
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
xxxxxx 48069 0.0 0.0 0 0 ? Z 21:03 0:00 [wget] <defunct>
Inside the dependencytrack/apiserver container, more /proc/<pid>/status shows:
Name: wget
State: Z (zombie)
Tgid: 65
Ngid: 0
Pid: 65
PPid: 1
TracerPid: 0
Uid: 1000 1000 1000 1000
Gid: 1000 1000 1000 1000
FDSize: 0
Groups: 1000
NStgid: 65
NSpid: 65
NSpgid: 59
NSsid: 59
Threads: 1
SigQ: 0/126315
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000001000
SigCgt: 0000000008000201
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 00000000a80425fb
CapAmb: 0000000000000000
NoNewPrivs: 0
Seccomp: 2
Seccomp_filters: 1
Speculation_Store_Bypass: thread vulnerable
SpeculationIndirectBranch: conditional enabled
Cpus_allowed: f
Cpus_allowed_list: 0-3
Mems_allowed: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,000000
00,00000000,00000000,00000001
Mems_allowed_list: 0
voluntary_ctxt_switches: 2
nonvoluntary_ctxt_switches: 6
wget is used for the container's health check: https://github.com/DependencyTrack/dependency-track/blob/d8464427fa993404961563e97bdd5f2564b4f7ce/src/main/docker/Dockerfile#L76
Thanks. But it looks like this call does not finish, and in the end it should not show up as a zombie process.
Right?
So, any idea why this would happen? I'd expect the container runtime to ensure that health check processes are properly terminated. The command we use has a timeout of 3s; there is nothing in it that should cause it to stay around for longer than that.
Hm ... perhaps add a timeout to wget too?
wget --connect-timeout=5 htt
or add
HEALTHCHECK --interval=30s --timeout=3s --start-period=XXs
or switch to curl
HEALTHCHECK --interval=30s --timeout=3s --start-period=15s CMD curl --fail localhost:8080/health || exit 1
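For illustration, a combined variant of those suggestions might look like the sketch below. It only reuses flags already mentioned in this thread; the 60s start period and the other values are illustrative, not the project's actual settings:
# sketch: give wget its own limits so it gives up before the health-check timeout has to kill it
HEALTHCHECK --interval=30s --timeout=3s --start-period=60s --retries=3 \
  CMD wget --no-verbose --tries=1 --connect-timeout=3 --spider http://127.0.0.1:8080/health || exit 1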
Do you see this zombie process in your environment too?
I think I found the problem. On my NUC it takes longer.
I started the container and then called this:
$ time wget http://192.168.178.30:7071/health
--2023-11-27 22:07:45-- http://192.168.178.30:7071/health
Connecting to 192.168.178.30:7071... connected.
HTTP request sent, awaiting response... 200 OK
Length: 124 [application/json]
Saving to: ‘health’
health 100%[=====================>] 124 --.-KB/s in 0s
2023-11-27 22:08:21 (8,88 MB/s) - ‘health’ saved [124/124]
real 0m36,389s
user 0m0,001s
sys 0m0,007s
I called this from outside the Docker container. How can I see in the log that the health endpoint is available?
Do you see this zombie process in your environment too?
Nope. My production systems run in k8s which invokes health checks from outside the container. I am also not seeing this locally on my laptop.
How can I see in the log that the health endpoint is available?
All HTTP endpoints including /health will be available once this is logged:
INFO [AlpineServlet] Dependency-Track is ready
I think adding a timeout to the wget command itself, and adding --start-period 60s are valid additions though.
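As a side note, one hedged way to wait for that readiness line from the host, assuming the dtrack-apiserver service name from the compose file above:
# follow the apiserver logs until the readiness message shows up
docker compose logs -f dtrack-apiserver | grep --line-buffered "Dependency-Track is ready"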
Thanks. You should increase the values or make this configurable in docker-compose.
(Screenshots: start time and finished startup.)
Not everyone has a Threadripper. ;)
You should increase the values or make this configurable in docker-compose.
It already is configurable: https://docs.docker.com/compose/compose-file/compose-file-v3/#healthcheck
Not everyone has a Threadripper. ;)
Neither do I :)
Thanks, I didn't know that. Now I added:
healthcheck:
  #disable: true
  test: wget --no-verbose --tries=1 --spider http://127.0.0.1:8080/health || exit 1
  interval: 2m
  timeout: 3s
  retries: 3
  start_period: 15s
And it looks better now; I don't see the zombie process anymore.
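For completeness, the health-check history can be inspected directly to confirm the probes actually finish; a small sketch, with the container name as a placeholder (use whatever docker ps shows for your apiserver):
# print health status plus the last few probe results (exit codes and output)
docker inspect --format '{{json .State.Health}}' dependencytrack-dtrack-apiserver-1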
I've faced the same issue with zombies after updating from 4.8.1 to 4.9.1. The difference in the healthchecks:
4.8.1
HEALTHCHECK &{["CMD-SHELL" "wget --no-proxy -q -O /dev/null http://127.0.0.1:8080${CONTEXT}health || exit 1"] "30s" "3s" "0s" '\x00'}
4.9.1
HEALTHCHECK &{["CMD-SHELL" "wget --no-proxy -q -O /dev/null http://127.0.0.1:8080${CONTEXT}health || exit 1"] "30s" "3s" "0s" "0s" '\x00'}
I updated to 4.9.1 today and removed my fix to test https://github.com/DependencyTrack/dependency-track/pull/3245, and I see the zombie wget process again. The fix is not working.
4.12.0 still has this issue
ps aux | grep 'Z'
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
1000 2562281 0.0 0.0 0 0 ? Z Oct06 0:00 [wget]
I see the same
xxx 1571 0.0 0.0 0 0 ? Z Okt05 0:00 \_ [wget] <defunct>
xxx 1618 0.0 0.0 0 0 ? Z Okt05 0:00 \_ [wget] <defunct>
with 4.12.0
Any update on this?
▶ ps aux | grep Z
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
1000 16789 0.0 0.0 0 0 ? Z 14:59 0:00 [wget] <defunct>
▶ pstree -p -s 16789
systemd(1)───containerd-shim(16312)───java(16333)───wget(16789)
▶ ps aux | grep 16333
1000 16333 44.6 1.2 21205388 787792 ? Ssl 14:59 2:02 java -XX:+UseParallelGC -XX:+UseStringDeduplication -XX:MaxRAMPercentage=90.0 --add-opens java.base/java.util.concurrent=ALL-UNNAMED -Dlogback.configurationFile=logback.xml -DdependencyTrack.logging.level=INFO -jar dependency-track-apiserver.jar -context /
This still seems to be an issue on recent versions (all the ones I remember from the last year or so).
Yes! 😅
adab7254eeb4 dependencytrack/frontend:4.13.2 "/docker-entrypoint.…" 4 weeks ago Up 41 hours 8080/tcp dependencytrack-dtrack-frontend-1
b93ac9e8bb19 dependencytrack/apiserver:4.13.2 "/bin/sh -c 'exec ja…" 4 weeks ago Up 41 hours (healthy) 8080/tcp dependencytrack-dtrack-apiserver-1
~ # ps aux | grep Z
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
1000 3841049 0.0 0.0 0 0 ? Z Jun30 0:00 [wget] <defunct>
1000 3841336 0.0 0.0 0 0 ? Z Jun30 0:00 [wget] <defunct>
~ # pstree -p -s 3841049
systemd(1)───containerd-shim(3831315)───java(3831441)───wget(3841049)
Do the deptrack containers contain an init that reaps zombie processes? Something like https://github.com/Yelp/dumb-init
@taladar I guess the zombie processes are coming from the apiserver. When I inspect the image, this is all I can see, so no, there is no init or anything like that:
"Cmd": [
"/bin/sh",
"-c",
"exec java ${JAVA_OPTIONS} ${EXTRA_JAVA_OPTIONS} --add-opens java.base/java.util.concurrent=ALL-UNNAMED -Dlogback.configurationFile=${LOGGING_CONFIG_PATH} -DdependencyTrack.logging.level=${LOGGING_LEVEL} -jar ${WAR_FILENAME} -context ${CONTEXT}"
],
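For reference, a hedged way to reproduce that inspection against the published image (the tag is taken from the docker ps output above):
# show the image's entrypoint and command; no init wrapper such as tini or dumb-init is configured
docker image inspect dependencytrack/apiserver:4.13.2 --format '{{json .Config.Entrypoint}} {{json .Config.Cmd}}'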
It's still unclear to me what the root cause is. Anyone willing to raise a PR with a proposed fix?
Well, not sure if this is the problem here, but zombie processes are essentially process data structures still present in the kernel after the process has exited, kept around so the parent process can call waitpid on them. If the parent process never does so before it terminates itself, the children are handed up the chain to the parent's own parent, eventually ending up at the init process. Init processes normally have some sort of logic to reap those inherited zombie processes, but inside a container the PID 1 process is often a regular process which lacks that special bit of code. dumb-init and similar tools are meant to be called as a wrapper around the actual process in the container and handle that task.
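As a minimal sketch of a deployment-side workaround, assuming Docker's built-in init (tini) is acceptable: Compose can start a tiny init as PID 1 that reaps orphaned children without changing the image. Only init: true is added to the apiserver service from the compose file above; docker run has the equivalent --init flag.
services:
  dtrack-apiserver:
    image: dependencytrack/apiserver
    init: true   # run Docker's built-in init as PID 1 so exited health-check processes get reaped
    # ... rest of the service definition unchanged ...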
The issue is still present with Dependency-Track v4.13.6
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
xxx 723746 0.0 0.0 0 0 ? Z Nov27 0:00 [wget] <defunct>
xxx 924724 0.0 0.0 0 0 ? Z Nov28 0:00 [wget] <defunct>
xxx 925798 0.0 0.0 0 0 ? Z Nov28 0:00 [wget] <defunct>
xxx 1447075 0.0 0.0 0 0 ? Z Nov29 0:00 [wget] <defunct>
xxx 2789371 0.0 0.0 0 0 ? Z Dec02 0:00 [wget] <defunct>
xxx 2790311 0.0 0.0 0 0 ? Z Dec02 0:00 [wget] <defunct>
xxx 3236153 0.0 0.0 0 0 ? Z Dec03 0:00 [wget] <defunct>
xxx 3681485 0.0 0.0 0 0 ? Z 01:51 0:00 [wget] <defunct>