
wget zombie processes

Open somera opened this issue 2 years ago • 23 comments

Current Behavior

Today I saw this

=> There are 2 zombie processes.

after logging in to my system. This is new; I normally don't have any zombie processes on my system.

ps shows

$ ps aux | grep 'Z'
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
xxxxxx      6717  0.0  0.0      0     0 ?        Z    20:17   0:00 [wget] <defunct>
xxxxxx      7857  0.0  0.0      0     0 ?        Z    20:18   0:00 [wget] <defunct>

The next step was a look at pstree:

           ├─containerd-shim(3136)─┬─java(3940)─┬─wget(6717)
           │                       │            ├─wget(7857)
           │                       │            ├─{java}(4781)
           │                       │            ├─{java}(4812)
           │                       │            ├─{java}(4884)
           │                       │            ├─{java}(4893)

And

$ ps aux | grep 3940
xxxxxx      3940 19.9  2.6 15333688 849660 ?     Ssl  20:16   1:49 java -XX:+UseParallelGC -XX:MaxRAMPercentage=90.0 --add-opens java.base/java.util.concurrent=ALL-UNNAMED -Dlogback.configurationFile=logback.xml -DdependencyTrack.logging.level=DEBUG -jar dependency-track-apiserver.jar -context /
xxxxxx     16571  0.0  0.0   6608  2432 pts/0    S+   20:26   0:00 grep --color=auto 3940

After I stopped the DT container, the zombie processes were gone, and DT was idle at the time. After rebooting my system I saw the same thing.

What is happening here? Why the zombie processes?

My docker-compose:

version: '3.7'

services:
  dtrack-apiserver:
    image: dependencytrack/apiserver
    environment:
      - TZ=Europe/Berlin
      # Database Properties
      - ALPINE_DATABASE_MODE=external
      - ALPINE_DATABASE_URL=jdbc:postgresql://xx.xx.xx.xx:5432/dtrack
      - ALPINE_DATABASE_DRIVER=org.postgresql.Driver
      - ALPINE_DATABASE_USERNAME=dtrack
      - ALPINE_DATABASE_PASSWORD=xxx
      - ALPINE_DATABASE_POOL_ENABLED=true
      - ALPINE_DATABASE_POOL_MAX_SIZE=10
      - ALPINE_DATABASE_POOL_MIN_IDLE=2
      - ALPINE_DATABASE_POOL_IDLE_TIMEOUT=300000
      - ALPINE_DATABASE_POOL_MAX_LIFETIME=600000

      - LOGGING_LEVEL=DEBUG
    deploy:
      resources:
        limits:
          memory: 12288m
        reservations:
          memory: 8192m
      restart_policy:
        condition: on-failure
    ports:
      - '7071:8080'
    volumes:
      - "/data-files/data/docker/dependency-track:/data"
      - "/etc/timezone:/etc/timezone:ro"
      - "/etc/localtime:/etc/localtime:ro"
    restart: unless-stopped

  dtrack-frontend:
    image: dependencytrack/frontend
    depends_on:
      - dtrack-apiserver
    environment:
      - TZ=Europe/Berlin
      - API_BASE_URL=http://xx.xx.xx.xx:7071
    volumes:
      - "/etc/timezone:/etc/timezone:ro"
      - "/etc/localtime:/etc/localtime:ro"
    ports:
      - "7070:8080"
    restart: unless-stopped

Steps to Reproduce

Just start DT in Docker and check the system.

Expected Behavior

No zombie processes.

Dependency-Track Version

4.9.1

Dependency-Track Distribution

Container Image

Database Server

PostgreSQL

Database Server Version

16

Browser

Google Chrome


somera avatar Nov 27 '23 19:11 somera

The first zombie process was 10 hours old. After a reboot I had two zombie processes.

And now only one, after starting the DT Docker container again.

$ ps aux | grep 'Z'
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
xxxxxx     48069  0.0  0.0      0     0 ?        Z    21:03   0:00 [wget] <defunct>

somera avatar Nov 27 '23 20:11 somera

Inside the dependencytrack/apiserver container, more /proc/<pid>/status shows:

Name:   wget
State:  Z (zombie)
Tgid:   65
Ngid:   0
Pid:    65
PPid:   1
TracerPid:      0
Uid:    1000    1000    1000    1000
Gid:    1000    1000    1000    1000
FDSize: 0
Groups: 1000
NStgid: 65
NSpid:  65
NSpgid: 59
NSsid:  59
Threads:        1
SigQ:   0/126315
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000001000
SigCgt: 0000000008000201
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 00000000a80425fb
CapAmb: 0000000000000000
NoNewPrivs:     0
Seccomp:        2
Seccomp_filters:        1
Speculation_Store_Bypass:       thread vulnerable
SpeculationIndirectBranch:      conditional enabled
Cpus_allowed:   f
Cpus_allowed_list:      0-3
Mems_allowed:   00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
Mems_allowed_list:      0
voluntary_ctxt_switches:        2
nonvoluntary_ctxt_switches:     6

somera avatar Nov 27 '23 20:11 somera

wget is used for the container's health check: https://github.com/DependencyTrack/dependency-track/blob/d8464427fa993404961563e97bdd5f2564b4f7ce/src/main/docker/Dockerfile#L76
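For reference, the check defined there is roughly the following (paraphrased from the docker inspect output quoted later in this thread; the linked Dockerfile is the authoritative version):

HEALTHCHECK --interval=30s --timeout=3s \
  CMD wget --no-proxy -q -O /dev/null http://127.0.0.1:8080${CONTEXT}health || exit 1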

nscuro avatar Nov 27 '23 20:11 nscuro

Thx. But it looks like this doesn't finish properly, and in the end it shouldn't show up as a zombie process.

Right?

somera avatar Nov 27 '23 20:11 somera

So, any idea why this would happen? I'd expect the container runtime to ensure that health check processes are properly terminated. The command we use has a timeout of 3s; there is nothing in it that should cause it to stay around for longer than that.

nscuro avatar Nov 27 '23 20:11 nscuro

Hm ... perhaps add a timeout to wget too?

wget --connect-timeout=5 htt

or add

HEALTHCHECK --interval=30s --timeout=3s --start-period=XXs

or switch to curl

HEALTHCHECK --interval=30s --timeout=3s --start-period=15s CMD curl --fail localhost:8080/health || exit 1

Do you see this zombie process in your environment too?

somera avatar Nov 27 '23 20:11 somera

I think I found the problem. On my NUC it takes longer.

I started the container and then called this:

$ time wget http://192.168.178.30:7071/health
--2023-11-27 22:07:45--  http://192.168.178.30:7071/health
Connecting to 192.168.178.30:7071... connected.
HTTP request sent, awaiting response... 200 OK
Length: 124 [application/json]
Saving to: ‘health’

health                                                               100%[=====================>]     124  --.-KB/s    in 0s

2023-11-27 22:08:21 (8,88 MB/s) - ‘health’ saved [124/124]


real    0m36,389s
user    0m0,001s
sys     0m0,007s

I called this from outside the Docker container. How can I see in the log that the health endpoint is available?

somera avatar Nov 27 '23 21:11 somera

Do you see this zombie process in your environment too?

Nope. My production systems run in k8s which invokes health checks from outside the container. I am also not seeing this locally on my laptop.

How can I see in the log that the health endpoint is available?

All HTTP endpoints including /health will be available once this is logged:

INFO [AlpineServlet] Dependency-Track is ready
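For a compose setup like the one above, you could watch for that line with something like this (just a sketch; the service name is taken from your compose file):

docker compose logs -f dtrack-apiserver | grep 'Dependency-Track is ready'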

I think adding a timeout to the wget command itself, and adding --start-period 60s are valid additions though.
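A sketch of what that could look like in the Dockerfile (example values only, not what was actually merged; flag availability depends on the wget build in the image):

HEALTHCHECK --interval=30s --timeout=5s --start-period=60s \
  CMD wget --no-proxy -q --timeout=3 --tries=1 -O /dev/null http://127.0.0.1:8080${CONTEXT}health || exit 1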

nscuro avatar Nov 27 '23 22:11 nscuro

Thx. You should increase the values or make them configurable in docker-compose.

Start time: [screenshot]

Finished startup: [screenshot]

Not everyone has a Threadripper. ;)

somera avatar Nov 27 '23 22:11 somera

You should increase the values or make them configurable in docker-compose.

It already is configurable: https://docs.docker.com/compose/compose-file/compose-file-v3/#healthcheck

Not everyone has an Threadripper. ;)

Neither do I :)

nscuro avatar Nov 27 '23 22:11 nscuro

Thx. I didn't know that. Now I added

    healthcheck:
      #disable: true
      test: wget --no-verbose --tries=1 --spider http://127.0.0.1:8080/health || exit 1
      interval: 2m
      timeout: 3s
      retries: 3
      start_period: 15s

And it looks better now; I don't see the zombie process anymore.

somera avatar Nov 27 '23 22:11 somera

I've faced the same issue with zombies after updating from 4.8.1 to 4.9.1. The difference in the healthchecks:

4.8.1
HEALTHCHECK &{["CMD-SHELL" "wget --no-proxy -q -O /dev/null http://127.0.0.1:8080${CONTEXT}health || exit 1"] "30s" "3s" "0s" '\x00'}

4.9.1
HEALTHCHECK &{["CMD-SHELL" "wget --no-proxy -q -O /dev/null http://127.0.0.1:8080${CONTEXT}health || exit 1"] "30s" "3s" "0s" "0s" '\x00'}

gray380 avatar Dec 04 '23 07:12 gray380

I updated to 4.9.1 today and removed my workaround to test https://github.com/DependencyTrack/dependency-track/pull/3245, and I see the zombie wget process again. The fix is not working.

somera avatar Dec 08 '23 17:12 somera

4.12.0 still has this issue:

ps aux | grep 'Z'
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
1000     2562281  0.0  0.0      0     0 ?        Z    Oct06   0:00 [wget]
1000     2592700  0.0  0.0      0     0 ?        Z    Oct06   0:00 [wget]

grimnir avatar Oct 07 '24 17:10 grimnir

I see the same

xxx      1571  0.0  0.0      0     0 ?        Z    Okt05   0:00      \_ [wget] <defunct>
xxx      1618  0.0  0.0      0     0 ?        Z    Okt05   0:00      \_ [wget] <defunct>

with 4.12.0

somera avatar Oct 07 '24 17:10 somera

Any update on this?

▶ ps aux | grep Z
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
1000       16789  0.0  0.0      0     0 ?        Z    14:59   0:00 [wget] <defunct>

▶ pstree -p -s 16789
systemd(1)───containerd-shim(16312)───java(16333)───wget(16789)

▶ ps aux | grep 16333
1000       16333 44.6  1.2 21205388 787792 ?     Ssl  14:59   2:02 java -XX:+UseParallelGC -XX:+UseStringDeduplication -XX:MaxRAMPercentage=90.0 --add-opens java.base/java.util.concurrent=ALL-UNNAMED -Dlogback.configurationFile=logback.xml -DdependencyTrack.logging.level=INFO -jar dependency-track-apiserver.jar -context /

steinbrueckri avatar Mar 06 '25 15:03 steinbrueckri

This still seems to be an issue on recent versions (all the ones I remember from the last year or so).

taladar avatar Jul 02 '25 07:07 taladar

Yes! 😅

adab7254eeb4   dependencytrack/frontend:4.13.2      "/docker-entrypoint.…"   4 weeks ago     Up 41 hours             8080/tcp                             dependencytrack-dtrack-frontend-1
b93ac9e8bb19   dependencytrack/apiserver:4.13.2     "/bin/sh -c 'exec ja…"   4 weeks ago     Up 41 hours (healthy)   8080/tcp                             dependencytrack-dtrack-apiserver-1

~ # ps aux | grep Z
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
1000     3841049  0.0  0.0      0     0 ?        Z    Jun30   0:00 [wget] <defunct>
1000     3841336  0.0  0.0      0     0 ?        Z    Jun30   0:00 [wget] <defunct>

~ # pstree -p -s 3841049
systemd(1)───containerd-shim(3831315)───java(3831441)───wget(3841049)

steinbrueckri avatar Jul 02 '25 08:07 steinbrueckri

Do the deptrack containers contain an init that reaps zombie processes? Something like https://github.com/Yelp/dumb-init

taladar avatar Jul 02 '25 08:07 taladar

@taladar I guess the zombie processes are coming from the apiserver. When I inspect the image I only see the following, so no, there is no init or anything like it.

"Cmd": [
"/bin/sh",
"-c",
"exec java ${JAVA_OPTIONS} ${EXTRA_JAVA_OPTIONS}     --add-opens java.base/java.util.concurrent=ALL-UNNAMED     -Dlogback.configurationFile=${LOGGING_CONFIG_PATH}     -DdependencyTrack.logging.level=${LOGGING_LEVEL}     -jar ${WAR_FILENAME}     -context ${CONTEXT}"
],

steinbrueckri avatar Jul 02 '25 08:07 steinbrueckri

It's still unclear to me what the root cause is. Anyone willing to raise a PR with a proposed fix?

nscuro avatar Jul 02 '25 09:07 nscuro

Well, not sure if this is the problem here, but zombie processes are essentially process data structures that are still present in the kernel after the process has exited, kept around so the parent process can call waitpid on them. If the parent never does that before it terminates itself, the zombies are handed up the chain to the parent's parent, eventually ending up at the init process. Init processes normally have some sort of logic to reap those inherited zombie processes, but inside a container the PID 1 process is often a regular process that lacks that special bit of code. dumb-init and similar tools are meant to be called as a wrapper around the actual process in the container and handle that task.
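If that is what is happening here, one way to test it without changing the image would be to enable Docker's built-in init (tini) for the apiserver service in the compose file quoted above (a sketch, not a confirmed fix; init: requires compose file format 3.7+):

  dtrack-apiserver:
    image: dependencytrack/apiserver
    # Run tini as PID 1 in the container so exited health-check
    # processes get reaped instead of lingering as zombies.
    init: true

docker run --init does the same for a standalone container; alternatively the image itself could wrap the java process with tini or dumb-init in its ENTRYPOINT.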

taladar avatar Jul 02 '25 09:07 taladar

The issue is still present with Dependency-Track v4.13.6

USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
xxx     723746  0.0  0.0      0     0 ?        Z    Nov27   0:00 [wget] <defunct>
xxx     924724  0.0  0.0      0     0 ?        Z    Nov28   0:00 [wget] <defunct>
xxx     925798  0.0  0.0      0     0 ?        Z    Nov28   0:00 [wget] <defunct>
xxx    1447075  0.0  0.0      0     0 ?        Z    Nov29   0:00 [wget] <defunct>
xxx    2789371  0.0  0.0      0     0 ?        Z    Dec02   0:00 [wget] <defunct>
xxx    2790311  0.0  0.0      0     0 ?        Z    Dec02   0:00 [wget] <defunct>
xxx    3236153  0.0  0.0      0     0 ?        Z    Dec03   0:00 [wget] <defunct>
xxx    3681485  0.0  0.0      0     0 ?        Z    01:51   0:00 [wget] <defunct>

christophs78 avatar Dec 04 '25 12:12 christophs78