Zombie process in/from Docker container

Open mk3media opened this issue 3 years ago • 3 comments

I just noticed a zombie process on my server. After some further investigation I found the source of the problem: the zombie process belongs to the umami Docker container. Here is the output of top in the container's console:

Load average: 0.10 0.03 0.01 2/519 282
  PID  PPID USER     STAT   VSZ %VSZ CPU %CPU COMMAND
  237   226 nextjs   S    20.3g1080%   1   0% /usr/local/bin/node serve
  226    28 nextjs   S     306m  16%   1   0% /usr/local/bin/node /opt/
    1     0 nextjs   S     305m  16%   0   0% node /opt/yarn-v1.22.19/b
   28     1 nextjs   S     284m  15%   0   0% /usr/local/bin/node /app/
  276     0 nextjs   S     1680   0%   0   0% sh
  282   276 nextjs   R     1608   0%   0   0% top
  197     1 nextjs   Z        0   0%   1   0% [node]

Same problem is present on two different systems. Any suggestions?

mk3media, Aug 14 '22 14:08

I have the same issue after upgrading umami from v1.36.1 to v1.37.0.

here is an example:

   9370 ?        Sl     0:00 /usr/bin/containerd-shim-runc-v2 -namespace moby -id 6b2d9f052****fefccf7 -address /run/containerd/containerd.sock
   9398 ?        Ssl    0:00  \_ node /opt/yarn-v1.22.19/bin/yarn.js start-docker
   9468 ?        Sl     0:00      \_ /usr/local/bin/node /app/node_modules/.bin/npm-run-all check-db update-tracker start-server
   9828 ?        Sl     0:00      |   \_ /usr/local/bin/node /opt/yarn-v1.22.19/bin/yarn.js run start-server
   9839 ?        Sl     0:03      |       \_ /usr/local/bin/node server.js
   9668 ?        Zs     0:00      \_ [node] <defunct>
   9791 ?        Zs     0:00      \_ [node] <defunct>

By the way, the container logs seem OK:

docker logs my_umami
yarn run v1.22.19
$ npm-run-all check-db update-tracker start-server
$ node scripts/check-db.js
✓ DATABASE_URL is defined.
✓ Database connection successful.
✓ Database tables found.
Prisma schema loaded from prisma/schema.prisma
Datasource "db": PostgreSQL database "postgres", schema "public" at "my_pgsql:5432"

2 migrations found in prisma/migrations

Following migration have not yet been applied:
02_add_event_data

To apply migrations in development run yarn prisma migrate dev.
To apply migrations in production run yarn prisma migrate deploy.



Running update...
Prisma schema loaded from prisma/schema.prisma
Datasource "db": PostgreSQL database "postgres", schema "public" at "my_pgsql:5432"

2 migrations found in prisma/migrations

Applying migration `02_add_event_data`

The following migration have been applied:

migrations/
  └─ 02_add_event_data/
    └─ migration.sql

All migrations have been successfully applied.

✓ Database is up to date.
$ node scripts/update-tracker.js
$ node server.js
Listening on port 3000
(...some table json definition..)
  • after rebooting the VM
  • 1 zombie remains :'(
   1347 ?        Sl     0:00 /usr/bin/containerd-shim-runc-v2 -namespace moby -id 6b2d9f052****feed7550555441b2acb7fefccf7 -address /run/containerd/containerd.sock
   1421 ?        Ssl    0:00  \_ node /opt/yarn-v1.22.19/bin/yarn.js start-docker
   1899 ?        Sl     0:00      \_ /usr/local/bin/node /app/node_modules/.bin/npm-run-all check-db update-tracker start-server
   2187 ?        Sl     0:00      |   \_ /usr/local/bin/node /opt/yarn-v1.22.19/bin/yarn.js run start-server
   2198 ?        Sl     0:03      |       \_ /usr/local/bin/node server.js
   2158 ?        Zs     0:00      \_ [node] <defunct>

FYI: about image content

$ docker exec -it my_umami sh
/app $ npm list -g --depth 0
npm WARN config global `--global`, `--local` are deprecated. Use `--location=global` instead.
/usr/local/lib
+-- [email protected]
`-- [email protected]

/app $ yarn --version
1.22.19

Could you tell us how to troubleshoot this, or how we can help you reproduce it?

boly38, Aug 18 '22 11:08

Updated to 1.38.0 and the zombie process still remains :(

mk3media, Sep 06 '22 16:09

The zombie is still there in 1.38 too.

In package.json, I changed the start-docker target to use run-s instead of the npm-run-all binary, so that the node commands run sequentially, to try to identify the root cause. Then I:

  • restarted the umami docker container
  • quickly ran docker exec -it myumami sh (to get a shell in the umami docker container)
  • repeated the ps xaf command

This is what I see:

/app $ ps xaf
PID   USER     TIME  COMMAND
    1 nextjs    0:00 node /opt/yarn-v1.22.19/bin/yarn.js start-docker
   27 nextjs    0:00 /usr/local/bin/node /app/node_modules/.bin/run-s check-db update-tracker start-server
   38 nextjs    0:00 /usr/local/bin/node /opt/yarn-v1.22.19/bin/yarn.js run check-db
   49 nextjs    0:00 /usr/local/bin/node scripts/check-db.js
   82 nextjs    0:02 /usr/local/bin/node /app/node_modules/.bin/prisma migrate status
   89 nextjs    0:00 sh
  105 nextjs    0:00 [sh]
  106 nextjs    0:00 ps xaf
/app $ ps xaf
PID   USER     TIME  COMMAND
    1 nextjs    0:00 node /opt/yarn-v1.22.19/bin/yarn.js start-docker
   27 nextjs    0:00 /usr/local/bin/node /app/node_modules/.bin/run-s check-db update-tracker start-server
   38 nextjs    0:00 /usr/local/bin/node /opt/yarn-v1.22.19/bin/yarn.js run check-db
   49 nextjs    0:00 /usr/local/bin/node scripts/check-db.js
   82 nextjs    0:04 /usr/local/bin/node /app/node_modules/.bin/prisma migrate status
   89 nextjs    0:00 sh
  197 nextjs    0:00 /usr/local/bin/node /app/node_modules/prisma/build/child {"product":"prisma","version":"4.3.1","cli_install_type":"local","information":"","local_timestamp":"2022-09-14T18:30:58Z","project_
  208 nextjs    0:00 ps xaf
/app $ ps xaf
PID   USER     TIME  COMMAND
    1 nextjs    0:00 node /opt/yarn-v1.22.19/bin/yarn.js start-docker
   27 nextjs    0:00 /usr/local/bin/node /app/node_modules/.bin/run-s check-db update-tracker start-server
   89 nextjs    0:00 sh
  197 nextjs    0:00 [node]
  227 nextjs    0:00 /usr/local/bin/node /opt/yarn-v1.22.19/bin/yarn.js run start-server
  234 nextjs    0:00 ps xaf
/app $ ps xaf

In the second ps output, process number 197 appears as a prisma/build/child process, started while node ...bin/prisma migrate status (from check-db) is running. In the third ps output, that same process 197 has become the [node] zombie.
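
The manual approach above (repeating ps xaf and watching for a bracketed [node] entry) could be approximated by a small script. This is only a minimal sketch, assuming a Linux /proc filesystem and Node.js inside the container; parseProcStat and listZombies are hypothetical helper names, not part of umami or prisma.

```javascript
// Sketch: list zombie processes by reading /proc (Linux only).
const fs = require('fs');

// Parse one /proc/<pid>/stat line: "pid (comm) state ppid ...".
// comm may itself contain spaces or parentheses, so split on the LAST ')'.
function parseProcStat(line) {
  const open = line.indexOf('(');
  const close = line.lastIndexOf(')');
  if (open === -1 || close === -1) return null;
  const pid = Number(line.slice(0, open).trim());
  const comm = line.slice(open + 1, close);
  const [state, ppid] = line.slice(close + 1).trim().split(/\s+/);
  return { pid, comm, state, ppid: Number(ppid) };
}

// Return every process currently in state Z (zombie) under procDir.
function listZombies(procDir = '/proc') {
  const zombies = [];
  let entries;
  try {
    entries = fs.readdirSync(procDir);
  } catch {
    return zombies; // no /proc here (non-Linux host): nothing to report
  }
  for (const entry of entries) {
    if (!/^\d+$/.test(entry)) continue; // keep only numeric PID directories
    try {
      const line = fs.readFileSync(`${procDir}/${entry}/stat`, 'utf8');
      const info = parseProcStat(line);
      if (info && info.state === 'Z') zombies.push(info);
    } catch {
      // The process exited between readdir and readFile; skip it.
    }
  }
  return zombies;
}
```

Calling console.table(listZombies()) in a loop while the container starts would show the PID and parent PID of each zombie as soon as it appears.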

It is not easy to dig deeper:

  • I even tried adding proc.kill('SIGTERM') in the 'exit' event handler of run, but it made no difference: the zombie may be a detached prisma subprocess.
    // proc.on('exit', () => resolve(buffer.join('')));
    proc.on('exit', () => { proc.kill('SIGTERM'); buffer.push("run is done"); resolve(buffer.join('')); });
  • I think it is most likely an internal prisma bug (e.g. prisma/prisma-client-js#635, or prisma/prisma#5031, the prisma disconnect() issue?).
  • umami uses prisma client version 4.3.1. Edit: a quick test with 4.4.0 does not seem to fix the issue.
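
One likely reason the proc.kill('SIGTERM') attempt changes nothing: when the 'exit' event fires, the direct child has already terminated, so there is nothing left to signal, and a detached grandchild has its own PID anyway. The sketch below is a standalone illustration, not umami's code; killOnExit is a hypothetical name, and it spawns a throwaway node process rather than prisma.

```javascript
const { spawn } = require('child_process');

// Spawn a short-lived child and try to signal it from its own 'exit'
// handler, mirroring the attempted patch. Resolves with kill()'s result.
function killOnExit() {
  return new Promise((resolve, reject) => {
    // process.execPath is the running node binary; '-e 0' exits at once.
    const proc = spawn(process.execPath, ['-e', '0']);
    proc.on('error', reject);
    proc.on('exit', () => {
      // The child is already gone here, so kill() has no process to signal
      // and reports failure by returning false. Processes that the child
      // started itself are not targeted by this call at all.
      resolve(proc.kill('SIGTERM'));
    });
  });
}

killOnExit().then((delivered) => {
  console.log('signal delivered:', delivered);
});
```

If the zombie really is a detached subprocess, it would have to be reaped by whichever process ends up as its parent, which in the top output at the start of this thread is PID 1 (yarn).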

boly38, Sep 14 '22 18:09

After upgrading to 1.39.3 there are now 2 zombie processes. Did also a reboot of the host – no change.

mk3media, Oct 30 '22 12:10

> After upgrading to 1.39.3 there are now 2 zombie processes. Did also a reboot of the host – no change.

I have 2 zombie processes with 1.38.0

AntoninHuaut, Oct 30 '22 12:10

Today, on one machine, one of the zombie processes disappeared (without any restart or reboot), so one still remains there. On another machine with the same setup, same host OS, etc., there are still 2 zombie processes.

mk3media, Oct 31 '22 09:10

I found a possible workaround for the zombie issue, in the following context:

  • A) after updating to a new umami version, you have already started umami successfully once, so your database model has been migrated
  • B) you would now like to run umami without the zombie caused by the migration step.

Patch the umami startup sequence to skip the check-db stage:

# open a shell on your umami container
docker exec -it umami sh
vi package.json
# duplicate "start-docker": line as "start-dockerBackup": (yy + p)
# update "start-docker": line by removing "check-db" (->  + dw )
# :wq
# CTRL D 
docker-compose stop
docker-compose start
# no more zombie

This confirms that the zombie comes from the check-db stage, which runs the migration check.

We could imagine an improvement where an environment variable controls whether check-db is executed or skipped.

Example: UMAMI_CHECK_DB (default:true)
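
As a rough sketch of that idea, the startup task list could be built from the environment. Everything here is hypothetical: UMAMI_CHECK_DB is only the name proposed above, not an existing umami option, and shouldRunCheckDb and startupTasks are illustrative helpers.

```javascript
// Hypothetical: UMAMI_CHECK_DB is the variable proposed in this thread;
// umami does not read it today. Unset or empty means "run check-db".
function shouldRunCheckDb(env = process.env) {
  const raw = env.UMAMI_CHECK_DB;
  if (raw === undefined || raw.trim() === '') return true; // default: true
  return !['0', 'false', 'no', 'off'].includes(raw.trim().toLowerCase());
}

// Build the ordered task list that the start-docker target would run.
function startupTasks(env = process.env) {
  const tasks = ['update-tracker', 'start-server'];
  return shouldRunCheckDb(env) ? ['check-db', ...tasks] : tasks;
}

console.log(startupTasks({ UMAMI_CHECK_DB: 'false' }).join(' '));
// prints "update-tracker start-server"
```

As with the manual patch, skipping check-db only makes sense once the database has already been migrated for the running version (context A above).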

boly38, Dec 17 '22 15:12

This issue is stale because it has been open for 60 days with no activity.

github-actions[bot], Aug 19 '23 01:08

This issue was closed because it has been inactive for 7 days since being marked as stale.

github-actions[bot], Aug 26 '23 01:08

I stayed on version 1.40 for a long time on some sites, without issues or maintenance, and today I migrated to v2.9.0. After the data migration and a docker refresh & recreate, I didn't see any zombie along the way :)

Special thanks to the high-quality Umami project, especially for the dedicated migration guide doc/repo, which was just perfect 👏 🥇

boly38, Jan 18 '24 20:01

I still have the zombie on v2.9.0 (two separate instances exhibiting the same behaviour), fwiw.

simonwiles, Jan 23 '24 18:01

Unfortunately you're right, @simonwiles.

After double-checking my VM, it's true that the zombie still appears (with some delay after docker compose up -d).

boly38, Jan 23 '24 19:01