Zombie process in/from Docker container
I just noticed a zombie process on my server. After some further investigation I found the cause of the problem: the zombie process belongs to the umami Docker container. Here is the output of top in the container's console:
Load average: 0.10 0.03 0.01 2/519 282
PID  PPID USER   STAT VSZ   %VSZ  CPU %CPU COMMAND
237  226  nextjs S    20.3g 1080% 1   0%   /usr/local/bin/node serve
226  28   nextjs S    306m  16%   1   0%   /usr/local/bin/node /opt/
1    0    nextjs S    305m  16%   0   0%   node /opt/yarn-v1.22.19/b
28   1    nextjs S    284m  15%   0   0%   /usr/local/bin/node /app/
276  0    nextjs S    1680  0%    0   0%   sh
282  276  nextjs R    1608  0%    0   0%   top
197  1    nextjs Z    0     0%    1   0%   [node]
Same problem is present on two different systems. Any suggestions?
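For anyone wanting to confirm this, a quick way to list zombies and find the parent that failed to reap them (a generic sketch, assuming a procps-style ps):

```shell
# List defunct (zombie) processes together with their parent PID.
# The PPID column tells you which process never called wait() on them.
ps -e -o pid,ppid,stat,comm | awk 'NR == 1 || $3 ~ /^Z/'
```

The zombie itself is already dead; the interesting process is the one named by PPID.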
I got the same issue after upgrading umami from v1.36.1 to v1.37.0.
here is an example:
9370 ? Sl 0:00 /usr/bin/containerd-shim-runc-v2 -namespace moby -id 6b2d9f052****fefccf7 -address /run/containerd/containerd.sock
9398 ? Ssl 0:00 \_ node /opt/yarn-v1.22.19/bin/yarn.js start-docker
9468 ? Sl 0:00 \_ /usr/local/bin/node /app/node_modules/.bin/npm-run-all check-db update-tracker start-server
9828 ? Sl 0:00 | \_ /usr/local/bin/node /opt/yarn-v1.22.19/bin/yarn.js run start-server
9839 ? Sl 0:03 | \_ /usr/local/bin/node server.js
9668 ? Zs 0:00 \_ [node] <defunct>
9791 ? Zs 0:00 \_ [node] <defunct>
BTW, the container logs seem OK:
docker logs my_umami
yarn run v1.22.19
$ npm-run-all check-db update-tracker start-server
$ node scripts/check-db.js
✓ DATABASE_URL is defined.
✓ Database connection successful.
✓ Database tables found.
Prisma schema loaded from prisma/schema.prisma
Datasource "db": PostgreSQL database "postgres", schema "public" at "my_pgsql:5432"
2 migrations found in prisma/migrations
Following migration have not yet been applied:
02_add_event_data
To apply migrations in development run yarn prisma migrate dev.
To apply migrations in production run yarn prisma migrate deploy.
Running update...
Prisma schema loaded from prisma/schema.prisma
Datasource "db": PostgreSQL database "postgres", schema "public" at "my_pgsql:5432"
2 migrations found in prisma/migrations
Applying migration `02_add_event_data`
The following migration have been applied:
migrations/
└─ 02_add_event_data/
└─ migration.sql
All migrations have been successfully applied.
✓ Database is up to date.
$ node scripts/update-tracker.js
$ node server.js
Listening on port 3000
(...some table json definition..)
- reboot the VM
- 1 zombie :'(
1347 ? Sl 0:00 /usr/bin/containerd-shim-runc-v2 -namespace moby -id 6b2d9f052****feed7550555441b2acb7fefccf7 -address /run/containerd/containerd.sock
1421 ? Ssl 0:00 \_ node /opt/yarn-v1.22.19/bin/yarn.js start-docker
1899 ? Sl 0:00 \_ /usr/local/bin/node /app/node_modules/.bin/npm-run-all check-db update-tracker start-server
2187 ? Sl 0:00 | \_ /usr/local/bin/node /opt/yarn-v1.22.19/bin/yarn.js run start-server
2198 ? Sl 0:03 | \_ /usr/local/bin/node server.js
2158 ? Zs 0:00 \_ [node] <defunct>
FYI, about the image content:
$ docker exec -it my_umami sh
/app $ npm list -g --depth 0
npm WARN config global `--global`, `--local` are deprecated. Use `--location=global` instead.
/usr/local/lib
+-- [email protected]
`-- [email protected]
/app $ yarn --version
1.22.19
Could you tell us a way to troubleshoot, or how to help you reproduce?
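Not a fix for the prisma side, but a generic mitigation that usually makes container zombies disappear: let Docker inject a minimal init (tini) as PID 1, which adopts and reaps orphaned children. Sketched here for a Compose setup (the service name is an assumption):

```yaml
# docker-compose.yml (fragment) — equivalent to `docker run --init`
services:
  umami:
    init: true
```

With `init: true`, PID 1 inside the container is docker-init instead of yarn, so defunct children get reaped even if prisma leaks them.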
Updated to 1.38.0 and the zombie process still remains :(
zombie still there in 1.38 too.
In package.json, I tried changing the start-docker target to use run-s (doc) instead of the npm-run-all binary, to run the node commands sequentially and try to identify the root cause.
Then:
- restart the umami Docker container
- then quickly run docker exec -it myumami sh (to get a shell in the umami container)
- then repeat the ps xaf command
What I see is :
/app $ ps xaf
PID USER TIME COMMAND
1 nextjs 0:00 node /opt/yarn-v1.22.19/bin/yarn.js start-docker
27 nextjs 0:00 /usr/local/bin/node /app/node_modules/.bin/run-s check-db update-tracker start-server
38 nextjs 0:00 /usr/local/bin/node /opt/yarn-v1.22.19/bin/yarn.js run check-db
49 nextjs 0:00 /usr/local/bin/node scripts/check-db.js
82 nextjs 0:02 /usr/local/bin/node /app/node_modules/.bin/prisma migrate status
89 nextjs 0:00 sh
105 nextjs 0:00 [sh]
106 nextjs 0:00 ps xaf
/app $ ps xaf
PID USER TIME COMMAND
1 nextjs 0:00 node /opt/yarn-v1.22.19/bin/yarn.js start-docker
27 nextjs 0:00 /usr/local/bin/node /app/node_modules/.bin/run-s check-db update-tracker start-server
38 nextjs 0:00 /usr/local/bin/node /opt/yarn-v1.22.19/bin/yarn.js run check-db
49 nextjs 0:00 /usr/local/bin/node scripts/check-db.js
82 nextjs 0:04 /usr/local/bin/node /app/node_modules/.bin/prisma migrate status
89 nextjs 0:00 sh
197 nextjs 0:00 /usr/local/bin/node /app/node_modules/prisma/build/child {"product":"prisma","version":"4.3.1","cli_install_type":"local","information":"","local_timestamp":"2022-09-14T18:30:58Z","project_
208 nextjs 0:00 ps xaf
/app $ ps xaf
PID USER TIME COMMAND
1 nextjs 0:00 node /opt/yarn-v1.22.19/bin/yarn.js start-docker
27 nextjs 0:00 /usr/local/bin/node /app/node_modules/.bin/run-s check-db update-tracker start-server
89 nextjs 0:00 sh
197 nextjs 0:00 [node]
227 nextjs 0:00 /usr/local/bin/node /opt/yarn-v1.22.19/bin/yarn.js run start-server
234 nextjs 0:00 ps xaf
/app $ ps xaf
Comparing the ps outputs, I see that the child process (PID 197) of node ...bin/prisma migrate status (from check-db) is the process that becomes the [node] zombie.
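That matches how zombies arise in general: a parent forks a child, the child exits, and the parent never calls wait(). A minimal, umami-independent repro (a sketch assuming a POSIX sh and ps):

```shell
# `true` exits immediately; after the exec, its parent is `sleep`, which
# never calls wait(), so `true` stays in the process table as <defunct>.
sh -c 'true & exec sleep 2' &
parent=$!
sleep 0.5
ps -e -o pid,ppid,stat,comm | awk -v p="$parent" '$2 == p'
```

Here prisma's forked child (197) exits, but its parent never reaps it, so it lingers as [node].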
Not easy to dig deeper:
- I even tried to add a proc.kill('SIGTERM') on the run 'exit' event, but without benefit: this zombie may be a detached prisma subprocess.
// proc.on('exit', () => resolve(buffer.join('')));
proc.on('exit', () => { proc.kill('SIGTERM'); buffer.push("run is done"); resolve(buffer.join('')); });
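That result is expected: a zombie is already dead, so no signal can remove it; only its parent calling wait(), or the parent exiting so PID 1 can adopt and reap it, clears the entry. A small demonstration (generic sketch, not umami code):

```shell
# Create a zombie, then try to SIGKILL it: the STAT column still shows Z,
# because signals cannot affect a process that has already exited.
sh -c 'true & exec sleep 2' &
sleep 0.5
zpid=$(ps -e -o pid= -o ppid= -o stat= | awk -v p="$!" '$2 == p && $3 ~ /Z/ {print $1}')
kill -KILL "$zpid" 2>/dev/null
sleep 0.2
ps -o pid= -o stat= -p "$zpid"   # STAT still contains Z
```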
- I think it is most likely an internal prisma bug (e.g. prisma/prisma-client-js#635 or prisma/prisma#5031, a prisma disconnect() issue?)
- umami is ~~already~~ using the ~~last~~ prisma client version 4.3.1 // edit: a quick test with 4.4.0 doesn't seem to fix this issue
After upgrading to 1.39.3 there are now 2 zombie processes. Did also a reboot of the host – no change.
I have 2 zombie processes with 1.38.0
Today, on one machine, one zombie process disappeared (without any restart/reboot), so one still remains. On another machine with the same setup, same host OS, etc., there are still 2 zombie processes.
I found a possible workaround for the zombie issue, given the following context:
- A) assume that, following an umami version update, you have successfully started umami once and migrated your database model
- B) now you would like to run umami without the zombie caused by the migration step.
Patch umami startup sequence to ignore check-db stage
# open a shell on your umami container
docker exec -it umami sh
vi package.json
# duplicate "start-docker": line as "start-dockerBackup": (yy + p)
# update "start-docker": line by removing "check-db" (-> + dw )
# :wq
# CTRL D
docker-compose stop
docker-compose start
# no more zombie
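If you prefer to avoid interactive vi inside the container, the same patch can be scripted with sed. The sample line below mimics umami's package.json (an assumption; verify yours matches before applying, and run the sed via docker exec on the real file):

```shell
# Demonstrate the edit on a sample line: drop the check-db stage from the
# start-docker script. The sample content is an assumption about the file.
printf '%s\n' '"start-docker": "npm-run-all check-db update-tracker start-server",' > /tmp/pkg.sample
sed -i.bak 's/check-db //' /tmp/pkg.sample
cat /tmp/pkg.sample
# -> "start-docker": "npm-run-all update-tracker start-server",
```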
This is proof that the zombie comes from the migration process.
We could imagine an improvement where a given environment variable drives whether check-db is executed or skipped.
Example: UMAMI_CHECK_DB (default: true)
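A sketch of how that could look — the variable name UMAMI_CHECK_DB and the stage runner below are hypothetical, not part of umami today:

```shell
#!/bin/sh
# Hypothetical start sequence: skip the migration check when
# UMAMI_CHECK_DB=false (the default remains true).
run_stage() { echo "running: $1"; }   # stand-in for `node scripts/$1.js`

if [ "${UMAMI_CHECK_DB:-true}" = "true" ]; then
  run_stage check-db
fi
run_stage update-tracker
run_stage start-server
```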
This issue is stale because it has been open for 60 days with no activity.
This issue was closed because it has been inactive for 7 days since being marked as stale.
I stayed on version 1.40 for a long time on some sites without issues or maintenance,
and today I migrated to v2.9.0: after the data migration and a Docker refresh & recreate, I didn't see any zombie on my road :)
A special thanks to the high-quality Umami project, especially the dedicated migration guide doc/repo, which was just perfect 👏 🥇
I still have the zombie on v2.9.0 (two separate instances exhibiting the same behaviour), fwiw.
Unfortunately you're right @simonwiles.
After double-checking my VM, it's true that the zombie still appears (with some delay after docker compose up -d).