v2.18 runs into file system errors in Kubernetes

Open AlexBartlAA opened this issue 8 months ago • 20 comments

Describe the Bug

First of all, thanks a lot for the new version - data-before-send looks like it is exactly what we need and saves us from implementing a custom tracker wrapper.

Unfortunately, I haven't been able to deploy the image (docker.umami.is/umami-software/umami:postgresql-v2.18.0) into our Kubernetes cluster yet. I get the file-system permission error below in the migration step, more precisely when running applyMigration(). If I skip the migration step, it fails a bit later when writing the manifest (log output also below).

I can't say for sure that the issue isn't on our side, but everything is working fine with v2.17.0. Unfortunately, I haven't been able to reproduce the issue locally (on Mac, if that's relevant); running the image via docker-compose works. Sorry I can't produce a minimal reproducible example.

Database

PostgreSQL

Relevant log output

> [email protected] start-docker /app
> npm-run-all check-db update-tracker set-routes-manifest start-server
> [email protected] check-db /app
> node scripts/check-db.js
✓ DATABASE_URL is defined.
✓ Database connection successful.
✓ Database version check successful.
Error: Can't write to /app/node_modules/.pnpm/@[email protected]/node_modules/@prisma/engines please make sure you install "prisma" with the right permissions.
✗ Command failed: prisma migrate deploy
Error: Can't write to /app/node_modules/.pnpm/@[email protected]/node_modules/@prisma/engines please make sure you install "prisma" with the right permissions.
 ELIFECYCLE  Command failed with exit code 1.
ERROR: "check-db" exited with 1.
 ELIFECYCLE  Command failed with exit code 1.

---

> [email protected] start-docker /app
> npm-run-all check-db update-tracker set-routes-manifest start-server
> [email protected] check-db /app
> node scripts/check-db.js
✓ DATABASE_URL is defined.
✓ Database connection successful.
✓ Database version check successful.
> [email protected] update-tracker /app
> node scripts/update-tracker.js
> [email protected] set-routes-manifest /app
> node scripts/set-routes-manifest.js
Using original Next.js routes manifest
node:fs:2426
    return binding.writeFileUtf8(
                   ^
Error: EACCES: permission denied, open '/app/.next/routes-manifest.json'
    at Object.writeFileSync (node:fs:2426:20)
    at Object.<anonymous> (/app/scripts/set-routes-manifest.js:74:4)
    at Module._compile (node:internal/modules/cjs/loader:1730:14)
    at Object..js (node:internal/modules/cjs/loader:1895:10)
    at Module.load (node:internal/modules/cjs/loader:1465:32)
    at Function._load (node:internal/modules/cjs/loader:1282:12)
    at TracingChannel.traceSync (node:diagnostics_channel:322:14)
    at wrapModuleLoad (node:internal/modules/cjs/loader:235:24)
    at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:170:5)
    at node:internal/main/run_main_module:36:49 {
  errno: -13,
  code: 'EACCES',
  syscall: 'open',
  path: '/app/.next/routes-manifest.json'
}
Node.js v22.15.0
 ELIFECYCLE  Command failed with exit code 1.
ERROR: "set-routes-manifest" exited with 1.
 ELIFECYCLE  Command failed with exit code 1.

Which Umami version are you using? (if relevant)

v2.18.0

Which browser are you using? (if relevant)

irrelevant

How are you deploying your application? (if relevant)

Kubernetes via helm chart

AlexBartlAA avatar May 08 '25 14:05 AlexBartlAA

Same issue with docker-compose deployments.

al-lac avatar May 08 '25 15:05 al-lac

Can't reproduce the error on a new docker-compose deployment. @al-lac I'm assuming yours is also an upgrade from 2.17 to 2.18?

franciscao633 avatar May 08 '25 17:05 franciscao633

@franciscao633, happens with both fresh installs and updates for me.

This is the docker-compose.yml I am using (via Umbrel): https://github.com/getumbrel/umbrel-apps/blob/35fded41f7d1b165054d0994364b06186d026bc1/umami/docker-compose.yml

al-lac avatar May 08 '25 22:05 al-lac

@al-lac Does Umbrel need to run as user 1000:1000? If you remove that line from the Umbrel compose file, it should run fine. Alternatively, you need to grant that user permissions on the Prisma files. In the Dockerfile we have the following:

# Permissions for prisma
RUN chown -R nextjs:nodejs node_modules/.pnpm/

franciscao633 avatar May 08 '25 22:05 franciscao633

Using Docker (upgrading from 2.17), I am getting a failed start. I had to manually revert to v2.17 and then restore the DB from a snapshot, because even after I reverted the image line the container wouldn't come up until I restored the DB to its state before the upgrade attempt.

serversathome avatar May 08 '25 22:05 serversathome

@franciscao633 It does not need to run as user 1000:1000, but it has been running fine this way since version 2.12.1.

And indeed, the app is working after I removed the user line.

However, this seems like a step back and a bug that should be fixed.

al-lac avatar May 08 '25 22:05 al-lac

Upgrading from v2.17. Not sure if it's a related issue.

[email protected] set-routes-manifest /app
node scripts/set-routes-manifest.js
err /app/scripts/set-routes-manifest.js:21
  const routeRegex = new RegExp(apiRoute.regex);
                                         ^
err TypeError: Cannot read properties of undefined (reading 'regex')
    at Object.<anonymous> (/app/scripts/set-routes-manifest.js:21:42)
    at Module._compile (node:internal/modules/cjs/loader:1730:14)
    at Object..js (node:internal/modules/cjs/loader:1895:10)
    at Module.load (node:internal/modules/cjs/loader:1465:32)
    at Function._load (node:internal/modules/cjs/loader:1282:12)
    at TracingChannel.traceSync (node:diagnostics_channel:322:14)
    at wrapModuleLoad (node:internal/modules/cjs/loader:235:24)
    at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:170:5)
    at node:internal/main/run_main_module:36:49
Node.js v22.15.0
 ELIFECYCLE  Command failed with exit code 1. 

I have a custom BASE_PATH, COLLECT_API_ENDPOINT and TRACKER_SCRIPT_NAME setup.

desw0lf avatar May 09 '25 08:05 desw0lf

@desw0lf Could you please create another issue, sharing more details about your setup? It seems like you're building your own image? (The issue is probably caused by BASE_PATH; I'll have a look at it.) Edit: confirmed, users who build their own images with BASE_PATH will have startup issues. I'll submit a fix.

People here have permissions issues, and they're most likely not new, by the way: there's a high probability that their setups would already have failed in previous versions if run with COLLECT_API_ENDPOINT (update-tracker would have failed too).

Maxime-J avatar May 09 '25 09:05 Maxime-J

This is my docker compose file, and it fails when I try to upgrade to 2.18. It's definitely not a permissions issue:

services:
  umami:
    image: ghcr.io/umami-software/umami:postgresql-v2.17
    #image: ghcr.io/umami-software/umami:postgresql-latest
    container_name: umami
    ports:
      - 3002:3000
    environment:
      DATABASE_URL: postgresql://umami:umami@db:5432/umami
      DATABASE_TYPE: postgresql
      APP_SECRET: replace-me-with-a-random-string
    depends_on:
      db:
        condition: service_healthy
    restart: unless-stopped
    healthcheck:
      test:
        - CMD-SHELL
        - curl http://localhost:3000/api/heartbeat
      interval: 5s
      timeout: 5s
      retries: 5
  db:
    image: postgres:15-alpine
    container_name: umami-db
    environment:
      POSTGRES_DB: umami
      POSTGRES_USER: umami
      POSTGRES_PASSWORD: umami
    volumes:
      - /mnt/bigdeal/configs/umami/umami-db-data:/var/lib/postgresql/data
    restart: unless-stopped
    healthcheck:
      test:
        - CMD-SHELL
        - pg_isready -U $${POSTGRES_USER} -d $${POSTGRES_DB}
      interval: 5s
      timeout: 5s
      retries: 5

serversathome avatar May 09 '25 09:05 serversathome

@serversathome, this issue is about permissions errors, so you might have a different issue. Do you have any logs (docker compose logs)? They would be useful.

Maxime-J avatar May 09 '25 10:05 Maxime-J

@Maxime-J https://pastebin.com/E3NdtGLk

serversathome avatar May 09 '25 11:05 serversathome

@serversathome Thanks, that's definitely another issue. It looks like there was a problem with the 09_update_hostname_region migration, but the logs somehow don't show all the details. There's an open issue mentioning that migration (#3399); you can have a look at it, and the team will probably help you :)

@thiagoalmeidasa I had a look at your revert commit; your security context might be causing your problems:

securityContext:
  [...]
  readOnlyRootFilesystem: true

The Umami container can't have a read-only filesystem. That was already the case with previous versions if you had COLLECT_API_ENDPOINT set (for the update-tracker script).

But now, write operations are always needed, because of the pnpm switch and the routes manifest customization.

Maybe you have the same setting @AlexBartlAA?
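For reference, a rough sketch of what the relevant part of a container spec could look like with the root filesystem left writable. The container name is a placeholder and the rest of the spec is omitted; only the readOnlyRootFilesystem key comes from the discussion above.

```yaml
# Fragment of a Kubernetes container spec (sketch, not taken from any chart in this thread).
containers:
  - name: umami                      # placeholder name
    securityContext:
      # Umami writes to /app at startup, so the root filesystem must stay writable.
      readOnlyRootFilesystem: false
```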

Maxime-J avatar May 09 '25 13:05 Maxime-J

I'm also having this problem in the upgrade to 2.18

> [email protected] check-db /app
> node scripts/check-db.js

✓ DATABASE_URL is defined.
✓ Database connection successful.
✓ Database version check successful.
Error: Can't write to /app/node_modules/.pnpm/@[email protected]/node_modules/@prisma/engines please make sure you install "prisma" with the right permissions.
✗ Command failed: prisma migrate deploy
Error: Can't write to /app/node_modules/.pnpm/@[email protected]/node_modules/@prisma/engines please make sure you install "prisma" with the right permissions.

 ELIFECYCLE  Command failed with exit code 1.
ERROR: "check-db" exited with 1.
 ELIFECYCLE  Command failed with exit code 1.

Here's my docker compose config, which has been working for over a year:

  umami_postgres:
    image: postgres:15-alpine
    restart: always
    container_name: umami_postgres
    volumes:
      - ${PRIMARY_MOUNT}/postgres_15:/var/lib/postgresql/data
    environment:
      POSTGRES_DB: umami
      POSTGRES_USER: umami
      POSTGRES_PASSWORD: umami
    networks:
      - umami-private-net
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U $${POSTGRES_USER} -d $${POSTGRES_DB}"]
      interval: 5s
      timeout: 5s
      retries: 5
  
  umami:
    image: ghcr.io/umami-software/umami:postgresql-latest
    restart: always
    container_name: umami
    user: 1000
    depends_on:
      umami_postgres:
        condition: service_healthy
    environment:
      DATABASE_URL: postgresql://umami:umami@umami_postgres:5432/umami
      DATABASE_TYPE: postgresql
      APP_SECRET: ${DNS_DOMAIN_ZONE_ID}
    labels:
      - traefik.enable=true
      - traefik.docker.network=umami-net
      - traefik.http.services.umami-svc.loadbalancer.server.port=3000
      - traefik.http.routers.umami-rtr.rule=Host(`umami.${DNS_DOMAIN}`)
      - traefik.http.routers.umami-rtr.entrypoints=websecure
      - traefik.http.routers.umami-rtr.tls=true
    networks:
      - umami-net
      - umami-private-net
    healthcheck:
      test: ["CMD-SHELL", "curl http://localhost:3000/api/heartbeat"]
      interval: 5s
      timeout: 5s
      retries: 5

Solution

https://github.com/umami-software/umami/commit/340cdce1dcc0504916d841d21d9585f9c8939331

Those chowns are the problem. If I remove my user line, the server starts up. Kubernetes is likely changing the execution user as well. It would be nice if you could run as whatever user you wanted. I don't really understand why the application wants to write into node_modules/.pnpm after this update, but I don't know pnpm very well, so it's probably more related to pnpm than to Prisma.

Anyway, make sure you're running as 1001 and this should go away. I think this is a bug in the umami docker image, but that's the workaround.

subdavis avatar May 09 '25 22:05 subdavis

Thanks for your input! I've tested some more based on the discussion here. From what I can tell, v2.18 requires running as root; any non-root user ID will fail. Running the container as root isn't allowed in our cluster, and for good reason.

So I guess the minimal reproducible example is to use the docker-compose file from the umami repo and just add a user: "1000:1000" line to the umami service to run as a non-root user.
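A sketch of that reproduction, assuming the service is named umami as in the compose files posted earlier in this thread. Only the user line is added; everything else stays as in the repo's file.

```yaml
services:
  umami:
    # ...image, ports, environment, depends_on unchanged from the repo's docker-compose.yml...
    user: "1000:1000"   # overrides the image's default user; with v2.18 the container then fails at startup
```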

Should I rename this ticket to v2.18 docker image only runs as root, now that we know that this is a better summary of the issue?

AlexBartlAA avatar May 12 '25 06:05 AlexBartlAA

It's not exactly that. To summarize:

  • The Umami Dockerfile ensures that the container runs as a non-root user (the nextjs user, 1001:1001). I'm not the author, but it's standard good practice. It works as expected if you don't override the user in your compose or Kubernetes config file.

  • If a custom config appeared to work before, it wasn't working completely, in the sense that it would already have failed if COLLECT_API_ENDPOINT was set.

  • The Umami container can't be read-only. A Kubernetes config with readOnlyRootFilesystem: true in securityContext will fail.

As to why write operations are needed:

  • For possible COLLECT_API_ENDPOINT and TRACKER_SCRIPT_NAME customizations, the tracker script file might need to be updated, and the corresponding Next.js manifest file is written (either the original or an updated one).

  • I didn't look deeply into it, but it looks like the engine used by the ORM is no longer included since the pnpm switch, so it has to be downloaded.

Engine + routes manifest writes are the new things in 2.18 which bring up permissions issues in custom configs.

Maxime-J avatar May 12 '25 08:05 Maxime-J

Ah, sorry, then I missummarized. So the dockerfile comes with a hardcoded 1001 user ID and whenever the user ID is set externally, it needs to match that. I can confirm that with 1001:1001 I don't run into the same issue.
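In Kubernetes terms, that could look roughly like the fragment below. This is only a sketch: the container name is a placeholder, and how these keys are exposed depends on the helm chart in use, which isn't shown in this thread.

```yaml
# Fragment of a Kubernetes container spec (sketch).
containers:
  - name: umami            # placeholder name
    securityContext:
      runAsNonRoot: true
      runAsUser: 1001      # matches the nextjs user baked into the Umami image
      runAsGroup: 1001
```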

Thanks for your support.

AlexBartlAA avatar May 12 '25 13:05 AlexBartlAA

Thanks for the insights @Maxime-J!

Also works for us now by setting the user to 1001.

al-lac avatar May 12 '25 13:05 al-lac

@subdavis , regarding your error:

Error: Can't write to /app/node_modules/.pnpm/@[email protected]/node_modules/@prisma/engines please make sure you install "prisma" with the right permissions.

I got the same and created a separate issue for it (https://github.com/umami-software/umami/issues/3422). On some Kubernetes distros (e.g. OpenShift), it is not possible or desirable to run as a specific user, so setting the user to 1001 is not an option there; the user ID will be "random". My solution to this write permission error is:

  1. Created an init container (with umami 2.18.1 image) with an empty dir at location /tmp/pnpm-node-modules/
  2. Changed args in init container to: [cp, "-r", /app/node_modules/.pnpm/., /tmp/pnpm-node-modules/]
  3. Mounted the same empty dir in the main Umami container at mount path /app/node_modules/.pnpm/
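The three steps above could be sketched as the Pod below. Names (umami, pnpm-modules, copy-pnpm-modules) and the image tag are assumptions for illustration, and environment variables such as DATABASE_URL are left out. The sketch puts the whole cp invocation in command rather than args, which is a small deviation from the description above.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: umami                          # hypothetical name
spec:
  volumes:
    - name: pnpm-modules               # hypothetical volume name
      emptyDir: {}
  initContainers:
    - name: copy-pnpm-modules          # hypothetical name
      # Assumed tag, following the naming pattern used earlier in this thread.
      image: ghcr.io/umami-software/umami:postgresql-v2.18.1
      # Copy the image's .pnpm store into the emptyDir so the runtime UID owns the copy.
      command: ["cp", "-r", "/app/node_modules/.pnpm/.", "/tmp/pnpm-node-modules/"]
      volumeMounts:
        - name: pnpm-modules
          mountPath: /tmp/pnpm-node-modules/
  containers:
    - name: umami
      image: ghcr.io/umami-software/umami:postgresql-v2.18.1
      volumeMounts:
        # The writable copy shadows the image's own .pnpm directory.
        - name: pnpm-modules
          mountPath: /app/node_modules/.pnpm/
```

This only addresses the Prisma engine write; it does nothing for /app/.next/routes-manifest.json.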

I have investigated whether it is possible to configure the path Prisma uses when downloading its engine at runtime, but this doesn't seem to be possible.

But once that was fixed, I ran into the problem mentioned in the first post here, Error: EACCES: permission denied, open '/app/.next/routes-manifest.json', and I am currently stuck, still on v2.17.0.

mikaello avatar May 22 '25 10:05 mikaello

The problem occurs because the file /app/.next/routes-manifest.json is created at runtime by the script scripts/set-routes-manifest.js, which means that the user must be 1001 (the user that has write permission). Some environments (like OpenShift, see explanation) don't allow specifying a user ID, and Umami is not compatible with those environments without modifications. Prisma has the same problem (as also mentioned in this thread): it downloads engine binaries at runtime when running the DB migration, and only the 1001 user is allowed to write them.

The runtime creation of /app/.next/routes-manifest.json was implemented in https://github.com/umami-software/umami/commit/b88432fcf4c030ab582f005e4d29a80a1416509d, and was first deployed in v2.18.0.

mikaello avatar May 22 '25 11:05 mikaello

@mikaello Given your OpenShift setup, you could build your own image, adding this instruction to the Dockerfile before the USER line:

RUN chgrp -R 0 /app && \
    chmod -R g=u /app

(as per OpenShift docs)

It could be more fine-grained, but it would work.

Or, going your way, you could just as well copy the entire app folder; I guess that would work too. Edited quote:

  1. Created an init container (with umami 2.18.1 image) with an empty dir at location /tmp/umami-app/
  2. Changed args in init container to: [cp, "-r", /app/., /tmp/umami-app/]
  3. Mounted the same empty dir in the main Umami container at mount path /app/

Maxime-J avatar May 22 '25 15:05 Maxime-J

This issue is stale because it has been open for 60 days with no activity.

github-actions[bot] avatar Jul 22 '25 02:07 github-actions[bot]