[Bug]: 401 on privileged actions after cold restart despite valid login

Open InigoGastesi opened this issue 3 months ago • 1 comments

🐞 Bug Summary

After a cold restart of the server/Kubernetes node (e.g., powered off overnight), the Admin Web UI intermittently returns 401 Unauthorized for privileged actions even though I appear logged in. Affected actions include adding MCP servers, viewing metrics, and creating servers.

🧩 Affected Component

Select the area of the project impacted:

[x] mcpgateway - API
[x] mcpgateway - UI (admin panel)
[ ] mcpgateway.wrapper - stdio wrapper
[ ] Federation or Transports
[ ] CLI, Makefiles, or shell scripts
[ ] Container setup (Docker/Podman/Compose)
[ ] Other (explain below)

🔁 Steps to Reproduce

1. Deploy `ghcr.io/ibm/mcp-context-forge:latest` on Kubernetes with UI and Admin API enabled and auth required (env excerpt below). DB is SQLite on a PVC mounted at /data.
2. Power off the host (or shut down the cluster) at end of day; power back on the next day. (A cold start of the pod alone may also reproduce it.)
3. Log into the Admin UI (Basic Auth).
4. Try any privileged action: Add MCP server, Metrics tab, Create server, etc.
5. Observe that those API calls return “401 Unauthorized” while the UI still indicates I’m logged in.
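
A minimal sketch of step 4 as a script, to check whether the 401s also occur outside the browser. It assumes the Service is reachable at `http://localhost:4444` (for example through a port-forward) and that the admin routes accept an HTTP Basic Auth header directly; both are assumptions on my side, and `basic_auth_header` and `probe` are illustrative names, not part of the gateway.

```python
import base64
import urllib.error
import urllib.request

# Assumption: the gateway is reachable here, e.g. via a port-forward to port 4444.
BASE_URL = "http://localhost:4444"
# Admin routes that returned 401 in the browser's Network panel.
ADMIN_PATHS = ["/admin/servers", "/admin/metrics"]


def basic_auth_header(user: str, password: str) -> str:
    """Build the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def probe(base_url: str, paths: list, user: str, password: str, timeout: float = 5.0) -> dict:
    """Return {path: HTTP status}, or None for a path that could not be reached."""
    results = {}
    for path in paths:
        request = urllib.request.Request(
            base_url + path,
            headers={"Authorization": basic_auth_header(user, password)},
        )
        try:
            with urllib.request.urlopen(request, timeout=timeout) as response:
                results[path] = response.status
        except urllib.error.HTTPError as err:
            # 4xx/5xx responses (including 401) arrive as HTTPError.
            results[path] = err.code
        except OSError:
            # Connection refused, DNS failure, timeout, and similar.
            results[path] = None
    return results


# Example call (not executed on import):
#   print(probe(BASE_URL, ADMIN_PATHS, "<BASIC_AUTH_USER>", "<BASIC_AUTH_PASSWORD>"))
```

If the script gets 200 while the browser gets 401 with the same credentials, that would point at browser-side state (cookie or cached token) rather than the server; if both get 401, the server side looks more likely.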

🤔 Expected Behavior

Admin actions should succeed when authenticated (200/201 responses), without requiring any extra steps after a cold restart.

📓 Logs / Error Output

Network panel shows 401 on endpoints such as /admin/servers, /admin/metrics, and related admin routes.
Pod logs primarily show 401 responses for those requests (no stack trace).
⚠️ No secrets included. (Can provide additional sanitized logs if needed.)

🧠 Environment Info

You can retrieve most of this from the /version endpoint.

| Key | Value |
| --- | --- |
| Version or commit | `ghcr.io/ibm/mcp-context-forge:latest` (as of 2025-08-27) |
| Runtime | Containerized in Kubernetes (auth required; UI + Admin API enabled) |
| Platform / OS | Kubernetes cluster (Namespace `mcp`) |
| Container | Deployed via Deployment + PVC; Service is ClusterIP (HTTP to port 4444) |

🧩 Additional Context (optional)

Kubernetes manifest (relevant bits):

env:
  - { name: HOST, value: "0.0.0.0" }
  - { name: MCPGATEWAY_UI_ENABLED, value: "true" }
  - { name: MCPGATEWAY_ADMIN_API_ENABLED, value: "true" }
  - { name: AUTH_REQUIRED, value: "true" }
  - name: BASIC_AUTH_USER
    valueFrom: { secretKeyRef: { name: mcpgateway-secret, key: BASIC_AUTH_USER } }
  - name: BASIC_AUTH_PASSWORD
    valueFrom: { secretKeyRef: { name: mcpgateway-secret, key: BASIC_AUTH_PASSWORD } }
  - name: JWT_SECRET_KEY
    valueFrom: { secretKeyRef: { name: mcpgateway-secret, key: JWT_SECRET_KEY } }
  - name: DATABASE_URL
    value: "sqlite:////data/gateway/mcp.db"

Notes / hypotheses to help triage:

- If cookies are marked Secure and the UI is accessed over plain HTTP, the browser won’t send the cookie, which could present as 401s on admin routes after restart/session changes. Consider reproducing with HTTPS or, only for testing, SECURE_COOKIES=false.
- Confirm whether admin auth relies on a cookie vs. header in the UI; check COOKIE_SAMESITE and related settings.
- Verify that the JWT signing key (JWT_SECRET_KEY) and server time are stable across restarts (clock skew can invalidate tokens).
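
To make the last hypothesis concrete, here is a rough stdlib-only sketch that checks a token against a given secret and clock. It assumes the gateway issues HS256-signed JWTs with an `exp` claim, which I have not confirmed; `sign_hs256` and `check_hs256` are hypothetical helpers written for this illustration.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def _b64url_decode(part: str) -> bytes:
    # JWT segments drop base64 padding; restore it before decoding.
    return base64.urlsafe_b64decode(part + "=" * (-len(part) % 4))


def sign_hs256(claims: dict, secret: str) -> str:
    """Create an HS256 JWT (only used here to produce a sample token)."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = _b64url_encode(json.dumps(claims).encode())
    mac = hmac.new(secret.encode(), f"{header}.{body}".encode(), hashlib.sha256)
    return f"{header}.{body}.{_b64url_encode(mac.digest())}"


def check_hs256(token: str, secret: str, now=None) -> dict:
    """Report whether the signature matches `secret` and whether `exp` has passed."""
    header_b64, body_b64, sig_b64 = token.split(".")
    expected = hmac.new(
        secret.encode(), f"{header_b64}.{body_b64}".encode(), hashlib.sha256
    ).digest()
    claims = json.loads(_b64url_decode(body_b64))
    now = time.time() if now is None else now
    exp = claims.get("exp")
    return {
        "signature_ok": hmac.compare_digest(expected, _b64url_decode(sig_b64)),
        "expired": exp is not None and now >= exp,
    }


# A token signed before the restart, checked against a different key afterwards.
sample = sign_hs256({"sub": "admin", "exp": 1000}, "key-before-restart")
print(check_hs256(sample, "key-before-restart", now=900))
print(check_hs256(sample, "key-after-restart", now=900))
print(check_hs256(sample, "key-before-restart", now=1100))
```

Under these assumptions, a token held by the browser from before the restart would fail verification if the key changed, and would look expired if the node clock came back ahead of real time. Either case could surface as a 401 while the UI still appears logged in.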

Potential directions:

- Provide guidance on expected cookie settings for HTTP vs HTTPS deployments.
- Clarify whether the UI refreshes/rotates tokens after pod restarts, and if any cache needs to be cleared.
- Point out any known issues with SQLite + PVC on restart that could affect session storage, so they can be ruled in or out.
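
For the cookie question, a small sketch of how the flags on a `Set-Cookie` header could be inspected (for example, one copied from the browser's Network panel). The cookie name `jwt_token` and the URLs are placeholders I made up, and `would_be_sent` is a simplification of browser behaviour that only looks at the Secure flag and the page scheme.

```python
from http.cookies import SimpleCookie
from urllib.parse import urlsplit


def cookie_flags(set_cookie_header: str) -> dict:
    """Parse one Set-Cookie header value into {cookie name: flags}."""
    jar = SimpleCookie()
    jar.load(set_cookie_header)
    return {
        name: {
            "secure": bool(morsel["secure"]),
            "httponly": bool(morsel["httponly"]),
            "samesite": morsel["samesite"] or None,
        }
        for name, morsel in jar.items()
    }


def would_be_sent(page_url: str, flags: dict) -> bool:
    """Simplified rule: a Secure cookie is only sent back over https."""
    return urlsplit(page_url).scheme == "https" or not flags["secure"]


# Hypothetical header; the real cookie name and attributes may differ.
header = "jwt_token=abc; Path=/; HttpOnly; Secure; SameSite=Lax"
flags = cookie_flags(header)["jwt_token"]
print(flags)
print(would_be_sent("http://gateway.example:4444/admin", flags))
print(would_be_sent("https://gateway.example/admin", flags))
```

If the real header carries `Secure` and the UI is reached over plain HTTP on port 4444, the first check would come out false, which matches the Secure-cookie hypothesis above.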

Aug 27 '25 07:08 InigoGastesi