[Bug]: 401 on privileged actions after cold restart despite valid login
🐞 Bug Summary
After a cold restart of the server/Kubernetes node (e.g., powered off overnight), the Admin Web UI intermittently returns 401 Unauthorized for privileged actions even though I appear logged in. Affected actions include adding MCP servers, viewing metrics, and creating servers.
🧩 Affected Component
Select the area of the project impacted:
- [x] mcpgateway - API
- [x] mcpgateway - UI (admin panel)
- [ ] mcpgateway.wrapper - stdio wrapper
- [ ] Federation or Transports
- [ ] CLI, Makefiles, or shell scripts
- [ ] Container setup (Docker/Podman/Compose)
- [ ] Other (explain below)
🔁 Steps to Reproduce
- Deploy ghcr.io/ibm/mcp-context-forge:latest on Kubernetes with UI and Admin API enabled and auth required (env excerpt below). DB is SQLite on a PVC at /data.
- Power off the host (or shut down the cluster) at end of day; power back on next day. (A cold start of the pod may also reproduce.)
- Log into the Admin UI (Basic Auth).
- Try any privileged action: Add MCP server, Metrics tab, Create server, etc.
- The UI shows “401 Unauthorized” responses for those API calls while the UI still indicates I’m logged in.
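
To separate a browser/session problem from a server-side one, the same admin routes can be probed outside the UI. The following is a minimal sketch, not a confirmed reproduction: it assumes the gateway is reachable at http://localhost:4444 (for example through a port-forward), that the admin routes accept an HTTP Basic `Authorization` header directly, and that the credentials are exported as `BASIC_AUTH_USER` and `BASIC_AUTH_PASSWORD`. `GATEWAY_URL`, `basic_auth_header`, `probe`, and `main` are illustrative names, not part of the project.

```python
import base64
import os
import urllib.error
import urllib.request

# Assumption: gateway exposed locally on the Service port (4444) from this report.
BASE_URL = os.environ.get("GATEWAY_URL", "http://localhost:4444")
# Routes that returned 401 in the browser network panel.
ADMIN_PATHS = ["/admin/servers", "/admin/metrics"]


def basic_auth_header(user: str, password: str) -> str:
    """Build the value of an HTTP Basic Authorization header."""
    raw = f"{user}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def probe(path: str, auth: str) -> int:
    """Send a GET with the given Authorization header and return the status code."""
    req = urllib.request.Request(BASE_URL + path, headers={"Authorization": auth})
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # urllib raises on 4xx/5xx; the status code is what we want to record.
        return exc.code


def main() -> None:
    """Print the status code for each admin route (requires network access)."""
    auth = basic_auth_header(
        os.environ["BASIC_AUTH_USER"], os.environ["BASIC_AUTH_PASSWORD"]
    )
    for path in ADMIN_PATHS:
        print(path, probe(path, auth))
```

Calling `main()` right after the cold restart and again later would show whether header-based requests also get 401. If they succeed while the UI fails, the cookie or token held by the browser becomes the more likely suspect.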
🤔 Expected Behavior
Admin actions should succeed when authenticated (200/201 responses), without requiring any extra steps after a cold restart.
📓 Logs / Error Output
Network panel shows 401 on endpoints such as /admin/servers, /admin/metrics, and related admin routes.
Pod logs primarily show 401 responses for those requests (no stacktrace).
⚠️ No secrets included. (Can provide additional sanitized logs if needed.)
🧠 Environment Info
You can retrieve most of this from the /version endpoint.
| Key | Value |
|---|---|
| Version or commit | ghcr.io/ibm/mcp-context-forge:latest (as of 2025-08-27) |
| Runtime | Containerized in Kubernetes (auth required; UI + Admin API enabled) |
| Platform / OS | Kubernetes cluster (Namespace mcp) |
| Container | Deployed via Deployment + PVC; Service is ClusterIP (HTTP to port 4444) |
🧩 Additional Context (optional)
Kubernetes manifest (relevant bits):
env:
  - { name: HOST, value: "0.0.0.0" }
  - { name: MCPGATEWAY_UI_ENABLED, value: "true" }
  - { name: MCPGATEWAY_ADMIN_API_ENABLED, value: "true" }
  - { name: AUTH_REQUIRED, value: "true" }
  - name: BASIC_AUTH_USER
    valueFrom: { secretKeyRef: { name: mcpgateway-secret, key: BASIC_AUTH_USER } }
  - name: BASIC_AUTH_PASSWORD
    valueFrom: { secretKeyRef: { name: mcpgateway-secret, key: BASIC_AUTH_PASSWORD } }
  - name: JWT_SECRET_KEY
    valueFrom: { secretKeyRef: { name: mcpgateway-secret, key: JWT_SECRET_KEY } }
  - name: DATABASE_URL
    value: "sqlite:////data/gateway/mcp.db"
Notes / hypotheses to help triage:
- If cookies are marked Secure and the UI is accessed over plain HTTP, the browser won’t send the cookie, which could present as 401s on admin routes after restart/session changes. Consider reproducing with HTTPS or, only for testing, SECURE_COOKIES=false.
- Confirm whether admin auth relies on a cookie vs. header in the UI; check COOKIE_SAMESITE and related settings.
- Verify that the JWT signing key (JWT_SECRET_KEY) and server time are stable across restarts (clock skew can invalidate tokens).
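
The Secure-cookie hypothesis can be checked against a `Set-Cookie` header copied from the browser network panel. This is a rough sketch using only the Python standard library; it models the Secure attribute alone (not SameSite, Domain, or Path matching, and not the relaxation some browsers apply to localhost). The function name `cookie_sent_over` and the cookie name `session` are hypothetical; the real cookie name used by the gateway is not stated in this report.

```python
from http.cookies import SimpleCookie
from urllib.parse import urlsplit


def cookie_sent_over(url: str, set_cookie: str) -> dict:
    """Map each cookie in a Set-Cookie value to whether it would be sent to url.

    Only the Secure attribute is considered: a Secure cookie is withheld
    from requests whose scheme is not https.
    """
    is_https = urlsplit(url).scheme == "https"
    jar = SimpleCookie()
    jar.load(set_cookie)
    return {
        name: is_https or not bool(morsel["secure"])
        for name, morsel in jar.items()
    }


# Hypothetical header value; replace with the one seen in the network panel.
example = "session=abc; Path=/; Secure; HttpOnly"
print(cookie_sent_over("http://mcpgateway:4444/admin", example))
print(cookie_sent_over("https://mcpgateway.example/admin", example))
```

If the cookie carrying the admin token is flagged Secure and the Service is reached over plain HTTP on port 4444, the first call reports that the cookie would be withheld, which would match the 401 pattern described above.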
Potential directions:
- Provide guidance on expected cookie settings for HTTP vs HTTPS deployments.
- Clarify whether the UI refreshes/rotates tokens after pod restarts, and if any cache needs to be cleared.
- Note any known issues with SQLite + PVC on restart that could affect session storage, so they can be ruled in or out.
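
To help answer the token refresh question, the token the browser holds before and after a restart can be inspected offline. The sketch below assumes the admin token is a standard three-part JWT carrying an `exp` claim, which this report does not confirm. It decodes the payload without verifying the signature, so it is only suitable for diagnosis. `jwt_claims` and `seconds_until_expiry` are illustrative helpers.

```python
import base64
import json
import time
from typing import Optional


def jwt_claims(token: str) -> dict:
    """Decode the JWT payload WITHOUT verifying the signature (diagnosis only)."""
    payload_b64 = token.split(".")[1]
    # JWTs strip base64url padding; restore it before decoding.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def seconds_until_expiry(token: str, now: Optional[float] = None) -> Optional[float]:
    """Return seconds until the exp claim, negative if expired, None if absent."""
    exp = jwt_claims(token).get("exp")
    if exp is None:
        return None
    current = time.time() if now is None else now
    return exp - current
```

A negative result for a token captured right after the cold restart would point toward expiry or clock skew. A positive result alongside a 401 would point elsewhere, for example a changed signing key or a cookie that is not being sent.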