Upgrade Oathkeeper helm chart 0.41 causes 503
Preflight checklist
- [X] I could not find a solution in the existing issues, docs, nor discussions.
- [X] I agree to follow this project's Code of Conduct.
- [X] I have read and am following this repository's Contribution Guidelines.
- [X] I have joined the Ory Community Slack.
- [ ] I am signed up to the Ory Security Patch Newsletter.
Ory Network Project
N/A
Describe the bug
Upgrading the Oathkeeper Helm chart to 0.41.0 causes oathkeeper to restart after the health check returns 503. I haven't changed anything in the chart values, and I don't use the secret (so it's secret.false for me).
Reproducing the bug
Upgrade to chart 0.41.0, with secret.false
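Roughly, the upgrade command looks like this (release name, namespace, and values file are placeholders, and secret.enabled=false is the chart flag I mean by secret.false):

helm repo update
helm upgrade oathkeeper ory/oathkeeper \
  --version 0.41.0 \
  --namespace ory \
  --values values.yaml \
  --set secret.enabled=false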
Relevant log output
No response
Relevant configuration
oathkeeper:
  oathkeeper:
    config:
      log:
        level: debug
      authenticators:
        noop:
          enabled: true
        cookie_session:
          enabled: true
          config:
            check_session_url: http://ory-kratos-public:80/sessions/whoami
            preserve_path: true
            subject_from: "identity.id"
            only:
              - ory_kratos_session
        oauth2_client_credentials:
          enabled: true
          config:
            token_url: http://ory-hydra-public.ory.svc.cluster.local:4444/oauth2/token
            cache:
              enabled: true
      errors:
        handlers:
          redirect:
            enabled: true
            config:
              to: ***/login
          json:
            enabled: true
          www_authenticate:
            enabled: true
            when:
              - error:
                  - unauthorized
        fallback:
          - json
      authorizers:
        allow:
          enabled: true
        remote_json:
          enabled: true
          config:
            remote: http://ory-keto-read.ory.svc.cluster.local:80/relation-tuples/check
            payload: ""
      mutators:
        noop:
          enabled: true
      serve:
        proxy:
          trust_forwarded_headers: true
          timeout:
            write: 1m
            read: 1m
            idle: 1m
          cors:
            enabled: true
            allowed_origins:
              - ***
    managedAccessRules: false
  maester:
    enabled: false
  serviceMonitor:
    enabled: false
  deployment:
    resources:
      requests:
        cpu: 10m
        memory: 100Mi
      limits:
        memory: 1Gi
Version
0.41.0
On which operating system are you observing this issue?
None
In which environment are you deploying?
Kubernetes with Helm
Additional Context
No response
Hi there! Can you share or take a look at the logs of the container? There could be some more info there. The upgrade itself passed for us, but we use a very simplistic set of configs for the tests, so we could easily miss a corner case 😞
Hey :)
I'd have to re-upgrade my running cluster to get the logs, but I didn't collect or post any because there was nothing related to the 503, even with the log level set to debug. If I get a chance, I'll find an appropriate time to break my environment and capture the log output.
😅 Yeah, obviously it's better not to break your env, but a failing health check should be logged as an event on the deployment/pod object. Did you maybe take a look at that? Asking because right now I can't really do more than try to reproduce that upgrade with a minimal config as close to yours as possible 😞
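For reference, something along these lines should surface the probe-failure events (namespace and pod name are placeholders):

kubectl -n ory get events --field-selector involvedObject.kind=Pod --sort-by=.lastTimestamp
kubectl -n ory describe pod <oathkeeper-pod-name>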
I've searched through our monitoring history to see if I could find the Kubernetes events from around the time I attempted the upgrade, but with no luck. I'll see if I can reproduce it and report back with the k8s events.
I encountered the same error after updating to 0.41. It uses the same values.yaml as before the update.
{
  "audience": "application",
  "error": {
    "message": "The requested resource could not be found",
    "stack_trace": "stack trace could not be recovered from error type *healthx.swaggerNotReadyStatus"
  },
  "http_request": {
    "headers": {
      "accept": "*/*",
      "connection": "close",
      "user-agent": "kube-probe/1.29"
    },
    "host": "127.0.0.1",
    "method": "GET",
    "path": "/health/ready",
    "query": null,
    "remote": "<cluster-ip>:57138",
    "scheme": "http"
  },
  "http_response": {
    "status_code": 503
  },
  "level": "error",
  "msg": "An error occurred while handling a request",
  "service_name": "ORY Oathkeeper",
  "service_version": "v0.40.7",
  "time": "2024-04-29T12:28:50.400938908Z"
}
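For anyone who wants to poke the readiness endpoint directly, something like this should work (assuming the default API port 4456 and a deployment named oathkeeper in the ory namespace):

kubectl -n ory port-forward deploy/oathkeeper 4456:4456 &
curl -i http://127.0.0.1:4456/health/ready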
Same error for 0.40.1. Going to 0.39.1 instead resulted in no error (coming from 0.38.0).
Same error here. I tried the 0.42.0 helm chart, which uses oathkeeper v0.40.7 by default. I got rid of the 503 errors by using oathkeeper v0.39.4 with the 0.42.0 helm chart.
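Roughly, the workaround is just overriding the image tag while keeping the chart version (assuming the chart exposes image.tag; release name, namespace, and values file are placeholders):

helm upgrade oathkeeper ory/oathkeeper \
  --version 0.42.0 \
  --namespace ory \
  --values values.yaml \
  --set image.tag=v0.39.4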
If this is a bug in Oathkeeper itself and not in the chart or the upgrade process, then maybe we should move this to the oathkeeper repo, as it could be a regression in the code itself, similar to https://github.com/ory/oathkeeper/issues/1161.