Kubernetes Authentication failure 61 minutes after starting.
Which project are you reporting a bug for?
kr8s
What happened?
When using kr8s as part of a FastAPI application, authentication begins failing 61 minutes after starting.
Sample code:
```python
from pathlib import Path

from fastapi import FastAPI, status
from fastapi.responses import JSONResponse, Response
from kr8s.objects import ConfigMap

app = FastAPI()


def get_current_namespace() -> str:
    """Get the current namespace."""
    return Path("/var/run/secrets/kubernetes.io/serviceaccount/namespace").read_text().strip()


configmap = ConfigMap("example", namespace=get_current_namespace())
if not configmap.exists():
    configmap.create()


@app.get("/health/readiness")
def readiness() -> Response:
    """Readiness check endpoint, returns a JSON response with status 'ok'."""
    configmap.refresh()
    if configmap.exists():
        return JSONResponse({"status": "ok"}, status_code=status.HTTP_200_OK)
    return JSONResponse({"status": "error"}, status_code=status.HTTP_503_SERVICE_UNAVAILABLE)


@app.get("/health/liveness")
def liveness() -> Response:
    """Liveness check endpoint, returns a JSON response with status 'ok'."""
    return JSONResponse({"status": "ok"}, status_code=status.HTTP_200_OK)
```
Logs:
First health check:
2024-12-23T11:31:13.528988544Z INFO: Started server process [1]
2024-12-23T11:31:13.529025895Z INFO: Waiting for application startup.
2024-12-23T11:31:13.529142835Z INFO: Application startup complete.
2024-12-23T11:31:13.529409517Z INFO: Uvicorn running on http://0.0.0.0:6502 (Press CTRL+C to quit)
2024-12-23T11:31:38.700639717Z INFO: 10.40.16.115:49578 - "GET /health/liveness HTTP/1.1" 200 OK
2024-12-23T11:31:38.743960452Z INFO: 10.40.16.115:49580 - "GET /health/readiness HTTP/1.1" 200 OK
Failing health check 61 minutes later:
2024-12-23T12:32:38.699077059Z INFO: 10.40.16.115:47082 - "GET /health/liveness HTTP/1.1" 200 OK
2024-12-23T12:32:38.794716051Z INFO: 10.40.16.115:47084 - "GET /health/readiness HTTP/1.1" 503 Service Unavailable
Exception when calling an endpoint that forces a configmap.refresh():
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/app/lib/python3.13/site-packages/kr8s/_api.py", line 168, in call_api
response.raise_for_status()
~~~~~~~~~~~~~~~~~~~~~~~~~^^
File "/app/lib/python3.13/site-packages/httpx/_models.py", line 829, in raise_for_status
raise HTTPStatusError(message, request=request, response=self)
httpx.HTTPStatusError: Client error '401 Unauthorized' for url 'https://172.16.0.1/api/v1/namespaces/sealed-secrets-tools/configmaps/cluster-certificates'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/401
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/app/lib/python3.13/site-packages/uvicorn/protocols/http/h11_impl.py", line 403, in run_asgi
result = await app( # type: ignore[func-returns-value]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
self.scope, self.receive, self.send
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "/app/lib/python3.13/site-packages/uvicorn/middleware/proxy_headers.py", line 60, in __call__
return await self.app(scope, receive, send)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/lib/python3.13/site-packages/fastapi/applications.py", line 1054, in __call__
await super().__call__(scope, receive, send)
File "/app/lib/python3.13/site-packages/starlette/applications.py", line 113, in __call__
await self.middleware_stack(scope, receive, send)
File "/app/lib/python3.13/site-packages/starlette/middleware/errors.py", line 187, in __call__
raise exc
File "/app/lib/python3.13/site-packages/starlette/middleware/errors.py", line 165, in __call__
await self.app(scope, receive, _send)
File "/app/lib/python3.13/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
File "/app/lib/python3.13/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
raise exc
File "/app/lib/python3.13/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
await app(scope, receive, sender)
File "/app/lib/python3.13/site-packages/starlette/routing.py", line 715, in __call__
await self.middleware_stack(scope, receive, send)
File "/app/lib/python3.13/site-packages/starlette/routing.py", line 735, in app
await route.handle(scope, receive, send)
File "/app/lib/python3.13/site-packages/starlette/routing.py", line 288, in handle
await self.app(scope, receive, send)
File "/app/lib/python3.13/site-packages/starlette/routing.py", line 76, in app
await wrap_app_handling_exceptions(app, request)(scope, receive, send)
File "/app/lib/python3.13/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
raise exc
File "/app/lib/python3.13/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
await app(scope, receive, sender)
File "/app/lib/python3.13/site-packages/starlette/routing.py", line 73, in app
response = await f(request)
^^^^^^^^^^^^^^^^
File "/app/lib/python3.13/site-packages/fastapi/routing.py", line 301, in app
raw_response = await run_endpoint_function(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...<3 lines>...
)
^
File "/app/lib/python3.13/site-packages/fastapi/routing.py", line 214, in run_endpoint_function
return await run_in_threadpool(dependant.call, **values)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/lib/python3.13/site-packages/starlette/concurrency.py", line 39, in run_in_threadpool
return await anyio.to_thread.run_sync(func, *args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/lib/python3.13/site-packages/anyio/to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
func, args, abandon_on_cancel=abandon_on_cancel, limiter=limiter
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "/app/lib/python3.13/site-packages/anyio/_backends/_asyncio.py", line 2505, in run_sync_in_worker_thread
return await future
^^^^^^^^^^^^
File "/app/lib/python3.13/site-packages/anyio/_backends/_asyncio.py", line 1005, in run
result = context.run(func, *args)
File "/app/lib/python3.13/site-packages/sealed_secrets_tools/server/__init__.py", line 119, in get_certificates
return JSONResponse({"clusters": list(get_configmap().keys())}, status_code=status.HTTP_200_OK)
~~~~~~~~~~~~~^^
File "/app/lib/python3.13/site-packages/sealed_secrets_tools/server/__init__.py", line 33, in get_configmap
configmap.refresh()
~~~~~~~~~~~~~~~~~^^
File "/app/lib/python3.13/site-packages/kr8s/_async_utils.py", line 120, in run_sync_inner
return portal.call(wrapped)
~~~~~~~~~~~^^^^^^^^^
File "/app/lib/python3.13/site-packages/kr8s/_async_utils.py", line 92, in call
return self._portal.call(func, *args, **kwargs)
~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
File "/app/lib/python3.13/site-packages/anyio/from_thread.py", line 290, in call
return cast(T_Retval, self.start_task_soon(func, *args).result())
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^
File "/root/.local/share/uv/python/cpython-3.13.1-linux-x86_64-gnu/lib/python3.13/concurrent/futures/_base.py", line 456, in result
return self.__get_result()
~~~~~~~~~~~~~~~~~^^
File "/root/.local/share/uv/python/cpython-3.13.1-linux-x86_64-gnu/lib/python3.13/concurrent/futures/_base.py", line 401, in __get_result
raise self._exception
File "/app/lib/python3.13/site-packages/anyio/from_thread.py", line 221, in _call_func
retval = await retval_or_awaitable
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/lib/python3.13/site-packages/kr8s/_objects.py", line 357, in refresh
await self.async_refresh()
File "/app/lib/python3.13/site-packages/kr8s/_objects.py", line 375, in async_refresh
raise e
File "/app/lib/python3.13/site-packages/kr8s/_objects.py", line 363, in async_refresh
async with self.api.call_api(
~~~~~~~~~~~~~~~~~^
"GET",
^^^^^^
...<2 lines>...
namespace=self.namespace,
^^^^^^^^^^^^^^^^^^^^^^^^^
) as resp:
^
File "/root/.local/share/uv/python/cpython-3.13.1-linux-x86_64-gnu/lib/python3.13/contextlib.py", line 214, in __aenter__
return await anext(self.gen)
^^^^^^^^^^^^^^^^^^^^^
File "/app/lib/python3.13/site-packages/kr8s/_api.py", line 186, in call_api
raise ServerError(
error_message, status=error, response=e.response
) from e
kr8s._exceptions.ServerError: Unauthorized
Anything else?
No response
How is your application authenticating with Kubernetes? Is it using a service account, or something else like OIDC? This could be related to https://github.com/kr8s-org/kr8s/issues/125
Thanks @jacobtomlinson - authentication is entirely driven by Service Accounts, with the app only getting ConfigMaps and Secrets from the same namespace.
That makes sense; the default service account token expiration time is an hour.
When the API returns an Unauthorized error, kr8s attempts a few times to reauthenticate before giving up. There must be a bug in the reauthentication that's specific to your environment; otherwise we would be seeing this as a more widespread problem.
https://github.com/kr8s-org/kr8s/blob/5e93587669c9222324db5d589afb20fbd4817119/kr8s/_api.py#L173-L175
https://github.com/kr8s-org/kr8s/blob/5e93587669c9222324db5d589afb20fbd4817119/kr8s/_auth.py#L76-L77
https://github.com/kr8s-org/kr8s/blob/5e93587669c9222324db5d589afb20fbd4817119/kr8s/_auth.py#L264-L267
There are a few places in the logic that I could see things returning too early and not actually loading the new token.
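As an aside, bound service account tokens are JWTs, so the expiry window can be confirmed independently of kr8s by decoding the token's payload segment. A minimal diagnostic sketch (generic JWT decoding, not a kr8s API; the demo token below is locally constructed, not a real Kubernetes token):

```python
import base64
import json
import time


def token_expiry(token: str) -> int:
    """Return the `exp` claim (Unix timestamp) from a JWT-style token.

    The payload is the second dot-separated segment, base64url-encoded
    with its padding stripped.
    """
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped padding
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    return claims["exp"]


def segment(obj: dict) -> str:
    """base64url-encode a JSON object the way JWT segments are encoded."""
    return base64.urlsafe_b64encode(json.dumps(obj).encode()).rstrip(b"=").decode()


# Self-contained demo with an unsigned, locally built token:
fake = f'{segment({"alg": "none"})}.{segment({"exp": 1735000000})}.sig'
print(token_expiry(fake) - int(time.time()))  # seconds until expiry
```

Reading the real token from `/var/run/secrets/kubernetes.io/serviceaccount/token` and checking its `exp` should show it landing roughly 3600 seconds after pod start, matching the 61-minute failure window.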
I've just pushed a new release v0.19.0 which contains #542 to add some more debug logging. Could you update to that version, then run your code again with the logging set to DEBUG? Then share the logs. I'm particularly interested in what happens when the token expires and the auth failures start happening.
Right, I finally had time to tackle this properly, here is a full Kubernetes Manifest set that should replicate this:
```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: kr8s-test
  labels:
    team: cpe
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: kr8s-test
  namespace: kr8s-test
data:
  test: "test"
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kr8s-test
  namespace: kr8s-test
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: kr8s-test
  namespace: kr8s-test
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["*"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: kr8s-test
  namespace: kr8s-test
subjects:
  - kind: ServiceAccount
    name: kr8s-test
    namespace: kr8s-test
roleRef:
  kind: Role
  name: kr8s-test
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kr8s-test
  namespace: kr8s-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kr8s-test
  template:
    metadata:
      labels:
        app: kr8s-test
    spec:
      containers:
        - name: kr8s-test
          image: docker.io/python:latest
          command:
            - sh
            - -c
            - |
              pip install kr8s &&
              cat << 'EOF' > /tmp/script.py
              import logging
              logging.basicConfig(level=logging.DEBUG)
              from time import sleep
              import kr8s
              while True:
                  logging.debug(list(kr8s.get("configmap", namespace="kr8s-test")))
                  sleep(60)
              EOF
              python /tmp/script.py
          resources:
            limits:
              cpu: 100m
              memory: 128Mi
            requests:
              cpu: 100m
              memory: 128Mi
      serviceAccountName: kr8s-test
```
This produces the following set of debug logs every 60 seconds:
DEBUG:kr8s._auth:Loading kubeconfig
DEBUG:httpcore.connection:close.started
DEBUG:httpcore.connection:close.complete
DEBUG:httpcore.connection:connect_tcp.started host='172.16.0.1' port=443 local_address=None timeout=None socket_options=None
DEBUG:httpcore.connection:connect_tcp.complete return_value=<httpcore._backends.anyio.AnyIOStream object at 0x75b0b03b31b0>
DEBUG:httpcore.connection:start_tls.started ssl_context=<ssl.SSLContext object at 0x75b0b15cd9a0> server_hostname='172.16.0.1' timeout=None
DEBUG:httpcore.connection:start_tls.complete return_value=<httpcore._backends.anyio.AnyIOStream object at 0x75b0b03b2f80>
DEBUG:httpcore.http11:send_request_headers.started request=<Request [b'GET']>
DEBUG:httpcore.http11:send_request_headers.complete
DEBUG:httpcore.http11:send_request_body.started request=<Request [b'GET']>
DEBUG:httpcore.http11:send_request_body.complete
DEBUG:httpcore.http11:receive_response_headers.started request=<Request [b'GET']>
DEBUG:httpcore.http11:receive_response_headers.complete return_value=(b'HTTP/1.1', 200, b'OK', [(b'Audit-Id', b'942695fd-c5dd-458f-a5bc-9db8ad63b62b'), (b'Cache-Control', b'no-cache, private'), (b'Content-Type', b'application/json'), (b'X-Kubernetes-Pf-Flowschema-Uid', b'69fc3a70-5198-4527-8ef7-1e0a70bda0a2'), (b'X-Kubernetes-Pf-Prioritylevel-Uid', b'c2560709-9506-40dd-95cd-b32516a231e4'), (b'Date', b'Mon, 24 Feb 2025 21:10:49 GMT'), (b'Transfer-Encoding', b'chunked')])
INFO:httpx:HTTP Request: GET https://172.16.0.1/api/v1/namespaces/kr8s-test/configmaps?limit=100 "HTTP/1.1 200 OK"
DEBUG:httpcore.http11:receive_response_body.started request=<Request [b'GET']>
DEBUG:httpcore.http11:receive_response_body.complete
DEBUG:httpcore.http11:response_closed.started
DEBUG:httpcore.http11:response_closed.complete
DEBUG:root:[<ConfigMap kr8s-test>, <ConfigMap kube-root-ca.crt>]
For the iteration where everything goes wrong, this is what we see:
DEBUG:kr8s._auth:Loading kubeconfig
DEBUG:httpcore.connection:close.started
DEBUG:httpcore.connection:close.complete
DEBUG:httpcore.connection:connect_tcp.started host='172.16.0.1' port=443 local_address=None timeout=None socket_options=None
DEBUG:httpcore.connection:connect_tcp.complete return_value=<httpcore._backends.anyio.AnyIOStream object at 0x75b0b03b3b60>
DEBUG:httpcore.connection:start_tls.started ssl_context=<ssl.SSLContext object at 0x75b0b15cd9a0> server_hostname='172.16.0.1' timeout=None
DEBUG:httpcore.connection:start_tls.complete return_value=<httpcore._backends.anyio.AnyIOStream object at 0x75b0b03b2710>
DEBUG:httpcore.http11:send_request_headers.started request=<Request [b'GET']>
DEBUG:httpcore.http11:send_request_headers.complete
DEBUG:httpcore.http11:send_request_body.started request=<Request [b'GET']>
DEBUG:httpcore.http11:send_request_body.complete
DEBUG:httpcore.http11:receive_response_headers.started request=<Request [b'GET']>
DEBUG:httpcore.http11:receive_response_headers.complete return_value=(b'HTTP/1.1', 401, b'Unauthorized', [(b'Audit-Id', b'acec4a73-47fa-44c0-95ce-3f390c240d1e'), (b'Cache-Control', b'no-cache, private'), (b'Content-Type', b'application/json'), (b'Date', b'Mon, 24 Feb 2025 21:11:49 GMT'), (b'Content-Length', b'129')])
INFO:httpx:HTTP Request: GET https://172.16.0.1/api/v1/namespaces/kr8s-test/configmaps?limit=100 "HTTP/1.1 401 Unauthorized"
DEBUG:httpcore.http11:receive_response_body.started request=<Request [b'GET']>
DEBUG:httpcore.http11:receive_response_body.complete
DEBUG:httpcore.http11:response_closed.started
DEBUG:httpcore.http11:response_closed.complete
DEBUG:kr8s._api:Unauthorized 401 error, reauthenticating attempt 0
DEBUG:kr8s._auth:Loading kubeconfig
DEBUG:httpcore.connection:close.started
DEBUG:httpcore.connection:close.complete
DEBUG:httpcore.connection:connect_tcp.started host='172.16.0.1' port=443 local_address=None timeout=None socket_options=None
DEBUG:httpcore.connection:connect_tcp.complete return_value=<httpcore._backends.anyio.AnyIOStream object at 0x75b0b03b3fc0>
DEBUG:httpcore.connection:start_tls.started ssl_context=<ssl.SSLContext object at 0x75b0b15cfbf0> server_hostname='172.16.0.1' timeout=None
DEBUG:httpcore.connection:start_tls.complete return_value=<httpcore._backends.anyio.AnyIOStream object at 0x75b0b03b3c50>
DEBUG:httpcore.http11:send_request_headers.started request=<Request [b'GET']>
DEBUG:httpcore.http11:send_request_headers.complete
DEBUG:httpcore.http11:send_request_body.started request=<Request [b'GET']>
DEBUG:httpcore.http11:send_request_body.complete
DEBUG:httpcore.http11:receive_response_headers.started request=<Request [b'GET']>
DEBUG:httpcore.http11:receive_response_headers.complete return_value=(b'HTTP/1.1', 401, b'Unauthorized', [(b'Audit-Id', b'3b4bb6ec-af45-40df-b440-1be70ecfff13'), (b'Cache-Control', b'no-cache, private'), (b'Content-Type', b'application/json'), (b'Date', b'Mon, 24 Feb 2025 21:11:49 GMT'), (b'Content-Length', b'129')])
INFO:httpx:HTTP Request: GET https://172.16.0.1/api/v1/namespaces/kr8s-test/configmaps?limit=100 "HTTP/1.1 401 Unauthorized"
DEBUG:httpcore.http11:receive_response_body.started request=<Request [b'GET']>
DEBUG:httpcore.http11:receive_response_body.complete
DEBUG:httpcore.http11:response_closed.started
DEBUG:httpcore.http11:response_closed.complete
DEBUG:kr8s._api:Unauthorized 401 error, reauthenticating attempt 1
DEBUG:kr8s._auth:Loading kubeconfig
DEBUG:httpcore.connection:close.started
DEBUG:httpcore.connection:close.complete
DEBUG:httpcore.connection:connect_tcp.started host='172.16.0.1' port=443 local_address=None timeout=None socket_options=None
DEBUG:httpcore.connection:connect_tcp.complete return_value=<httpcore._backends.anyio.AnyIOStream object at 0x75b0b03b1a90>
DEBUG:httpcore.connection:start_tls.started ssl_context=<ssl.SSLContext object at 0x75b0b15cf9b0> server_hostname='172.16.0.1' timeout=None
DEBUG:httpcore.connection:start_tls.complete return_value=<httpcore._backends.anyio.AnyIOStream object at 0x75b0b03b2a80>
DEBUG:httpcore.http11:send_request_headers.started request=<Request [b'GET']>
DEBUG:httpcore.http11:send_request_headers.complete
DEBUG:httpcore.http11:send_request_body.started request=<Request [b'GET']>
DEBUG:httpcore.http11:send_request_body.complete
DEBUG:httpcore.http11:receive_response_headers.started request=<Request [b'GET']>
DEBUG:httpcore.http11:receive_response_headers.complete return_value=(b'HTTP/1.1', 401, b'Unauthorized', [(b'Audit-Id', b'0d095c64-2cee-4754-9fdd-bed1b0e143a3'), (b'Cache-Control', b'no-cache, private'), (b'Content-Type', b'application/json'), (b'Date', b'Mon, 24 Feb 2025 21:11:49 GMT'), (b'Content-Length', b'129')])
INFO:httpx:HTTP Request: GET https://172.16.0.1/api/v1/namespaces/kr8s-test/configmaps?limit=100 "HTTP/1.1 401 Unauthorized"
DEBUG:httpcore.http11:receive_response_body.started request=<Request [b'GET']>
DEBUG:httpcore.http11:receive_response_body.complete
DEBUG:httpcore.http11:response_closed.started
DEBUG:httpcore.http11:response_closed.complete
DEBUG:kr8s._api:Unauthorized 401 error, reauthenticating attempt 2
DEBUG:kr8s._auth:Loading kubeconfig
DEBUG:httpcore.connection:close.started
DEBUG:httpcore.connection:close.complete
DEBUG:httpcore.connection:connect_tcp.started host='172.16.0.1' port=443 local_address=None timeout=None socket_options=None
DEBUG:httpcore.connection:connect_tcp.complete return_value=<httpcore._backends.anyio.AnyIOStream object at 0x75b0b03b1ea0>
DEBUG:httpcore.connection:start_tls.started ssl_context=<ssl.SSLContext object at 0x75b0b15cf890> server_hostname='172.16.0.1' timeout=None
DEBUG:httpcore.connection:start_tls.complete return_value=<httpcore._backends.anyio.AnyIOStream object at 0x75b0b03b2210>
DEBUG:httpcore.http11:send_request_headers.started request=<Request [b'GET']>
DEBUG:httpcore.http11:send_request_headers.complete
DEBUG:httpcore.http11:send_request_body.started request=<Request [b'GET']>
DEBUG:httpcore.http11:send_request_body.complete
DEBUG:httpcore.http11:receive_response_headers.started request=<Request [b'GET']>
DEBUG:httpcore.http11:receive_response_headers.complete return_value=(b'HTTP/1.1', 401, b'Unauthorized', [(b'Audit-Id', b'83aa3515-d7d5-4e34-9088-1130e3c863a3'), (b'Cache-Control', b'no-cache, private'), (b'Content-Type', b'application/json'), (b'Date', b'Mon, 24 Feb 2025 21:11:49 GMT'), (b'Content-Length', b'129')])
INFO:httpx:HTTP Request: GET https://172.16.0.1/api/v1/namespaces/kr8s-test/configmaps?limit=100 "HTTP/1.1 401 Unauthorized"
DEBUG:httpcore.http11:receive_response_body.started request=<Request [b'GET']>
DEBUG:httpcore.http11:receive_response_body.complete
DEBUG:httpcore.http11:response_closed.started
DEBUG:httpcore.http11:response_closed.complete
Traceback (most recent call last):
File "/usr/local/lib/python3.13/site-packages/kr8s/_api.py", line 171, in call_api
response.raise_for_status()
~~~~~~~~~~~~~~~~~~~~~~~~~^^
File "/usr/local/lib/python3.13/site-packages/httpx/_models.py", line 829, in raise_for_status
raise HTTPStatusError(message, request=request, response=self)
httpx.HTTPStatusError: Client error '401 Unauthorized' for url 'https://172.16.0.1/api/v1/namespaces/kr8s-test/configmaps?limit=100'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/401
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/tmp/script.py", line 6, in <module>
logging.debug(list(kr8s.get("configmap", namespace="kr8s-test")))
~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.13/site-packages/kr8s/_async_utils.py", line 146, in iter_over_async
done, obj = portal.call(get_next)
~~~~~~~~~~~^^^^^^^^^^
File "/usr/local/lib/python3.13/site-packages/kr8s/_async_utils.py", line 91, in call
return self._portal.call(func, *args, **kwargs)
~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.13/site-packages/anyio/from_thread.py", line 290, in call
return cast(T_Retval, self.start_task_soon(func, *args).result())
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^
File "/usr/local/lib/python3.13/concurrent/futures/_base.py", line 456, in result
return self.__get_result()
~~~~~~~~~~~~~~~~~^^
File "/usr/local/lib/python3.13/concurrent/futures/_base.py", line 401, in __get_result
raise self._exception
File "/usr/local/lib/python3.13/site-packages/anyio/from_thread.py", line 221, in _call_func
retval = await retval_or_awaitable
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.13/site-packages/kr8s/_async_utils.py", line 139, in get_next
obj = await ait.__anext__()
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.13/site-packages/kr8s/asyncio/_helpers.py", line 58, in get
async for resource in api.async_get(
...<9 lines>...
yield resource
File "/usr/local/lib/python3.13/site-packages/kr8s/_api.py", line 466, in async_get
async for resource in self._async_get_single(
...<9 lines>...
yield resource
File "/usr/local/lib/python3.13/site-packages/kr8s/_api.py", line 512, in _async_get_single
async with self.async_get_kind(
~~~~~~~~~~~~~~~~~~~^
kind,
^^^^^
...<6 lines>...
**kwargs,
^^^^^^^^^
) as (obj_cls, response):
^
File "/usr/local/lib/python3.13/contextlib.py", line 214, in __aenter__
return await anext(self.gen)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.13/site-packages/kr8s/_api.py", line 405, in async_get_kind
async with self.call_api(
~~~~~~~~~~~~~^
method="GET",
^^^^^^^^^^^^^
...<4 lines>...
**kwargs,
^^^^^^^^^
) as response:
^
File "/usr/local/lib/python3.13/contextlib.py", line 214, in __aenter__
return await anext(self.gen)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.13/site-packages/kr8s/_api.py", line 192, in call_api
raise ServerError(
error_message, status=error, response=e.response
) from e
kr8s._exceptions.ServerError: Unauthorized
I'm not sure if those logs give you anything useful. We're running AKS 1.31.2; I'm happy to retest on a vanilla cluster with nothing else installed if you think it'll help.
I tested this reproducer on a local cluster created with kind and it did not reproduce the error. I'll try again with a cloud hosted Kubernetes cluster.
Thanks @jacobtomlinson - as above I'm more than happy to spin up a fresh cluster on Azure if it helps.
Thanks for the offer @cpressland. If you want to do so and spend some time debugging it would be much appreciated, but equally I'd like to see this error with my own eyes to try and understand what is happening.
I can see in the logs you shared that we are attempting to reauthenticate but failing to. When we get into this state I want to figure out why the call to reauthenticate() isn't working.
https://github.com/kr8s-org/kr8s/blob/33f2cdeff021844bdc79c2f76fd11359e3efd6dc/kr8s/_api.py#L181
We should see it call into the kr8s._auth.KubeAuth.reauthenticate() method.
https://github.com/kr8s-org/kr8s/blob/33f2cdeff021844bdc79c2f76fd11359e3efd6dc/kr8s/_auth.py#L71-L83
Then it should call kr8s._auth.KubeAuth._load_service_account() and read a new token from disk. Kubernetes mounts service account tokens as a file within the container.
https://github.com/kr8s-org/kr8s/blob/33f2cdeff021844bdc79c2f76fd11359e3efd6dc/kr8s/_auth.py#L260
I want to add some more debug logging in these places so we can see what is happening.
The questions I want to answer are:
- Do we actually get to the _load_service_account() call, or is something stopping us?
- Is the token we read from disk different to the expired one we read during the initial authentication?
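The second question can be checked from inside the pod without involving kr8s at all: poll the mounted token file and log whenever its content changes. A sketch (the path is the standard in-cluster service account mount; the 60-second interval is arbitrary):

```python
import time
from pathlib import Path

# Standard in-cluster mount point for the service account token.
TOKEN_PATH = Path("/var/run/secrets/kubernetes.io/serviceaccount/token")


def read_token(path: Path = TOKEN_PATH) -> str:
    """Read the current token from the mounted file."""
    return path.read_text().strip()


def watch_token(path: Path = TOKEN_PATH, interval: float = 60) -> None:
    """Print a line each time the token on disk changes, to confirm the
    kubelet really is rotating it before the 401s start."""
    last = read_token(path)
    while True:
        time.sleep(interval)
        current = read_token(path)
        if current != last:
            print(f"{time.ctime()}: token rotated on disk")
            last = current
```

If the token on disk is rotating but the 401s still appear, the problem is on the reauthentication path; if it never changes, the problem is in how the token is being projected into the pod.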
I just ran the reproducer on fresh GKE, AKS and EKS clusters and left them running for over an hour and I'm not able to reproduce the issue. It seems like the credentials are either not expiring in the same way that you're seeing on your cluster, or the refresh is working and new tokens are being picked up.
On all three I'm seeing successful log messages way beyond the 61 minute mark.
DEBUG:kr8s._auth:Loading kubeconfig
DEBUG:httpcore.connection:close.started
DEBUG:httpcore.connection:close.complete
DEBUG:httpcore.connection:connect_tcp.started host='10.100.0.1' port=443 local_address=None timeout=None socket_options=None
DEBUG:httpcore.connection:connect_tcp.complete return_value=<httpcore._backends.anyio.AnyIOStream object at 0xffff8e58e0d0>
DEBUG:httpcore.connection:start_tls.started ssl_context=<ssl.SSLContext object at 0xffff8fa7a060> server_hostname='10.100.0.1' timeout=None
DEBUG:httpcore.connection:start_tls.complete return_value=<httpcore._backends.anyio.AnyIOStream object at 0xffff8e58e260>
DEBUG:httpcore.http11:send_request_headers.started request=<Request [b'GET']>
DEBUG:httpcore.http11:send_request_headers.complete
DEBUG:httpcore.http11:send_request_body.started request=<Request [b'GET']>
DEBUG:httpcore.http11:send_request_body.complete
DEBUG:httpcore.http11:receive_response_headers.started request=<Request [b'GET']>
DEBUG:httpcore.http11:receive_response_headers.complete return_value=(b'HTTP/1.1', 200, b'OK', [(b'Audit-Id', b'9b2a3c31-0aa1-4ae0-8527-080dec97b6e7'), (b'Cache-Control', b'no-cache, private'), (b'Content-Type', b'application/json'), (b'X-Kubernetes-Pf-Flowschema-Uid', b'28fb39c8-3a
INFO:httpx:HTTP Request: GET https://10.100.0.1/api/v1/namespaces/kr8s-test/configmaps?limit=100 "HTTP/1.1 200 OK"
DEBUG:httpcore.http11:receive_response_body.started request=<Request [b'GET']>
DEBUG:httpcore.http11:receive_response_body.complete
DEBUG:httpcore.http11:response_closed.started
DEBUG:httpcore.http11:response_closed.complete
DEBUG:root:[<ConfigMap kr8s-test>, <ConfigMap kube-root-ca.crt>]
Note that I used Kubernetes v1.30.8 on GKE, v1.30.9 on AKS and v1.29.13 on EKS.
I just merged #579 which adds some more logging. @cpressland could you run your reproducer again and share the log output when it starts failing?
Right, I changed the install command to `pip install git+https://github.com/kr8s-org/kr8s.git`, so this should be the main branch instead of v0.20.4: log.txt
I decided to include everything from the pod rather than snippets just in case it includes something else useful to you.
Thanks @cpressland that really helped! I just merged a fix in #580. Do you want to give it a spin?
Annoyingly, while this 100% solves it for my manifests above, it still happens in my WSGI/ASGI applications. Happy to continue here or raise another issue to track this. I might try and replicate this with cloudcoil also just to determine if this is a kr8s issue or something fastapi is doing. Logs attached of the most recent failure: logs.txt
Let's reopen and continue investigating.
After the first couple of lines in the log you shared there are no more kr8s logs. Could something be modifying your logging settings to suppress them?
Yeah, I had wondered that. I think it's caused by uvicorn, as the app is instantiated as follows:
```python
import logging

import typer

logging.basicConfig(level=logging.DEBUG)

cli = typer.Typer()


@cli.command(name="server")
def cli_server() -> None:
    """Server command to start the API server."""
    import uvicorn

    uvicorn.run("sealed_secrets_tools.server:app", host="0.0.0.0", port=6502)
```
I'll look at patching this via a configmap overwriting the __init__.py file and report back once I've got some useful logging.
Okay, so I got the logging working properly. We see kr8s._auth get called when the app starts:
$ kubectl logs sealed-secrets-tools-server-59686c45d5-vlnt6 -f | grep kr8s
2025-03-04 15:09:20,951 [DEBUG] kr8s._auth: Reloading credentials
2025-03-04 15:09:20,951 [DEBUG] kr8s._auth: Loading kubeconfig
2025-03-04 15:09:20,951 [DEBUG] kr8s._auth: Loading service account
2025-03-04 15:09:20,955 [DEBUG] kr8s._auth: Found new token
But then absolutely no logging from kr8s after, so I guess this isn't really a kr8s issue as uvicorn / fastapi / something else is blocking kr8s._auth from being called again. Any suggestions?
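One stdlib mechanism that produces exactly this symptom is a later `logging.config.dictConfig` call with `disable_existing_loggers` left at its default of `True`: any logger created before the call and not named in the new config (such as `kr8s._auth`) is silently switched off. A minimal demonstration of that stdlib behaviour (this is not necessarily uvicorn's actual config, which would need checking):

```python
import logging
import logging.config

logging.basicConfig(level=logging.DEBUG)
kr8s_logger = logging.getLogger("kr8s._auth")  # created early, as kr8s would
assert not kr8s_logger.disabled

# A later dictConfig that doesn't mention the existing logger disables it:
logging.config.dictConfig({
    "version": 1,
    "disable_existing_loggers": True,  # the dictConfig default
    "root": {"level": "INFO"},
})
print(kr8s_logger.disabled)  # the early logger no longer emits anything
```

If something like this is what uvicorn's default log config is doing here, passing `log_config=None` to `uvicorn.run` (which skips uvicorn's own logging setup) would be a quick way to test it.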
Are you able to extend the reproducer we had before to use uvicorn/fastapi and get it reproducing the issue again?
I absolutely can, I’ll whip something up in the next day or two. Thanks for all your patience and time on this.
Right, good progress was made today. I built a very basic script that replicates the issue:
```python
import uvicorn
from fastapi import FastAPI, status
from fastapi.responses import JSONResponse, Response
from kr8s.objects import ConfigMap

app = FastAPI()
configmap = ConfigMap("kr8s-test", namespace="kr8s-test")


@app.get("/health/readiness")
def readiness() -> Response:
    if configmap.exists():
        return JSONResponse({"status": "ok"}, status_code=status.HTTP_200_OK)
    return JSONResponse({"status": "error"}, status_code=status.HTTP_503_SERVICE_UNAVAILABLE)


if __name__ == "__main__":
    uvicorn.run(f"{__name__}:app", host="0.0.0.0", port=6502)
```
While I've not tested on kind yet, I'm fairly sure this would fail there too. However, I did find a solution:
The variable `configmap` is defined at module level, outside of any function, which seems to prevent calls back to kr8s during the FastAPI request. If I simply modify this code as follows, it works as expected:
```python
@app.get("/health/readiness")
def readiness() -> Response:
    configmap = ConfigMap("kr8s-test", namespace="kr8s-test")
    if configmap.exists():
        return JSONResponse({"status": "ok"}, status_code=status.HTTP_200_OK)
    return JSONResponse({"status": "error"}, status_code=status.HTTP_503_SERVICE_UNAVAILABLE)
```
This is a bit of a shame, as in my example `configmap` is used in almost every call, so defining it outside of functions was convenient. But as I have a solution, I'm happy to close this off unless you think this warrants further digging?
Thanks for putting this together, that's really helpful.
I'm glad you have a workaround, at least you are unblocked now. But let's leave this open because I would like to get to the bottom of what is happening here.
It seems like there is some incompatibility between uvicorn, fastapi and kr8s which is resulting in the auth refreshing code not being called. But it's weird that configmap.exists() still works as expected. There must be some asyncio dispatching thing not quite working properly in this situation. I'll spend some time spelunking when I have a chance. Having a workaround takes the pressure off a little though.