kr8s icon indicating copy to clipboard operation
kr8s copied to clipboard

Kubernetes Authentication failure 61 minutes after starting.

Open cpressland opened this issue 1 year ago • 19 comments

Which project are you reporting a bug for?

kr8s

What happened?

When using kr8s as part of a FastAPI application authentication begins failing 61 minutes after starting.

Sample code:

from pathlib import Path

from fastapi import FastAPI, status
from fastapi.responses import JSONResponse, Response
from kr8s.objects import ConfigMap

app = FastAPI()

def get_current_namespace() -> str:
    """Get the current namespace."""
    return Path("/var/run/secrets/kubernetes.io/serviceaccount/namespace").read_text().strip()


configmap = ConfigMap("example", namespace=get_current_namespace())
if not configmap.exists():
    configmap.create()

@app.get("/health/readiness")
def readiness() -> Response:
    """Readiness check endpoint, returns a JSON response with status 'ok'."""
    configmap.refresh()
    if configmap.exists():
        return JSONResponse({"status": "ok"}, status_code=status.HTTP_200_OK)
    return JSONResponse({"status": "error"}, status_code=status.HTTP_503_SERVICE_UNAVAILABLE)


@app.get("/health/liveness")
def liveness() -> Response:
    """Liveness check endpoint, returns a JSON response with status 'ok'."""
    return JSONResponse({"status": "ok"}, status_code=status.HTTP_200_OK)

Logs:

First health check:

2024-12-23T11:31:13.528988544Z INFO:     Started server process [1]
2024-12-23T11:31:13.529025895Z INFO:     Waiting for application startup.
2024-12-23T11:31:13.529142835Z INFO:     Application startup complete.
2024-12-23T11:31:13.529409517Z INFO:     Uvicorn running on http://0.0.0.0:6502 (Press CTRL+C to quit)
2024-12-23T11:31:38.700639717Z INFO:     10.40.16.115:49578 - "GET /health/liveness HTTP/1.1" 200 OK
2024-12-23T11:31:38.743960452Z INFO:     10.40.16.115:49580 - "GET /health/readiness HTTP/1.1" 200 OK

Failing health check 61 minutes later:

2024-12-23T12:32:38.699077059Z INFO:     10.40.16.115:47082 - "GET /health/liveness HTTP/1.1" 200 OK
2024-12-23T12:32:38.794716051Z INFO:     10.40.16.115:47084 - "GET /health/readiness HTTP/1.1" 503 Service Unavailable

Exception when calling an endpoint that forces a configmap.refresh():

ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/app/lib/python3.13/site-packages/kr8s/_api.py", line 168, in call_api
    response.raise_for_status()
    ~~~~~~~~~~~~~~~~~~~~~~~~~^^
  File "/app/lib/python3.13/site-packages/httpx/_models.py", line 829, in raise_for_status
    raise HTTPStatusError(message, request=request, response=self)
httpx.HTTPStatusError: Client error '401 Unauthorized' for url 'https://172.16.0.1/api/v1/namespaces/sealed-secrets-tools/configmaps/cluster-certificates'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/401

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/app/lib/python3.13/site-packages/uvicorn/protocols/http/h11_impl.py", line 403, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        self.scope, self.receive, self.send
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/app/lib/python3.13/site-packages/uvicorn/middleware/proxy_headers.py", line 60, in __call__
    return await self.app(scope, receive, send)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/lib/python3.13/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/app/lib/python3.13/site-packages/starlette/applications.py", line 113, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/app/lib/python3.13/site-packages/starlette/middleware/errors.py", line 187, in __call__
    raise exc
  File "/app/lib/python3.13/site-packages/starlette/middleware/errors.py", line 165, in __call__
    await self.app(scope, receive, _send)
  File "/app/lib/python3.13/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/app/lib/python3.13/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    raise exc
  File "/app/lib/python3.13/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
    await app(scope, receive, sender)
  File "/app/lib/python3.13/site-packages/starlette/routing.py", line 715, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/app/lib/python3.13/site-packages/starlette/routing.py", line 735, in app
    await route.handle(scope, receive, send)
  File "/app/lib/python3.13/site-packages/starlette/routing.py", line 288, in handle
    await self.app(scope, receive, send)
  File "/app/lib/python3.13/site-packages/starlette/routing.py", line 76, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/app/lib/python3.13/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    raise exc
  File "/app/lib/python3.13/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
    await app(scope, receive, sender)
  File "/app/lib/python3.13/site-packages/starlette/routing.py", line 73, in app
    response = await f(request)
               ^^^^^^^^^^^^^^^^
  File "/app/lib/python3.13/site-packages/fastapi/routing.py", line 301, in app
    raw_response = await run_endpoint_function(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ...<3 lines>...
    )
    ^
  File "/app/lib/python3.13/site-packages/fastapi/routing.py", line 214, in run_endpoint_function
    return await run_in_threadpool(dependant.call, **values)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/lib/python3.13/site-packages/starlette/concurrency.py", line 39, in run_in_threadpool
    return await anyio.to_thread.run_sync(func, *args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/lib/python3.13/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        func, args, abandon_on_cancel=abandon_on_cancel, limiter=limiter
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/app/lib/python3.13/site-packages/anyio/_backends/_asyncio.py", line 2505, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "/app/lib/python3.13/site-packages/anyio/_backends/_asyncio.py", line 1005, in run
    result = context.run(func, *args)
  File "/app/lib/python3.13/site-packages/sealed_secrets_tools/server/__init__.py", line 119, in get_certificates
    return JSONResponse({"clusters": list(get_configmap().keys())}, status_code=status.HTTP_200_OK)
                                          ~~~~~~~~~~~~~^^
  File "/app/lib/python3.13/site-packages/sealed_secrets_tools/server/__init__.py", line 33, in get_configmap
    configmap.refresh()
    ~~~~~~~~~~~~~~~~~^^
  File "/app/lib/python3.13/site-packages/kr8s/_async_utils.py", line 120, in run_sync_inner
    return portal.call(wrapped)
           ~~~~~~~~~~~^^^^^^^^^
  File "/app/lib/python3.13/site-packages/kr8s/_async_utils.py", line 92, in call
    return self._portal.call(func, *args, **kwargs)
           ~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/lib/python3.13/site-packages/anyio/from_thread.py", line 290, in call
    return cast(T_Retval, self.start_task_soon(func, *args).result())
                          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^
  File "/root/.local/share/uv/python/cpython-3.13.1-linux-x86_64-gnu/lib/python3.13/concurrent/futures/_base.py", line 456, in result
    return self.__get_result()
           ~~~~~~~~~~~~~~~~~^^
  File "/root/.local/share/uv/python/cpython-3.13.1-linux-x86_64-gnu/lib/python3.13/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/app/lib/python3.13/site-packages/anyio/from_thread.py", line 221, in _call_func
    retval = await retval_or_awaitable
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/lib/python3.13/site-packages/kr8s/_objects.py", line 357, in refresh
    await self.async_refresh()
  File "/app/lib/python3.13/site-packages/kr8s/_objects.py", line 375, in async_refresh
    raise e
  File "/app/lib/python3.13/site-packages/kr8s/_objects.py", line 363, in async_refresh
    async with self.api.call_api(
               ~~~~~~~~~~~~~~~~~^
        "GET",
        ^^^^^^
    ...<2 lines>...
        namespace=self.namespace,
        ^^^^^^^^^^^^^^^^^^^^^^^^^
    ) as resp:
    ^
  File "/root/.local/share/uv/python/cpython-3.13.1-linux-x86_64-gnu/lib/python3.13/contextlib.py", line 214, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/app/lib/python3.13/site-packages/kr8s/_api.py", line 186, in call_api
    raise ServerError(
        error_message, status=error, response=e.response
    ) from e
kr8s._exceptions.ServerError: Unauthorized

Anything else?

No response

cpressland avatar Dec 23 '24 16:12 cpressland

How is your application authenticating with Kubernetes? Is it using a service account, or something else like OIDC? This could be related to https://github.com/kr8s-org/kr8s/issues/125

jacobtomlinson avatar Jan 06 '25 11:01 jacobtomlinson

Thanks @jacobtomlinson - authentication is entirely driven by Service Accounts, with the app only getting ConfigMaps and Secrets from the same namespace.

cpressland avatar Jan 06 '25 11:01 cpressland

That makes sense, the default service account token expiration time is an hour.

When the API gives an Unauthorized error kr8s attempts a few times to reauthenticate before giving up. There must be a bug with the reauthentication that's specific to your environment, otherwise we would be seeing this as a more widespread problem.

https://github.com/kr8s-org/kr8s/blob/5e93587669c9222324db5d589afb20fbd4817119/kr8s/_api.py#L173-L175

https://github.com/kr8s-org/kr8s/blob/5e93587669c9222324db5d589afb20fbd4817119/kr8s/_auth.py#L76-L77

https://github.com/kr8s-org/kr8s/blob/5e93587669c9222324db5d589afb20fbd4817119/kr8s/_auth.py#L264-L267

There are a few places in the logic that I could see things returning too early and not actually loading the new token.

I've just pushed a new release v0.19.0 which contains #542 to add some more debug logging. Could you update to that version, then run your code again with the logging set to DEBUG? Then share the logs. I'm particularly interested in what happens when the token expires and the auth failures start happening.

jacobtomlinson avatar Jan 06 '25 11:01 jacobtomlinson

Right, I finally had time to tackle this properly, here is a full Kubernetes Manifest set that should replicate this:

apiVersion: v1
kind: Namespace
metadata:
  name: kr8s-test
  labels:
    team: cpe
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: kr8s-test
  namespace: kr8s-test
data:
  test: "test"
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kr8s-test
  namespace: kr8s-test
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: kr8s-test
  namespace: kr8s-test
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["*"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: kr8s-test
  namespace: kr8s-test
subjects:
  - kind: ServiceAccount
    name: kr8s-test
    namespace: kr8s-test
roleRef:
  kind: Role
  name: kr8s-test
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kr8s-test
  namespace: kr8s-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kr8s-test
  template:
    metadata:
      labels:
        app: kr8s-test
    spec:
      containers:
        - name: kr8s-test
          image: docker.io/python:latest
          command:
            - sh
            - -c
            - |
              pip install kr8s &&
              cat << 'EOF' > /tmp/script.py
              import logging
              logging.basicConfig(level=logging.DEBUG)
              from time import sleep
              import kr8s
              while True:
                  logging.debug(list(kr8s.get("configmap", namespace="kr8s-test")))
                  sleep(60)
              EOF
              python /tmp/script.py
          resources:
            limits:
              cpu: 100m
              memory: 128Mi
            requests:
              cpu: 100m
              memory: 128Mi
      serviceAccountName: kr8s-test

This produces the following set of debug logs every 60 seconds:

DEBUG:kr8s._auth:Loading kubeconfig
DEBUG:httpcore.connection:close.started
DEBUG:httpcore.connection:close.complete
DEBUG:httpcore.connection:connect_tcp.started host='172.16.0.1' port=443 local_address=None timeout=None socket_options=None
DEBUG:httpcore.connection:connect_tcp.complete return_value=<httpcore._backends.anyio.AnyIOStream object at 0x75b0b03b31b0>
DEBUG:httpcore.connection:start_tls.started ssl_context=<ssl.SSLContext object at 0x75b0b15cd9a0> server_hostname='172.16.0.1' timeout=None
DEBUG:httpcore.connection:start_tls.complete return_value=<httpcore._backends.anyio.AnyIOStream object at 0x75b0b03b2f80>
DEBUG:httpcore.http11:send_request_headers.started request=<Request [b'GET']>
DEBUG:httpcore.http11:send_request_headers.complete
DEBUG:httpcore.http11:send_request_body.started request=<Request [b'GET']>
DEBUG:httpcore.http11:send_request_body.complete
DEBUG:httpcore.http11:receive_response_headers.started request=<Request [b'GET']>
DEBUG:httpcore.http11:receive_response_headers.complete return_value=(b'HTTP/1.1', 200, b'OK', [(b'Audit-Id', b'942695fd-c5dd-458f-a5bc-9db8ad63b62b'), (b'Cache-Control', b'no-cache, private'), (b'Content-Type', b'application/json'), (b'X-Kubernetes-Pf-Flowschema-Uid', b'69fc3a70-5198-4527-8ef7-1e0a70bda0a2'), (b'X-Kubernetes-Pf-Prioritylevel-Uid', b'c2560709-9506-40dd-95cd-b32516a231e4'), (b'Date', b'Mon, 24 Feb 2025 21:10:49 GMT'), (b'Transfer-Encoding', b'chunked')])
INFO:httpx:HTTP Request: GET https://172.16.0.1/api/v1/namespaces/kr8s-test/configmaps?limit=100 "HTTP/1.1 200 OK"
DEBUG:httpcore.http11:receive_response_body.started request=<Request [b'GET']>
DEBUG:httpcore.http11:receive_response_body.complete
DEBUG:httpcore.http11:response_closed.started
DEBUG:httpcore.http11:response_closed.complete
DEBUG:root:[<ConfigMap kr8s-test>, <ConfigMap kube-root-ca.crt>]

For the iteration where everything geos wrong, this is what we see:

DEBUG:kr8s._auth:Loading kubeconfig
DEBUG:httpcore.connection:close.started
DEBUG:httpcore.connection:close.complete
DEBUG:httpcore.connection:connect_tcp.started host='172.16.0.1' port=443 local_address=None timeout=None socket_options=None
DEBUG:httpcore.connection:connect_tcp.complete return_value=<httpcore._backends.anyio.AnyIOStream object at 0x75b0b03b3b60>
DEBUG:httpcore.connection:start_tls.started ssl_context=<ssl.SSLContext object at 0x75b0b15cd9a0> server_hostname='172.16.0.1' timeout=None
DEBUG:httpcore.connection:start_tls.complete return_value=<httpcore._backends.anyio.AnyIOStream object at 0x75b0b03b2710>
DEBUG:httpcore.http11:send_request_headers.started request=<Request [b'GET']>
DEBUG:httpcore.http11:send_request_headers.complete
DEBUG:httpcore.http11:send_request_body.started request=<Request [b'GET']>
DEBUG:httpcore.http11:send_request_body.complete
DEBUG:httpcore.http11:receive_response_headers.started request=<Request [b'GET']>
DEBUG:httpcore.http11:receive_response_headers.complete return_value=(b'HTTP/1.1', 401, b'Unauthorized', [(b'Audit-Id', b'acec4a73-47fa-44c0-95ce-3f390c240d1e'), (b'Cache-Control', b'no-cache, private'), (b'Content-Type', b'application/json'), (b'Date', b'Mon, 24 Feb 2025 21:11:49 GMT'), (b'Content-Length', b'129')])
INFO:httpx:HTTP Request: GET https://172.16.0.1/api/v1/namespaces/kr8s-test/configmaps?limit=100 "HTTP/1.1 401 Unauthorized"
DEBUG:httpcore.http11:receive_response_body.started request=<Request [b'GET']>
DEBUG:httpcore.http11:receive_response_body.complete
DEBUG:httpcore.http11:response_closed.started
DEBUG:httpcore.http11:response_closed.complete
DEBUG:kr8s._api:Unauthorized 401 error, reauthenticating attempt 0
DEBUG:kr8s._auth:Loading kubeconfig
DEBUG:httpcore.connection:close.started
DEBUG:httpcore.connection:close.complete
DEBUG:httpcore.connection:connect_tcp.started host='172.16.0.1' port=443 local_address=None timeout=None socket_options=None
DEBUG:httpcore.connection:connect_tcp.complete return_value=<httpcore._backends.anyio.AnyIOStream object at 0x75b0b03b3fc0>
DEBUG:httpcore.connection:start_tls.started ssl_context=<ssl.SSLContext object at 0x75b0b15cfbf0> server_hostname='172.16.0.1' timeout=None
DEBUG:httpcore.connection:start_tls.complete return_value=<httpcore._backends.anyio.AnyIOStream object at 0x75b0b03b3c50>
DEBUG:httpcore.http11:send_request_headers.started request=<Request [b'GET']>
DEBUG:httpcore.http11:send_request_headers.complete
DEBUG:httpcore.http11:send_request_body.started request=<Request [b'GET']>
DEBUG:httpcore.http11:send_request_body.complete
DEBUG:httpcore.http11:receive_response_headers.started request=<Request [b'GET']>
DEBUG:httpcore.http11:receive_response_headers.complete return_value=(b'HTTP/1.1', 401, b'Unauthorized', [(b'Audit-Id', b'3b4bb6ec-af45-40df-b440-1be70ecfff13'), (b'Cache-Control', b'no-cache, private'), (b'Content-Type', b'application/json'), (b'Date', b'Mon, 24 Feb 2025 21:11:49 GMT'), (b'Content-Length', b'129')])
INFO:httpx:HTTP Request: GET https://172.16.0.1/api/v1/namespaces/kr8s-test/configmaps?limit=100 "HTTP/1.1 401 Unauthorized"
DEBUG:httpcore.http11:receive_response_body.started request=<Request [b'GET']>
DEBUG:httpcore.http11:receive_response_body.complete
DEBUG:httpcore.http11:response_closed.started
DEBUG:httpcore.http11:response_closed.complete
DEBUG:kr8s._api:Unauthorized 401 error, reauthenticating attempt 1
DEBUG:kr8s._auth:Loading kubeconfig
DEBUG:httpcore.connection:close.started
DEBUG:httpcore.connection:close.complete
DEBUG:httpcore.connection:connect_tcp.started host='172.16.0.1' port=443 local_address=None timeout=None socket_options=None
DEBUG:httpcore.connection:connect_tcp.complete return_value=<httpcore._backends.anyio.AnyIOStream object at 0x75b0b03b1a90>
DEBUG:httpcore.connection:start_tls.started ssl_context=<ssl.SSLContext object at 0x75b0b15cf9b0> server_hostname='172.16.0.1' timeout=None
DEBUG:httpcore.connection:start_tls.complete return_value=<httpcore._backends.anyio.AnyIOStream object at 0x75b0b03b2a80>
DEBUG:httpcore.http11:send_request_headers.started request=<Request [b'GET']>
DEBUG:httpcore.http11:send_request_headers.complete
DEBUG:httpcore.http11:send_request_body.started request=<Request [b'GET']>
DEBUG:httpcore.http11:send_request_body.complete
DEBUG:httpcore.http11:receive_response_headers.started request=<Request [b'GET']>
DEBUG:httpcore.http11:receive_response_headers.complete return_value=(b'HTTP/1.1', 401, b'Unauthorized', [(b'Audit-Id', b'0d095c64-2cee-4754-9fdd-bed1b0e143a3'), (b'Cache-Control', b'no-cache, private'), (b'Content-Type', b'application/json'), (b'Date', b'Mon, 24 Feb 2025 21:11:49 GMT'), (b'Content-Length', b'129')])
INFO:httpx:HTTP Request: GET https://172.16.0.1/api/v1/namespaces/kr8s-test/configmaps?limit=100 "HTTP/1.1 401 Unauthorized"
DEBUG:httpcore.http11:receive_response_body.started request=<Request [b'GET']>
DEBUG:httpcore.http11:receive_response_body.complete
DEBUG:httpcore.http11:response_closed.started
DEBUG:httpcore.http11:response_closed.complete
DEBUG:kr8s._api:Unauthorized 401 error, reauthenticating attempt 2
DEBUG:kr8s._auth:Loading kubeconfig
DEBUG:httpcore.connection:close.started
DEBUG:httpcore.connection:close.complete
DEBUG:httpcore.connection:connect_tcp.started host='172.16.0.1' port=443 local_address=None timeout=None socket_options=None
DEBUG:httpcore.connection:connect_tcp.complete return_value=<httpcore._backends.anyio.AnyIOStream object at 0x75b0b03b1ea0>
DEBUG:httpcore.connection:start_tls.started ssl_context=<ssl.SSLContext object at 0x75b0b15cf890> server_hostname='172.16.0.1' timeout=None
DEBUG:httpcore.connection:start_tls.complete return_value=<httpcore._backends.anyio.AnyIOStream object at 0x75b0b03b2210>
DEBUG:httpcore.http11:send_request_headers.started request=<Request [b'GET']>
DEBUG:httpcore.http11:send_request_headers.complete
DEBUG:httpcore.http11:send_request_body.started request=<Request [b'GET']>
DEBUG:httpcore.http11:send_request_body.complete
DEBUG:httpcore.http11:receive_response_headers.started request=<Request [b'GET']>
DEBUG:httpcore.http11:receive_response_headers.complete return_value=(b'HTTP/1.1', 401, b'Unauthorized', [(b'Audit-Id', b'83aa3515-d7d5-4e34-9088-1130e3c863a3'), (b'Cache-Control', b'no-cache, private'), (b'Content-Type', b'application/json'), (b'Date', b'Mon, 24 Feb 2025 21:11:49 GMT'), (b'Content-Length', b'129')])
INFO:httpx:HTTP Request: GET https://172.16.0.1/api/v1/namespaces/kr8s-test/configmaps?limit=100 "HTTP/1.1 401 Unauthorized"
DEBUG:httpcore.http11:receive_response_body.started request=<Request [b'GET']>
DEBUG:httpcore.http11:receive_response_body.complete
DEBUG:httpcore.http11:response_closed.started
DEBUG:httpcore.http11:response_closed.complete
Traceback (most recent call last):
  File "/usr/local/lib/python3.13/site-packages/kr8s/_api.py", line 171, in call_api
    response.raise_for_status()
    ~~~~~~~~~~~~~~~~~~~~~~~~~^^
  File "/usr/local/lib/python3.13/site-packages/httpx/_models.py", line 829, in raise_for_status
    raise HTTPStatusError(message, request=request, response=self)
httpx.HTTPStatusError: Client error '401 Unauthorized' for url 'https://172.16.0.1/api/v1/namespaces/kr8s-test/configmaps?limit=100'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/401

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/tmp/script.py", line 6, in <module>
    logging.debug(list(kr8s.get("configmap", namespace="kr8s-test")))
                  ~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.13/site-packages/kr8s/_async_utils.py", line 146, in iter_over_async
    done, obj = portal.call(get_next)
                ~~~~~~~~~~~^^^^^^^^^^
  File "/usr/local/lib/python3.13/site-packages/kr8s/_async_utils.py", line 91, in call
    return self._portal.call(func, *args, **kwargs)
           ~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.13/site-packages/anyio/from_thread.py", line 290, in call
    return cast(T_Retval, self.start_task_soon(func, *args).result())
                          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^
  File "/usr/local/lib/python3.13/concurrent/futures/_base.py", line 456, in result
    return self.__get_result()
           ~~~~~~~~~~~~~~~~~^^
  File "/usr/local/lib/python3.13/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.13/site-packages/anyio/from_thread.py", line 221, in _call_func
    retval = await retval_or_awaitable
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.13/site-packages/kr8s/_async_utils.py", line 139, in get_next
    obj = await ait.__anext__()
          ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.13/site-packages/kr8s/asyncio/_helpers.py", line 58, in get
    async for resource in api.async_get(
    ...<9 lines>...
        yield resource
  File "/usr/local/lib/python3.13/site-packages/kr8s/_api.py", line 466, in async_get
    async for resource in self._async_get_single(
    ...<9 lines>...
        yield resource
  File "/usr/local/lib/python3.13/site-packages/kr8s/_api.py", line 512, in _async_get_single
    async with self.async_get_kind(
               ~~~~~~~~~~~~~~~~~~~^
        kind,
        ^^^^^
    ...<6 lines>...
        **kwargs,
        ^^^^^^^^^
    ) as (obj_cls, response):
    ^
  File "/usr/local/lib/python3.13/contextlib.py", line 214, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.13/site-packages/kr8s/_api.py", line 405, in async_get_kind
    async with self.call_api(
               ~~~~~~~~~~~~~^
        method="GET",
        ^^^^^^^^^^^^^
    ...<4 lines>...
        **kwargs,
        ^^^^^^^^^
    ) as response:
    ^
  File "/usr/local/lib/python3.13/contextlib.py", line 214, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.13/site-packages/kr8s/_api.py", line 192, in call_api
    raise ServerError(
        error_message, status=error, response=e.response
    ) from e
kr8s._exceptions.ServerError: Unauthorized

I'm not sure if those logs give you anything useful. But we're running AKS 1.31.2, I'm happy to go retest on a vanilla cluster with nothing else installed if you think it'll help?

cpressland avatar Feb 24 '25 21:02 cpressland

I tested this reproducer on a local cluster created with kind and it did not reproduce the error. I'll try again with a cloud hosted Kubernetes cluster.

jacobtomlinson avatar Feb 25 '25 12:02 jacobtomlinson

Thanks @jacobtomlinson - as above I'm more than happy to spin up a fresh cluster on Azure if it helps.

cpressland avatar Feb 25 '25 14:02 cpressland

Thanks for the offer @cpressland. If you want to do so and spend some time debugging it would be much appreciated, but equally I'd like to see this error with my own eyes to try and understand what is happening.

I can see in the logs you shared that we are attempting to reauthenticate but failing to. When we get into this state I want to figure out why the call to reauthenticate() isn't working.

https://github.com/kr8s-org/kr8s/blob/33f2cdeff021844bdc79c2f76fd11359e3efd6dc/kr8s/_api.py#L181

We should see it call into the kr8s._auth.KubeAuth.reauthenticate() method.

https://github.com/kr8s-org/kr8s/blob/33f2cdeff021844bdc79c2f76fd11359e3efd6dc/kr8s/_auth.py#L71-L83

Then it should call kr8s._auth.KubeAuth._load_service_account() and read a new token from disk. Kubernetes mounts service account tokens as a file within the container.

https://github.com/kr8s-org/kr8s/blob/33f2cdeff021844bdc79c2f76fd11359e3efd6dc/kr8s/_auth.py#L260

I want to add some more debug logging in these places so we can see what is happening.

The questions I want to answer are:

  • Do we actually get to the _load_service_account() call, or is something stopping us?
  • Is the token we read from disk different to the expired one we read during the initial authentication?

jacobtomlinson avatar Feb 25 '25 14:02 jacobtomlinson

I just ran the reproducer on fresh GKE, AKS and EKS clusters and left them running for over an hour and I'm not able to reproduce the issue. It seems like the credentials are either not expiring in the same way that you're seeing on your cluster, or the refresh is working and new tokens are being picked up.

On all three I'm seeing successful log messages way beyond the 61 minute mark.

DEBUG:kr8s._auth:Loading kubeconfig
DEBUG:httpcore.connection:close.started
DEBUG:httpcore.connection:close.complete
DEBUG:httpcore.connection:connect_tcp.started host='10.100.0.1' port=443 local_address=None timeout=None socket_options=None
DEBUG:httpcore.connection:connect_tcp.complete return_value=<httpcore._backends.anyio.AnyIOStream object at 0xffff8e58e0d0>
DEBUG:httpcore.connection:start_tls.started ssl_context=<ssl.SSLContext object at 0xffff8fa7a060> server_hostname='10.100.0.1' timeout=None
DEBUG:httpcore.connection:start_tls.complete return_value=<httpcore._backends.anyio.AnyIOStream object at 0xffff8e58e260>
DEBUG:httpcore.http11:send_request_headers.started request=<Request [b'GET']>
DEBUG:httpcore.http11:send_request_headers.complete
DEBUG:httpcore.http11:send_request_body.started request=<Request [b'GET']>
DEBUG:httpcore.http11:send_request_body.complete
DEBUG:httpcore.http11:receive_response_headers.started request=<Request [b'GET']>
DEBUG:httpcore.http11:receive_response_headers.complete return_value=(b'HTTP/1.1', 200, b'OK', [(b'Audit-Id', b'9b2a3c31-0aa1-4ae0-8527-080dec97b6e7'), (b'Cache-Control', b'no-cache, private'), (b'Content-Type', b'application/json'), (b'X-Kubernetes-Pf-Flowschema-Uid', b'28fb39c8-3a
INFO:httpx:HTTP Request: GET https://10.100.0.1/api/v1/namespaces/kr8s-test/configmaps?limit=100 "HTTP/1.1 200 OK"
DEBUG:httpcore.http11:receive_response_body.started request=<Request [b'GET']>
DEBUG:httpcore.http11:receive_response_body.complete
DEBUG:httpcore.http11:response_closed.started
DEBUG:httpcore.http11:response_closed.complete
DEBUG:root:[<ConfigMap kr8s-test>, <ConfigMap kube-root-ca.crt>]

Note that I used Kubernetes v1.30.8 on GKE, v1.30.9 on AKS and v1.29.13 on EKS.

jacobtomlinson avatar Feb 25 '25 17:02 jacobtomlinson

I just merged #579 which adds some more logging. @cpressland could you run your reproducer again and share the log output when it starts failing?

jacobtomlinson avatar Feb 26 '25 16:02 jacobtomlinson

right, I changed the install command to: pip install git+https://github.com/kr8s-org/kr8s.git, so this should be the main branch instead of v0.20.4: log.txt

I decided to include everything from the pod rather than snippets just in case it includes something else useful to you.

cpressland avatar Feb 27 '25 15:02 cpressland

Thanks @cpressland that really helped! I just merged a fix in #580. Do you want to give it a spin?

jacobtomlinson avatar Feb 27 '25 16:02 jacobtomlinson

Annoyingly, while this 100% solves it for my manifests above, it still happens in my WSGI/ASGI applications. Happy to continue here or raise another issue to track this. I might try and replicate this with cloudcoil also just to determine if this is a kr8s issue or something fastapi is doing. Logs attached of the most recent failure: logs.txt

cpressland avatar Mar 04 '25 14:03 cpressland

Let's reopen and continue investigating.

After the first couple of lines in the log you shared there are no more kr8s logs, could something be modifying your logging setting to suppress those?

jacobtomlinson avatar Mar 04 '25 14:03 jacobtomlinson

Yeah, I had wondered that, I think it's caused by uvicorn as the app is instantiated as follows:

import logging

import typer

logging.basicConfig(level=logging.DEBUG)

cli = typer.Typer()


@cli.command(name="server")
def cli_server() -> None:
    """Server command to start the API server."""
    import uvicorn

    uvicorn.run("sealed_secrets_tools.server:app", host="0.0.0.0", port=6502)

I'll look at patching this via a configmap overwriting the __init__.py file and report back once I've got some useful logging.

cpressland avatar Mar 04 '25 15:03 cpressland

Okay, so I got the logging working properly, we see kr8s._auth get called when the app starts:

$ kubectl logs sealed-secrets-tools-server-59686c45d5-vlnt6 -f | grep kr8s
2025-03-04 15:09:20,951 [DEBUG] kr8s._auth: Reloading credentials
2025-03-04 15:09:20,951 [DEBUG] kr8s._auth: Loading kubeconfig
2025-03-04 15:09:20,951 [DEBUG] kr8s._auth: Loading service account
2025-03-04 15:09:20,955 [DEBUG] kr8s._auth: Found new token

But then absolutely no logging from kr8s after, so I guess this isn't really a kr8s issue as uvicorn / fastapi / something else is blocking kr8s._auth from being called again. Any suggestions?

cpressland avatar Mar 04 '25 15:03 cpressland

Are you able to extend the reproducer we had before to use uvicorn/fastapi and get it reproducing the issue again?

jacobtomlinson avatar Mar 04 '25 21:03 jacobtomlinson

I absolutely can, I’ll whip something up in the next day or two. Thanks for all your patience and time on this.

cpressland avatar Mar 04 '25 22:03 cpressland

Right, good progress was made today. I built a very basic script that replicates the issue:

import uvicorn
from kr8s.objects import ConfigMap
from fastapi import FastAPI, status
from fastapi.responses import JSONResponse, Response

app = FastAPI()

configmap = ConfigMap("kr8s-test", namespace="kr8s-test")

@app.get("/health/readiness")
def readiness() -> Response:
    if configmap.exists():
        return JSONResponse({"status": "ok"}, status_code=status.HTTP_200_OK)
    return JSONResponse({"status": "error"}, status_code=status.HTTP_503_SERVICE_UNAVAILABLE)

if __name__ == "__main__":
    uvicorn.run(f"{__name__}:app", host="0.0.0.0", port=6502)

While I've not tested on kind yet, I'm fairly sure this would fail there, however I did find a solution:

The variable configmap is being defined inside the module but outside of any functions, this seems to prevent calls back to kr8s during the fastapi call. If I simply modify this code as follows, it works as expected:

@app.get("/health/readiness")
def readiness() -> Response:
    configmap = ConfigMap("kr8s-test", namespace="kr8s-test")
    if configmap.exists():
        return JSONResponse({"status": "ok"}, status_code=status.HTTP_200_OK)
    return JSONResponse({"status": "error"}, status_code=status.HTTP_503_SERVICE_UNAVAILABLE)

This is a bit of a shame as in my example, configmap is used in almost every call so defining it outside of functions was convenient - but as I have a solution, I'm happy to close this off unless you think this warrants further digging?

cpressland avatar Mar 05 '25 14:03 cpressland

Thanks for putting this together, that's really helpful.

I'm glad you have a workaround, at least you are unblocked now. But let's leave this open because I would like to get to the bottom of what is happening here.

It seems like there is some incompatibility between uvicorn, fastapi and kr8s which is resulting in the auth refreshing code not being called. But it's weird that configmap.exists() still works as expected. There must be some asyncio dispatching thing not quite working properly in this situation. I'll spend some time spelunking when I have a chance. Having a workaround takes the pressure off a little though.

jacobtomlinson avatar Mar 05 '25 16:03 jacobtomlinson