Monitor watchers with Liveness Probe

Open dheeg opened this issue 1 year ago • 0 comments

Keywords

No response

Problem

From time to time, one or more CRD watchers fail while the Kopf operator is starting. In all known cases, the failure was caused by a Kubernetes API error.

Startup seems to be a critical moment: a watcher that fails there does not recover automatically, and only a restart of the operator helps.

Is there a way to monitor the status of all expected watchers via @kopf.on.probe()? Alternatively, is there a way to crash Kopf if this happens?

Once this has happened, finalizers hang. This is a scenario I would like to have resolved automatically.
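To make the question concrete, here is a minimal sketch of the kind of check being asked about. It is not a confirmed Kopf feature. It assumes a watcher can be treated as suspect when it has delivered no event within some time window. The names `WatcherHeartbeats`, `EXPECTED_RESOURCES`, and `heartbeats`, the `v1` version, and the 600-second window are all hypothetical. The Kopf wiring is shown only in comments and relies on the `@kopf.on.event` and `@kopf.on.probe` decorators.

```python
import time

# Hypothetical: the resources this operator is expected to watch.
EXPECTED_RESOURCES = ("things.example.com",)


class WatcherHeartbeats:
    """Remember when each expected resource last delivered a watch event."""

    def __init__(self, expected, clock=time.monotonic):
        self._clock = clock
        # None means "no event seen since the operator started".
        self._last_seen = {name: None for name in expected}

    def beat(self, resource):
        """Record that an event for this resource has just arrived."""
        self._last_seen[resource] = self._clock()

    def silent(self, max_age):
        """Return resources with no event at all, or none within max_age seconds."""
        now = self._clock()
        return sorted(
            name
            for name, seen in self._last_seen.items()
            if seen is None or now - seen > max_age
        )


heartbeats = WatcherHeartbeats(EXPECTED_RESOURCES)

# Possible wiring into the operator (an untested assumption, not executed here):
#
#   import kopf
#
#   @kopf.on.event("example.com", "v1", "things")  # "v1" is a guess
#   def record_event(**_):
#       heartbeats.beat("things.example.com")
#
#   @kopf.on.probe(id="silent_watchers")
#   def silent_watchers(**_):
#       return heartbeats.silent(max_age=600)
```

The obvious weakness is that a resource kind with no objects, or with no recent changes, produces no events either, so silence alone does not prove that a watcher has died. Whether a probe result like this can make the liveness check itself fail is also something I have not verified.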

Thanks for any sort of help.

One example

Final Exception:

  File "/usr/local/lib/python3.11/site-packages/kopf/_cogs/clients/errors.py", line 150, in check_response
    raise cls(payload, status=response.status) from e
kopf._cogs.clients.errors.APIForbiddenError: ('thing.example.com is forbidden: User "system:serviceaccount:operator:serviceaccount" cannot watch resource "thing" in API group "example.com" at the cluster scope', {'kind': 'Status', 'apiVersion': 'v1', 'metadata': {}, 'status': 'Failure', 'message': 'things.example.com is forbidden: User "system:serviceaccount:operator:serviceaccount" cannot watch resource "things" in API group "example.com" at the cluster scope', 'reason': 'Forbidden', 'details': {'group': 'example.com', 'kind': 'things'}, 'code': 403})

dheeg, Nov 17 '23 12:11