
Timer stops if an error occurs while updating the object status

Open vierno opened this issue 4 years ago • 2 comments

Long story short

When using @kopf.timer with the status patching mechanism, the timer stops if an exception is raised while the object is being patched (for example, due to connectivity issues with the k8s API). The only way I have found to restart the timers in this situation is to restart the container.

Description

To reproduce the issue, use the snippet below: a simple timer whose return value is patched into the object status. Then simulate a connectivity issue with the k8s server; the logs will show that the handler is not invoked again.

The code snippet to reproduce the issue
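import kopf

# Runs every 3 seconds for each "timertests" object.
@kopf.timer("globoi.com", "v1alpha1", "timertests", interval=3)
async def testing(spec, status, retry, namespace, name, **kwargs):
    # The returned dict is patched into the object's status under the
    # handler's name (see "Patching with: ..." in the output below).
    return {"sleep": True}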
The exact command to reproduce the issue
kopf run controller.py --verbose
The full output of the command that failed
[2021-01-13 18:54:25,475] kopf.objects         [DEBUG   ] [kernel/test] Timer 'testing' is invoked.
[2021-01-13 18:54:25,475] kopf.objects         [INFO    ] [kernel/test] Timer 'testing' succeeded.
[2021-01-13 18:54:25,476] kopf.objects         [DEBUG   ] [kernel/test] Patching with: {'status': {'testing': {'sleep': True}}}
[2021-01-13 18:54:28,521] kopf.objects         [DEBUG   ] [kernel/test] Timer 'testing' is invoked.
[2021-01-13 18:54:28,522] kopf.objects         [INFO    ] [kernel/test] Timer 'testing' succeeded.
[2021-01-13 18:54:28,522] kopf.objects         [DEBUG   ] [kernel/test] Patching with: {'status': {'testing': {'sleep': True}}}
[2021-01-13 18:54:46,878] kopf.clients.events  [WARNING ] Failed to post an event. Ignoring and continuing. Event: type='Normal', reason='Logging', message="Timer 'testing' succeeded.".

To validate that the patching was indeed the cause, I wrapped the following line in a try/except block, which prevented the timer from stopping: https://github.com/nolar/kopf/blob/release/0.28/kopf/reactor/daemons.py#L469
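For illustration, here is a minimal sketch of that idea in plain asyncio. It is not kopf's actual code: `handler`, `patch_status`, and `timer_loop` are hypothetical stand-ins, and the patch is made to fail on every call to simulate the API being unreachable.

```python
import asyncio

calls = []  # records each handler invocation (for illustration only)


async def handler():
    # Hypothetical stand-in for the timer handler from the snippet above.
    calls.append("invoked")
    return {"sleep": True}


async def patch_status(result):
    # Hypothetical stand-in for the status patch; it always fails here
    # to simulate a connectivity issue with the API server.
    raise ConnectionError("simulated API outage")


async def timer_loop(interval, iterations):
    for _ in range(iterations):
        result = await handler()
        try:
            await patch_status(result)
        except Exception as exc:
            # Without this try/except, the first failure would end the loop.
            print(f"Patching failed, timer continues: {exc!r}")
        await asyncio.sleep(interval)


asyncio.run(timer_loop(interval=0.01, iterations=3))
```

Despite the patch failing every time, the handler is invoked on all three iterations.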

Environment

  • Kopf version: 0.28.3
  • Kubernetes version: 1.18
  • Python version: 3.8.3
  • OS/platform: osx

vierno avatar Jan 13 '21 22:01 vierno

Indeed. Thanks for reporting.


A side-note for self:

Probably caused by this line: https://github.com/nolar/kopf/blob/release/0.28/kopf/reactor/daemons.py#L340. It is intended to prevent respawns if the coroutine exits of its own accord (reason == NONE), but it accidentally covers exceptions as well, because the reason remains NONE in that case too. These 2 cases should be distinguished.
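One way to picture the distinction is the sketch below, in plain asyncio and under assumed names: `Exit` and `guard` are hypothetical and do not exist in kopf. A wrapper reports whether a coroutine finished by itself or crashed, so the caller can respawn only in the second case.

```python
import asyncio
import enum


class Exit(enum.Enum):
    FINISHED = "finished"  # coroutine returned by itself: do not respawn
    CRASHED = "crashed"    # coroutine raised: eligible for a respawn


async def guard(coro_fn):
    """Run one incarnation of the coroutine and report how it ended."""
    try:
        await coro_fn()
    except asyncio.CancelledError:
        raise  # cancellation is a deliberate stop, not a crash
    except Exception:
        return Exit.CRASHED
    return Exit.FINISHED


async def exits_cleanly():
    return None


async def fails():
    raise ConnectionError("simulated patch failure")


async def main():
    return await guard(exits_cleanly), await guard(fails)


clean_outcome, crash_outcome = asyncio.run(main())
print(clean_outcome, crash_outcome)
```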

Besides, if fixed, it should also be aligned with the same behaviour in processing: with throttling on errors to prevent K8s API overload (https://github.com/nolar/kopf/blob/release/0.28/kopf/reactor/processing.py#L65-L71), and with some last-moment logs in _runner(), because the daemons'/timers' exceptions are never re-raised otherwise (https://github.com/nolar/kopf/blob/release/0.28/kopf/reactor/queueing.py#L225).
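A rough sketch of what throttling on errors could look like, in plain asyncio: each consecutive failure is logged and followed by the next delay from a growing sequence, and a success resets the sequence. The function name, the delay values, and the reset rule are assumptions for illustration, not kopf's implementation.

```python
import asyncio


async def run_with_throttling(step, delays, max_attempts):
    """Call `step` repeatedly; after each failure, log it and back off."""
    errors = 0   # consecutive failures so far
    slept = []   # delays actually applied (kept for illustration)
    for _ in range(max_attempts):
        try:
            await step()
            errors = 0  # a success resets the backoff
        except Exception as exc:
            # Use the next delay, staying on the last one once exhausted.
            delay = delays[min(errors, len(delays) - 1)]
            errors += 1
            print(f"Step failed ({exc!r}); throttling for {delay}s.")
            slept.append(delay)
            await asyncio.sleep(delay)
    return slept


attempts = {"n": 0}


async def flaky_step():
    # Hypothetical step: fails on every attempt except the third.
    attempts["n"] += 1
    if attempts["n"] != 3:
        raise ConnectionError("simulated API outage")


slept = asyncio.run(
    run_with_throttling(flaky_step, delays=[0.01, 0.02, 0.04], max_attempts=4)
)
```

Logging inside the `except` branch plays the role of the last-moment logs: the failure stays visible even though the exception is not re-raised.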

nolar avatar Jan 19 '21 20:01 nolar

I think this is still relevant with recent versions of kopf: I see the same issue with timers.

DawerRafi avatar Aug 30 '23 13:08 DawerRafi