kopf icon indicating copy to clipboard operation
kopf copied to clipboard

Support multi-step deletion process

Open jkupferer opened this issue 2 years ago • 2 comments

Problem

I want to use a multi-step deletion process where deletion triggers activity outside of the operator which then calls back to allow completion of the delete.

Proposal

It should be possible for an on.delete handler to return an explicit value or raise a special exception that indicates that no error occurred but delete processing is not yet complete. This would indicate that finalizers should not be removed. If a multi-step delete process is indicated on the on.delete handler then it should also run for any updates to the resource.

Code

@kopf.on.delete("foo.example.com", "v1", "myresourcekind", explicit_completion=True)
def delete_handler(name, spec, status, logger, **_):
  if check_if_delete_can_complete(spec, status):
    return True;
  else:
    start_delete_process(name, spec, status);
    return False

... or using a special exception ...

@kopf.on.delete("foo.example.com", "v1", "myresourcekind")
def delete_handler(name, spec, status, logger, **_):
  if not check_if_delete_can_complete(spec, status):
    start_delete_process(name, spec, status);
    raise kopf.IncompleteException('Delete process initiated')

Additional information

No response

jkupferer avatar Oct 22 '21 19:10 jkupferer

Hello.

This is the default behaviour of the deletion handlers, as of any other handlers if kopf.TemporaryError is raised. The exact delay can be passed as delay=... to the exception. The number of retries to the moment is available in the kwarg retry (starts with 0 for the initial retry).

For example:

import kopf

@kopf.on.delete('kopfexamples')
def delete_fn(retry, logger, **_):
    if retry < 3:
        raise kopf.TemporaryError("not yet", delay=5)
    logger.info('DELETED')
$ kubectl create -f examples/obj.yaml 
$ time kubectl delete -f examples/obj.yaml 
kopfexample.kopf.dev "kopf-example-1" deleted

real	0m15.740s
user	0m0.087s
sys	0m0.033s
[2021-10-22 22:57:52,380] kopf.objects         [DEBUG   ] [default/kopf-example-1] Handling cycle is finished, waiting for new changes.
[2021-10-22 22:58:04,832] kopf.objects         [DEBUG   ] [default/kopf-example-1] Deletion is in progress: {'apiVersion': 'kopf.dev/v1', 'kind': 'KopfExample', ………}}}
[2021-10-22 22:58:04,833] kopf.objects         [DEBUG   ] [default/kopf-example-1] Handler 'delete_fn' is invoked.
[2021-10-22 22:58:04,835] kopf.objects         [ERROR   ] [default/kopf-example-1] Handler 'delete_fn' failed temporarily: not yet
[2021-10-22 22:58:04,837] kopf.objects         [DEBUG   ] [default/kopf-example-1] Patching with: {'metadata': {'annotations': {'kopf.zalando.org/delete_fn': '{"started":"2021-10-22T20:58:04.832405","delayed":"2021-10-22T20:58:09.835818","purpose":"delete","retries":1,"success":false,"failure":false,"message":"not yet"}'}}, 'status': {'kopf': {'progress': {'delete_fn': {'started': '2021-10-22T20:58:04.832405', 'stopped': None, 'delayed': '2021-10-22T20:58:09.835818', 'purpose': 'delete', 'retries': 1, 'success': False, 'failure': False, 'message': 'not yet', 'subrefs': None}}}}}
[2021-10-22 22:58:04,879] kopf.objects         [DEBUG   ] [default/kopf-example-1] Sleeping was skipped because of the patch, 4.999147 seconds left.
[2021-10-22 22:58:04,987] kopf.objects         [DEBUG   ] [default/kopf-example-1] Deletion is in progress: {'apiVersion': 'kopf.dev/v1', 'kind': 'KopfExample', ………}}}}}
[2021-10-22 22:58:04,988] kopf.objects         [DEBUG   ] [default/kopf-example-1] Sleeping for 4.847348 seconds for the delayed handlers.
[2021-10-22 22:58:09,843] kopf.objects         [DEBUG   ] [default/kopf-example-1] Patching with: {'metadata': {'annotations': {'kopf.zalando.org/touch-dummy': '2021-10-22T20:58:09.842363'}}, 'status': {'kopf': {'dummy': '2021-10-22T20:58:09.842363'}}}
[2021-10-22 22:58:09,997] kopf.objects         [DEBUG   ] [default/kopf-example-1] Deletion is in progress: {'apiVersion': 'kopf.dev/v1', 'kind': 'KopfExample',………}}}}}
[2021-10-22 22:58:09,998] kopf.objects         [DEBUG   ] [default/kopf-example-1] Handler 'delete_fn' is invoked.
[2021-10-22 22:58:10,000] kopf.objects         [ERROR   ] [default/kopf-example-1] Handler 'delete_fn' failed temporarily: not yet
[2021-10-22 22:58:10,001] kopf.objects         [DEBUG   ] [default/kopf-example-1] Patching with: {'metadata': {'annotations': {'kopf.zalando.org/delete_fn': '{"started":"2021-10-22T20:58:04.832405","delayed":"2021-10-22T20:58:15.000683","purpose":"delete","retries":2,"success":false,"failure":false,"message":"not yet"}', 'kopf.zalando.org/touch-dummy': None}}, 'status': {'kopf': {'progress': {'delete_fn': {'started': '2021-10-22T20:58:04.832405', 'stopped': None, 'delayed': '2021-10-22T20:58:15.000683', 'purpose': 'delete', 'retries': 2, 'success': False, 'failure': False, 'message': 'not yet', 'subrefs': None}}, 'dummy': None}}}
[2021-10-22 22:58:10,033] kopf.objects         [DEBUG   ] [default/kopf-example-1] Sleeping was skipped because of the patch, 4.999714 seconds left.
[2021-10-22 22:58:10,146] kopf.objects         [DEBUG   ] [default/kopf-example-1] Deletion is in progress: {'apiVersion': 'kopf.dev/v1', 'kind': 'KopfExample', ………}}}}}
[2021-10-22 22:58:10,147] kopf.objects         [DEBUG   ] [default/kopf-example-1] Sleeping for 4.853016 seconds for the delayed handlers.
[2021-10-22 22:58:15,002] kopf.objects         [DEBUG   ] [default/kopf-example-1] Patching with: {'metadata': {'annotations': {'kopf.zalando.org/touch-dummy': '2021-10-22T20:58:15.002437'}}, 'status': {'kopf': {'dummy': '2021-10-22T20:58:15.002437'}}}
[2021-10-22 22:58:15,147] kopf.objects         [DEBUG   ] [default/kopf-example-1] Deletion is in progress: {'apiVersion': 'kopf.dev/v1', 'kind': 'KopfExample', ………}}}}
[2021-10-22 22:58:15,148] kopf.objects         [DEBUG   ] [default/kopf-example-1] Handler 'delete_fn' is invoked.
[2021-10-22 22:58:15,149] kopf.objects         [ERROR   ] [default/kopf-example-1] Handler 'delete_fn' failed temporarily: not yet
[2021-10-22 22:58:15,150] kopf.objects         [DEBUG   ] [default/kopf-example-1] Patching with: {'metadata': {'annotations': {'kopf.zalando.org/delete_fn': '{"started":"2021-10-22T20:58:04.832405","delayed":"2021-10-22T20:58:20.150054","purpose":"delete","retries":3,"success":false,"failure":false,"message":"not yet"}', 'kopf.zalando.org/touch-dummy': None}}, 'status': {'kopf': {'progress': {'delete_fn': {'started': '2021-10-22T20:58:04.832405', 'stopped': None, 'delayed': '2021-10-22T20:58:20.150054', 'purpose': 'delete', 'retries': 3, 'success': False, 'failure': False, 'message': 'not yet', 'subrefs': None}}, 'dummy': None}}}
[2021-10-22 22:58:15,188] kopf.objects         [DEBUG   ] [default/kopf-example-1] Sleeping was skipped because of the patch, 4.999713 seconds left.
[2021-10-22 22:58:15,295] kopf.objects         [DEBUG   ] [default/kopf-example-1] Deletion is in progress: {'apiVersion': 'kopf.dev/v1', 'kind': 'KopfExample', ………}}}}
[2021-10-22 22:58:15,296] kopf.objects         [DEBUG   ] [default/kopf-example-1] Sleeping for 4.853645 seconds for the delayed handlers.
[2021-10-22 22:58:20,156] kopf.objects         [DEBUG   ] [default/kopf-example-1] Patching with: {'metadata': {'annotations': {'kopf.zalando.org/touch-dummy': '2021-10-22T20:58:20.155361'}}, 'status': {'kopf': {'dummy': '2021-10-22T20:58:20.155361'}}}
[2021-10-22 22:58:20,310] kopf.objects         [DEBUG   ] [default/kopf-example-1] Deletion is in progress: {'apiVersion': 'kopf.dev/v1', 'kind': 'KopfExample'………}}}}}
[2021-10-22 22:58:20,310] kopf.objects         [DEBUG   ] [default/kopf-example-1] Handler 'delete_fn' is invoked.
[2021-10-22 22:58:20,311] kopf.objects         [INFO    ] [default/kopf-example-1] DELETED
[2021-10-22 22:58:20,315] kopf.objects         [INFO    ] [default/kopf-example-1] Handler 'delete_fn' succeeded.
[2021-10-22 22:58:20,315] kopf.objects         [INFO    ] [default/kopf-example-1] Deletion is processed: 1 succeeded; 0 failed.
[2021-10-22 22:58:20,316] kopf.objects         [DEBUG   ] [default/kopf-example-1] Removing the finalizer, thus allowing the actual deletion.
[2021-10-22 22:58:20,316] kopf.objects         [DEBUG   ] [default/kopf-example-1] Patching with: {'metadata': {'annotations': {'kopf.zalando.org/delete_fn': None, 'kopf.zalando.org/touch-dummy': None}, 'finalizers': []}, 'status': {'kopf': {'progress': {'delete_fn': None}, 'dummy': None}}}
[2021-10-22 22:58:20,353] kopf.objects         [DEBUG   ] [default/kopf-example-1] Patching was skipped: the object does not exist anymore.
[2021-10-22 22:58:20,460] kopf.objects         [DEBUG   ] [default/kopf-example-1] Deleted, really deleted, and we are notified.
[2021-10-22 22:58:20,461] kopf.objects         [DEBUG   ] [default/kopf-example-1] Removing the finalizer, thus allowing the actual deletion.

I'm not sure what happened at the end of the log here — I'll check it later. But generally, the delayed deletion worked: it took 15 seconds (attempts 0, 1, 2 caused a delay by 5 seconds, attempt 3 succeeded).


then it should also run for any updates to the resource.

This might be a problem and Kopf has some known limitations to detect that.

Kubernetes does not delete the resources, it marks them for deletion by setting the metadata.deletionTimestamp. Then, it simply waits until all finalizers disappear — i.e. when every controller removes their own finalizer. And only after that, the resource is actually deleted.

  • https://kubernetes.io/docs/reference/using-api/api-concepts/#resource-deletion

This is how Kopf understands that the resource is being deleted — by the existence or absence of the deletion timestamp: if the timestamp is set, it calls on-delete; if not set, then on-update (or if the last-handled-state is not set, then on-create, but this is not the case here).

In other words, Kopf cannot properly detect if the change of a field was a relevant update when the object is in the state of deletion. However, you can notice that:

@kopf.on.delete('kopfexamples')
def delete_fn(retry, logger, diff, **_):
    logger.info(f"{diff!r}")
    if retry < 3:
        raise kopf.TemporaryError("no yet", delay=5)
    logger.info('DELETED')
time kubectl delete -f examples/obj.yaml &
kubectl patch -f examples/obj.yaml --type merge -p '{"spec": {"field": "xxx"}}'
[2021-10-22 23:14:57,662] kopf.objects         [DEBUG   ] [default/kopf-example-1] Handler 'delete_fn' is invoked.
[2021-10-22 23:14:57,662] kopf.objects         [INFO    ] [default/kopf-example-1] ()
[2021-10-22 23:14:57,664] kopf.objects         [ERROR   ] [default/kopf-example-1] Handler 'delete_fn' failed temporarily: no yet

[2021-10-22 23:15:02,820] kopf.objects         [DEBUG   ] [default/kopf-example-1] Handler 'delete_fn' is invoked.
[2021-10-22 23:15:02,821] kopf.objects         [INFO    ] [default/kopf-example-1] (('change', ('spec', 'field'), 'value', 'xxx'),)
[2021-10-22 23:15:02,827] kopf.objects         [ERROR   ] [default/kopf-example-1] Handler 'delete_fn' failed temporarily: no yet

The detection is not immediate, as usually that handler is marked as temporarily failed and it will wait until is retry time comes.

The proper solution for this issue would be the idea from #9 / #163 — remembering which handler processed which state of the object — but I failed two times with that already.

I will think about how Kopf should treat the object changes that happen during its deletion, and whether they should be considered updates (which seems reasonable). And if so, how should Kopf separate the "updates processing" from the "deletion processing".

For now, I suggest considering this as a known limitation of the framework and looking for workarounds: e.g. diff/old/new kwargs in the deletion handlers; daemons might be a good idea too: they can also block the resource deletion until exited, and the exiting can take time (the exiting reason can be checked with stopped.reason & kopf.DaemonStoppingReason.RESOURCE_DELETED).

nolar avatar Oct 22 '21 21:10 nolar

I thought about using a TemporaryError here, but it did not seem appropriate to have the logs reporting errors when it wasn't an error condition. I suppose that would be a separate feature request? A way to indicate log level with a TemporaryError or other exception class for reporting a non-error condition that should still result in a retry?

Also, in my use case the delay isn't quite appropriate. It receives an outside event. The way the process works is that it kicks off a process in an external system and then receives a callback from that system via a flask api that runs along side the kopf operator. The callback updates the resource state and then it responds to the update event.

At present I have this working with on.event handlers, but this way I am missing out on a lot of kopf functionality and having to maintain my own finalizers outside of the framework.

jkupferer avatar Oct 23 '21 04:10 jkupferer