
Delete handler issues with finalizers

Open trishul5 opened this issue 3 years ago • 13 comments

Does the framework offer a way to suppress finalizers being added to Kubernetes services?

Context: Using the kopf framework, we are writing a controller that creates and deletes hosted zones with reverse DNS records in AWS Route 53. The controller is triggered by applying a Kubernetes Service (a built-in resource) to a cluster, specifically when the service is of the network load balancer type. The motivation for this controller is to automate the task of creating hosted zones and their corresponding records, something that can be tedious at a large scale.

Problem: The controller written so far works as expected when handling the creation/update of the Kubernetes service. The update handler is invoked correctly, and the hosted zones and reverse DNS records are successfully created in Route 53; we have confirmed this using logging statements and the AWS Route 53 console. However, on deletion of the Kubernetes service, the delete handler is invoked and, during its execution, an error occurs regarding finalizers. The issue is that the finalizers of a Kubernetes service cannot be changed once the service is already being deleted; broadly speaking, editing an object that is being deleted will not work. In hopes of solving this error, we tried adding the optional=True filter to our delete handler, but this has not worked: when this filter is present, the handler does not respond to the deletion of the Kubernetes service at all. When the deletion handler exits successfully, kopf tries to patch the service to remove its finalizer, but the patch adds back the AWS load balancer finalizer. This patching operation fails with "Forbidden: no new finalizers can be added if the object is being deleted". The deletion handler then runs again, after which the service actually deletes successfully.

Steps Going Forward: We would like to figure out how to avoid the deletion handler running twice. Does the framework offer anything that can prevent the finalizer from being re-added at the end of execution? We believe this would be a good way to solve the error, as it allows for the normal flow of execution when deleting the hosted zones and reverse DNS records in Route 53.

Checklist

  • [✓] I have read the documentation and searched there for the problem
  • [✓] I have searched in the GitHub Issues for similar questions

trishul5 avatar Jul 07 '21 18:07 trishul5

Hey @trishul5, did you try using on.event() handlers?

I had a similar question in https://github.com/nolar/kopf/issues/701 and that's what did it for me.

OmegaVVeapon avatar Jul 08 '21 03:07 OmegaVVeapon

No, I haven't tried that but it looks promising. For the on.event() filters, can I specify a filter that will filter by annotation? What I want to do is only trigger when the service that is being deleted is a network load balancer type.

trishul5 avatar Jul 08 '21 03:07 trishul5

> No, I haven't tried that but it looks promising. For the on.event() filters, can I specify a filter that will filter by annotation? What I want to do is only trigger when the service that is being deleted is a network load balancer type.

Yeah, you can use value callbacks. Or use callback filters if you want even more control.
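For illustration, here is a minimal sketch of such a callback filter. The decorator usage is shown in a comment; the annotation key `service.beta.kubernetes.io/aws-load-balancer-type` and the handler names are assumptions about how the NLB service is marked, not something stated in this thread:

```python
# A hypothetical `when=` callback for kopf's filtering. Kopf passes the usual
# handler kwargs (annotations, labels, body, ...) to such callbacks as well.
def is_nlb_service(annotations, **_):
    """Return True only for services annotated as AWS network load balancers."""
    return annotations.get('service.beta.kubernetes.io/aws-load-balancer-type') == 'nlb'

# Usage in the operator (requires kopf):
#   @kopf.on.event('services', when=is_nlb_service)
#   def on_service_event(type, **_):
#       if type == 'DELETED':
#           ...  # delete the Route 53 hosted zone and reverse DNS records
```

The callback keeps the filtering logic testable on its own, independent of the operator runtime.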

OmegaVVeapon avatar Jul 08 '21 04:07 OmegaVVeapon

Alright, I am going to try this all out. Thanks for the responses and guidance.

trishul5 avatar Jul 08 '21 04:07 trishul5

> No, I haven't tried that but it looks promising. For the on.event() filters, can I specify a filter that will filter by annotation? What I want to do is only trigger when the service that is being deleted is a network load balancer type.
>
> Yeah, you can use value callbacks. Or use callback filters if you want even more control.

Ok, so I think it has worked! This is awesome, thanks so much.

One thing I do want to note: in the kopf logs, I get a continuous stream of messages saying:

```
Handling cycle is finished, waiting for new changes
Removing the finalizer as there are no handlers requiring it
Patching with metadata. service load balancer cleanup
```

Is there anything I can do to stop this?

trishul5 avatar Jul 08 '21 04:07 trishul5

Hmm, I'm afraid I don't know the answer to that one.

Haven't seen that behaviour on my end, kopf is fairly quiet for me.

You're not running with --debug or --verbose right?

If not, maybe you'll have to wait for @nolar

OmegaVVeapon avatar Jul 08 '21 05:07 OmegaVVeapon

> Hmm, I'm afraid I don't know the answer to that one.
>
> Haven't seen that behaviour on my end, kopf is fairly quiet for me.
>
> You're not running with --debug or --verbose right?
>
> If not, maybe you'll have to wait for @nolar

I am in fact running with --verbose. That just might be it? They are only benign log statements; no errors or exceptions are being logged during the execution of the events.

trishul5 avatar Jul 08 '21 14:07 trishul5

> Handling cycle is finished, waiting for new changes
> Removing the finalizer as there are no handlers requiring it

The messages are sent via Python logging facilities. Specifically, these ones are of the DEBUG level, so they are only visible with --verbose. You can create a startup handler and configure Python loggers/filters to exclude some messages as you would do with any other application.

```python
import logging
import kopf

class MyFilter(logging.Filter):
    def filter(self, record: logging.LogRecord) -> bool:
        if 'Removing the finalizer' in record.msg:
            return False
        return True

@kopf.on.startup()
def nologs(**_):
    logging.getLogger('kopf').addFilter(MyFilter())
```

However, keep in mind that Kopf can add, remove, or change the log messages at any time without warning. The same applies to the hierarchy of packages/loggers: only the top-level logger "kopf" is guaranteed to be used.
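A coarser alternative to a record filter, relying only on the standard library and on the guarantee above about the top-level "kopf" logger (the child logger name `kopf.objects` below is an assumption for illustration):

```python
import logging

# Raise the level of the top-level "kopf" logger so that DEBUG-only messages
# (like the handling-cycle ones) are dropped wholesale, even under --verbose.
logging.getLogger('kopf').setLevel(logging.INFO)

# Child loggers inherit the effective level from the "kopf" parent:
child = logging.getLogger('kopf.objects')
print(child.getEffectiveLevel() == logging.INFO)  # → True
print(child.isEnabledFor(logging.DEBUG))          # → False
```

This drops all DEBUG output at once, whereas a `logging.Filter` lets you suppress only specific messages.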

> Patching with metadata. service load balancer cleanup

This one is not Kopf's.

nolar avatar Jul 09 '21 05:07 nolar

> When the deletion handler exits successfully, kopf tries to patch the service to remove its finalizer, but adds back the AWS load balancer finalizer.

Can you please clarify here? Does it happen because the AWS Load Balancer's finalizer is removed during the time while Kopf's handler is being executed? And that the AWS LB's finalizer is not Kopf-/operator-managed?

nolar avatar Jul 09 '21 05:07 nolar

> When the deletion handler exits successfully, kopf tries to patch the service to remove its finalizer, but adds back the AWS load balancer finalizer.
>
> Can you please clarify here? Does it happen because the AWS Load Balancer's finalizer is removed during the time while Kopf's handler is being executed? And that the AWS LB's finalizer is not Kopf-/operator-managed?

The answer to both of your questions is yes. I believe that as soon as we figure this out the custom controller should be perfectly stable.

trishul5 avatar Jul 09 '21 14:07 trishul5

I have a similar problem:

```console
$ kubectl -n yc1 get pod podname -o json | jq '.metadata.finalizers'
[
  "batch.kubernetes.io/job-tracking",
  "kopf.zalando.org/KopfFinalizerMarker"
]
```

Code from kopf ~ 1.36.2:

```python
def allow_deletion(
        *,
        body: bodies.Body,
        patch: patches.Patch,
        finalizer: str,
) -> None:
    if is_deletion_blocked(body=body, finalizer=finalizer):
        finalizers = body.get('metadata', {}).get('finalizers', [])
        patch.setdefault('metadata', {}).setdefault('finalizers', list(finalizers))
        if finalizer in patch['metadata']['finalizers']:
            patch['metadata']['finalizers'].remove(finalizer)
```

Looks like there is a race condition here:

  1. Pod has two finalizers set by JobController and by Kopf
  2. Pod is being deleted somehow
  3. Kopf reacts to the pod deletion and removes its own finalizer, replacing the list of two finalizers with a list containing only one (job-tracking)
  4. At the same time, the JobController removes job-tracking itself
  5. On the kopf side, this patch looks like a removal: [1,2] -> [2]
  6. But on the k8s side, it's an addition of a second finalizer: [1] -> [2]
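The steps above can be simulated in plain Python (no Kubernetes involved; the function below only mimics the replace-the-whole-list behaviour of `allow_deletion`):

```python
KOPF = "kopf.zalando.org/KopfFinalizerMarker"
JOB = "batch.kubernetes.io/job-tracking"

def kopf_replacement_patch(observed_finalizers, own_finalizer):
    """Mimic allow_deletion: send the whole observed list minus kopf's entry."""
    finalizers = list(observed_finalizers)
    if own_finalizer in finalizers:
        finalizers.remove(own_finalizer)
    return finalizers  # a full replacement value for metadata.finalizers

# Steps 1-3: kopf observes both finalizers and computes its patch.
patch = kopf_replacement_patch([KOPF, JOB], KOPF)

# Step 4: meanwhile, the JobController removes job-tracking on the server.
server_state = [KOPF]

# Steps 5-6: applying kopf's full-list patch re-adds job-tracking to a
# deleting pod -> "no new finalizers can be added if the object is being deleted".
server_state = patch
print(server_state)  # → ['batch.kubernetes.io/job-tracking']
```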

The way K8s performs finalizer removal is different:

```go
func removeTrackingFinalizerPatch(pod *v1.Pod) []byte {
	if !hasJobTrackingFinalizer(pod) {
		return nil
	}
	patch := map[string]interface{}{
		"metadata": map[string]interface{}{
			"$deleteFromPrimitiveList/finalizers": []string{batch.JobTrackingFinalizer},
		},
	}
	patchBytes, _ := json.Marshal(patch)
	return patchBytes
}
```

It's a "strategic merge patch", which is described as obsolete in the k8s devel docs, but it is used internally and at least avoids this race.
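A sketch of why the named-removal semantics avoid the race, assuming `$deleteFromPrimitiveList` is applied against the server's current list rather than a client-side snapshot:

```python
def apply_delete_from_primitive_list(current, names_to_delete):
    """Sketch of the server-side semantics: remove only the named entries from
    the list as it exists at apply time; never replace the whole list."""
    return [f for f in current if f not in names_to_delete]

# The JobController already removed job-tracking concurrently; a named removal
# of kopf's own finalizer still cannot re-add anything:
current = ["kopf.zalando.org/KopfFinalizerMarker"]
result = apply_delete_from_primitive_list(
    current, ["kopf.zalando.org/KopfFinalizerMarker"])
print(result)  # → []
```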

tumb1er avatar Oct 30 '23 07:10 tumb1er

@tumb1er Have you resolved the problem you were writing about?

I guess using just an event handler and checking the type of action works, but I think that this should be solved properly.

zilinjak avatar Feb 19 '24 14:02 zilinjak

> Have you resolved the problem you were writing about?

We just disabled finalizers everywhere, for a number of reasons. It required some extra checks to stop timers on deleted objects (timers are mostly used to double-check deletion), but it works (at least, there are no issues about it in our bug tracker).

tumb1er avatar Feb 21 '24 05:02 tumb1er