kopf
Delete handler issues with finalizers
Does the framework offer a way to suppress finalizers being added to Kubernetes services?
Context
Using the kopf framework, we are writing a controller that creates and deletes hosted zones with reverse DNS records in AWS Route 53. The controller is triggered by applying a Kubernetes Service (a built-in resource) to a cluster, specifically when the service is of the network load balancer type. The motivation for this controller is to automate the task of creating hosted zones and corresponding records, something that can be tedious at a large scale.
Problem
The controller, written so far, is working as expected when handling the creation/update of the Kubernetes service. The update handler is invoked correctly, and the hosted zones and reverse DNS records are successfully created in Route 53. We have confirmed this using logging statements and checking in the AWS Route 53 console.
However, on deletion of the Kubernetes service, the delete handler is invoked and an error occurs during its execution regarding finalizers. The issue is that the finalizers of a Kubernetes service cannot be changed once the service is already deleting; broadly speaking, editing an object that is being deleted will not work. In hopes of solving this error, we tried adding the `optional=True` filter to our delete handler, but this did not work: with the filter present, the handler does not respond to the deletion of the Kubernetes service at all.
When the deletion handler exits successfully, kopf tries to patch the service to remove its own finalizer, but adds back the AWS load balancer finalizer. When this update takes place, the patching operation fails with `Forbidden: no new finalizers can be added if the object is being deleted`.
This causes the deletion handler to run a second time, after which the service actually deletes successfully.
Steps Going Forward
We would like to figure out how to avoid the deletion handler running twice. Does the framework offer anything that can prevent the finalizer from being re-added at the end of execution? We believe this would be a good way to solve the error, as it allows the normal flow of execution to delete the hosted zones and reverse DNS records in Route 53.
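For clarity, the API-server rule behind the error can be modelled roughly as follows (a toy sketch of our own, not real Kubernetes source): once an object carries a `deletionTimestamp`, an update may remove finalizers but may not add new ones.

```python
def validate_finalizers_update(old: dict, new: dict) -> None:
    """Toy model of the rule: reject *new* finalizers on a deleting object."""
    meta_old = old.get("metadata", {})
    meta_new = new.get("metadata", {})
    deleting = meta_old.get("deletionTimestamp") is not None
    added = set(meta_new.get("finalizers", [])) - set(meta_old.get("finalizers", []))
    if deleting and added:
        raise PermissionError(
            "Forbidden: no new finalizers can be added if the object is being deleted"
        )
```

Removing a finalizer from a deleting object passes this check; re-adding the AWS load balancer finalizer in the same patch does not.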
Checklist
- [x] I have read the documentation and searched there for the problem
- [x] I have searched in the GitHub Issues for similar questions
Hey @trishul5, did you try using `on.event()` handlers?
I had a similar question in https://github.com/nolar/kopf/issues/701 and that's what did it for me.
No, I haven't tried that, but it looks promising. For the `on.event()` filters, can I specify a filter by annotation? What I want is to trigger only when the service being deleted is of the network load balancer type.
Yeah, you can use value callbacks. Or use callback filters if you want even more control.
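For illustration, a minimal value callback along those lines might look like this. The annotation key and expected value are assumptions about how AWS-managed NLB services are marked; check the annotations on your own Service objects before relying on them.

```python
# Assumed annotation key; verify it against your actual Service manifests.
NLB_TYPE_ANNOTATION = "service.beta.kubernetes.io/aws-load-balancer-type"

def is_nlb(value, **_):
    # kopf invokes value callbacks with the annotation's value plus extra kwargs.
    return value == "nlb"

# It would then be attached to an event handler roughly like this (sketch):
#
#   @kopf.on.event('services', annotations={NLB_TYPE_ANNOTATION: is_nlb})
#   def on_service_event(type, body, **kwargs):
#       ...
```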
Alright I am going to try this all out. Thanks for the responses and guidance
Ok, so I think it has worked! This is awesome, thanks so much.
One thing I do want to note: in the kopf logs, I get a continuous stream of messages saying:

```
Handling cycle is finished, waiting for new changes
Removing the finalizer as there are no handlers requiring it
Patching with metadata. service load balancer cleanup
```
Is there anything I can do to stop this?
Hmm, I'm afraid I don't know the answer to that one. Haven't seen that behaviour on my end; kopf is fairly quiet for me. You're not running with `--debug` or `--verbose`, right? If not, maybe you'll have to wait for @nolar.
I am in fact running with `--verbose`, so that might be it. They are just benign log statements; no errors or exceptions are being logged during the execution of the events.
> Handling cycle is finished, waiting for new changes
> Removing the finalizer as there are no handlers requiring it

The messages are sent via Python logging facilities. Specifically, these ones are of the DEBUG level, so they are only visible with `--verbose`. You can create a startup handler and configure Python loggers/filters to exclude some messages, as you would do with any other application.
```python
import logging

import kopf

class MyFilter(logging.Filter):
    def filter(self, record: logging.LogRecord) -> bool:
        if 'Removing the finalizer' in record.msg:
            return False
        return True

@kopf.on.startup()
def nologs(**_):
    logging.getLogger('kopf').addFilter(MyFilter())
```
However, keep in mind that Kopf can add, remove, or change the log messages at any time without warning, and so can the hierarchy of packages/loggers; only the top-level logger `"kopf"` is guaranteed to be used.
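A standalone sketch of that filtering approach, using only the standard library (the logger name and messages here are purely for the demo, not tied to kopf):

```python
import logging

class DropFinalizerNoise(logging.Filter):
    """Suppress records whose message mentions finalizer removal."""
    def filter(self, record: logging.LogRecord) -> bool:
        return 'Removing the finalizer' not in record.getMessage()

captured = []

class ListHandler(logging.Handler):
    """Collect emitted messages so the demo's effect can be inspected."""
    def emit(self, record: logging.LogRecord) -> None:
        captured.append(record.getMessage())

demo = logging.getLogger('finalizer-demo')
demo.setLevel(logging.DEBUG)
demo.addHandler(ListHandler())
demo.addFilter(DropFinalizerNoise())

demo.debug('Removing the finalizer as there are no handlers requiring it')
demo.debug('Handling cycle is finished, waiting for new changes')
# captured now holds only the second message; the first was filtered out.
```

Filters attached to a logger run before any handler sees the record, which is why this silences the message everywhere at once.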
> Patching with metadata. service load balancer cleanup
This one is not Kopf's.
> When the deletion handler exits successfully, kopf tries to patch the service to remove its finalizer, but adds back the AWS load balancer finalizer.
Can you please clarify here? Does it happen because the AWS Load Balancer's finalizer is removed during the time while Kopf's handler is being executed? And that the AWS LB's finalizer is not Kopf-/operator-managed?
> When the deletion handler exits successfully, kopf tries to patch the service to remove its finalizer, but adds back the AWS load balancer finalizer.
>
> Can you please clarify here? Does it happen because the AWS Load Balancer's finalizer is removed during the time while Kopf's handler is being executed? And that the AWS LB's finalizer is not Kopf-/operator-managed?
The answer to both of your questions is yes. I believe that as soon as we figure this out the custom controller should be perfectly stable.
I have a similar problem:

```shell
$ kubectl -n yc1 get pod podname -o json | jq '.metadata.finalizers'
[
  "batch.kubernetes.io/job-tracking",
  "kopf.zalando.org/KopfFinalizerMarker"
]
```
Code from kopf ~1.36.2:

```python
def allow_deletion(
        *,
        body: bodies.Body,
        patch: patches.Patch,
        finalizer: str,
) -> None:
    if is_deletion_blocked(body=body, finalizer=finalizer):
        finalizers = body.get('metadata', {}).get('finalizers', [])
        patch.setdefault('metadata', {}).setdefault('finalizers', list(finalizers))
        if finalizer in patch['metadata']['finalizers']:
            patch['metadata']['finalizers'].remove(finalizer)
```
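The race can be reproduced without a cluster. A rough simulation with plain dicts (not the real kopf or Kubernetes machinery) shows how a patch built from a stale body resurrects a concurrently removed finalizer:

```python
KOPF_FIN = 'kopf.zalando.org/KopfFinalizerMarker'
JOB_FIN = 'batch.kubernetes.io/job-tracking'

def build_removal_patch(stale_body: dict, finalizer: str) -> dict:
    # Mirrors the allow_deletion() logic above: copy the (stale) full list
    # of finalizers and drop only our own finalizer from it.
    finalizers = list(stale_body.get('metadata', {}).get('finalizers', []))
    if finalizer in finalizers:
        finalizers.remove(finalizer)
    return {'metadata': {'finalizers': finalizers}}

stale = {'metadata': {'finalizers': [JOB_FIN, KOPF_FIN]}}  # what kopf last saw
live = {'metadata': {'finalizers': [KOPF_FIN]}}            # JobController already removed its own

patch = build_removal_patch(stale, KOPF_FIN)
live['metadata']['finalizers'] = patch['metadata']['finalizers']
# From the server's perspective, the transition [KOPF_FIN] -> [JOB_FIN]
# both removed kopf's finalizer and *added* job-tracking back, which is
# forbidden on a deleting object.
```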
Looks like there is a race condition here:

- Pod has two finalizers, set by JobController and by Kopf
- Pod is being deleted somehow
- Kopf reacts to the pod deletion and removes its own finalizer, replacing the list with two finalizers with a list containing only one (`job-tracking`)
- At the same time, JobController removes `job-tracking` itself
- On the kopf side this patch looks like a removal: `[1,2] -> [2]`
- But on the k8s side it is an addition of a second finalizer: `[1] -> [2]`
The way K8s performs finalizer removal is different:

```go
func removeTrackingFinalizerPatch(pod *v1.Pod) []byte {
	if !hasJobTrackingFinalizer(pod) {
		return nil
	}
	patch := map[string]interface{}{
		"metadata": map[string]interface{}{
			"$deleteFromPrimitiveList/finalizers": []string{batch.JobTrackingFinalizer},
		},
	}
	patchBytes, _ := json.Marshal(patch)
	return patchBytes
}
```
It's a "strategic merge patch", which is described as obsolete in the k8s development docs, but it works internally, at least without this race.
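For comparison, a race-free removal can also be expressed as an RFC 6902 JSON Patch with a `test` guard (an illustration of the idea, not what kopf currently does): the removal only applies if the finalizer is still at the expected index; otherwise the request fails and can be retried against a fresh copy of the object.

```python
def apply_finalizer_patch(doc: dict, ops: list) -> dict:
    """Tiny interpreter for 'test' and 'remove' ops on /metadata/finalizers/<i>."""
    fins = doc['metadata']['finalizers']
    for op in ops:
        idx = int(op['path'].rsplit('/', 1)[1])
        if op['op'] == 'test':
            if idx >= len(fins) or fins[idx] != op['value']:
                raise ValueError('test failed: object changed, retry with a fresh copy')
        elif op['op'] == 'remove':
            del fins[idx]
    return doc

ops = [
    {'op': 'test', 'path': '/metadata/finalizers/1',
     'value': 'kopf.zalando.org/KopfFinalizerMarker'},
    {'op': 'remove', 'path': '/metadata/finalizers/1'},
]
```

If JobController removed its finalizer first, index 1 no longer holds the kopf marker, so the `test` op fails instead of silently re-adding `job-tracking`.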
@tumb1er Have you resolved the problem you were writing about?
I guess using just an event handler and checking the type of the action works, but I think this should still be solved properly.
> Have you resolved the problem you were writing about?
We just disabled finalizers everywhere for a number of reasons. It required some extra checks to stop timers on deleted objects (timers are mostly used to double-check deletion), but it works (at least, there are no issues about it in the bug tracker).