kopf adds finalizer without "delete" handler
Long story short
We have handlers for our resources for update/create/timer/startup (but no delete handler). Unfortunately, Kopf adds a finalizer to the resource anyway. As mentioned in the documentation (https://kopf.readthedocs.io/en/stable/handlers/#state-changing-handlers), no finalizer should be added to the resource in this case.
Kopf version
1.33.0
Kubernetes version
1.20.2
Python version
3.7.4
Code
```python
import logging

import kopf
from kubernetes import client
from kubernetes.client.rest import ApiException

logger = logging.getLogger(__name__)

RECONCILE_INTERVAL_SECONDS = 60


# reconcile every RECONCILE_INTERVAL_SECONDS or listen on the create/update events
@kopf.timer('', 'v1', 'secrets', annotations={'klacks/synchronizer': 'yes'}, interval=RECONCILE_INTERVAL_SECONDS)
@kopf.on.create('', 'v1', 'secrets', annotations={'klacks/synchronizer': 'yes'})
@kopf.on.update('', 'v1', 'secrets', annotations={'klacks/synchronizer': 'yes'})
def handle_secret(body, meta, spec, status, **kwargs):
    logger.debug('handle_secret name/namespace: {}/{}'.format(meta.name, meta.namespace))
    v1 = client.CoreV1Api()
    namespace_response = v1.list_namespace()
    namespaces = [nsa.metadata.name for nsa in namespace_response.items]
    # remove the namespace of the watched resources...
    namespaces.remove(meta.namespace)
    resource = v1.read_namespaced_secret(meta.name, meta.namespace)
    resource.metadata.annotations.pop('klacks/synchronizer', None)
    resource.metadata.annotations.pop('klacks/include-namespaces', None)
    resource.metadata.annotations.pop('kopf.zalando.org/last-handled-configuration', None)
    resource.metadata.resource_version = None
    resource.metadata.uid = None
    logger.debug('target namespaces: {}'.format(parse_target_namespaces(meta, namespaces)))
    for namespace in parse_target_namespaces(meta, namespaces):
        resource.metadata.namespace = namespace
        try:
            v1.patch_namespaced_secret(name=meta.name, namespace=namespace, body=resource)
            logger.debug('secret patched. name={}, namespace={}'.format(meta.name, namespace))
        except ApiException:
            try:
                v1.create_namespaced_secret(namespace=namespace, body=resource)
                logger.debug('secret created. name={}, namespace={}'.format(meta.name, namespace))
            except Exception as e:
                logger.error('updating/creating secret went wrong')
                raise e


# reconcile every RECONCILE_INTERVAL_SECONDS or listen on the create/update events
@kopf.timer('', 'v1', 'configmaps', annotations={'klacks/synchronizer': 'yes'}, interval=RECONCILE_INTERVAL_SECONDS)
@kopf.on.create('', 'v1', 'configmaps', annotations={'klacks/synchronizer': 'yes'})
@kopf.on.update('', 'v1', 'configmaps', annotations={'klacks/synchronizer': 'yes'})
def handle_config_map(body, meta, spec, status, **kwargs):
    logger.debug('handle_config_map name/namespace: {}/{}'.format(meta.name, meta.namespace))
    v1 = client.CoreV1Api()
    namespace_response = v1.list_namespace()
    namespaces = [nsa.metadata.name for nsa in namespace_response.items]
    # remove the namespace of the watched resources...
    namespaces.remove(meta.namespace)
    resource = v1.read_namespaced_config_map(meta.name, meta.namespace)
    resource.metadata.annotations.pop('klacks/synchronizer', None)
    resource.metadata.annotations.pop('klacks/include-namespaces', None)
    resource.metadata.annotations.pop('kopf.zalando.org/last-handled-configuration', None)
    resource.metadata.resource_version = None
    resource.metadata.uid = None
    logger.debug('target namespaces: {}'.format(parse_target_namespaces(meta, namespaces)))
    for namespace in parse_target_namespaces(meta, namespaces):
        resource.metadata.namespace = namespace
        try:
            v1.patch_namespaced_config_map(name=meta.name, namespace=namespace, body=resource)
            logger.debug('configmap patched. name={}, namespace={}'.format(meta.name, namespace))
        except ApiException:
            try:
                v1.create_namespaced_config_map(namespace=namespace, body=resource)
                logger.debug('configmap created. name={}, namespace={}'.format(meta.name, namespace))
            except Exception as e:
                logger.error('updating/creating configmap went wrong')
                raise e


# reconcile every RECONCILE_INTERVAL_SECONDS or listen on the create/update events
@kopf.timer('services.cloud.sap.com', 'v1alpha1', 'servicebindings', annotations={'klacks/synchronizer': 'yes'}, interval=RECONCILE_INTERVAL_SECONDS)
@kopf.on.create('services.cloud.sap.com', 'v1alpha1', 'servicebindings', annotations={'klacks/synchronizer': 'yes'})
@kopf.on.update('services.cloud.sap.com', 'v1alpha1', 'servicebindings', annotations={'klacks/synchronizer': 'yes'})
def handle_servicebinding(body, meta, spec, status, **kwargs):
    logger.debug('handle_servicebinding name/namespace: {}/{}'.format(meta.name, meta.namespace))
    v1 = client.CoreV1Api()
    namespace_response = v1.list_namespace()
    namespaces = [nsa.metadata.name for nsa in namespace_response.items]
    # remove the namespace of the watched resources...
    namespaces.remove(meta.namespace)
    secret_name = spec.get('secretName')
    try:
        logger.debug('handle secret: {}'.format(secret_name))
        resource = v1.read_namespaced_secret(secret_name, meta.namespace)
        resource.metadata.resource_version = None
        resource.metadata.uid = None
        resource.metadata.owner_references = None
        for namespace in parse_target_namespaces(meta, namespaces):
            resource.metadata.namespace = namespace
            try:
                v1.patch_namespaced_secret(name=secret_name, namespace=namespace, body=resource)
                logger.debug('secret patched. name={}, namespace={}'.format(secret_name, namespace))
            except ApiException:
                try:
                    v1.create_namespaced_secret(namespace=namespace, body=resource)
                    logger.debug('secret created. name={}, namespace={}'.format(secret_name, namespace))
                except Exception as e:
                    logger.error('updating/creating secret went wrong')
                    raise e
    except Exception:
        logger.warning("Secret '{}' for ServiceBinding '{}' not yet available...".format(secret_name, meta.name))


def parse_target_namespaces(meta, existing_namespaces):
    target_namespaces = []
    # look for a namespace inclusion annotation first; if absent, assume all namespaces are the target
    if 'klacks/include-namespaces' in meta.annotations:
        value = meta.annotations['klacks/include-namespaces']
        namespaces_to_include = [ns for ns in value.replace(' ', '').split(',') if ns]
        for namespace in namespaces_to_include:
            if namespace in existing_namespaces:
                target_namespaces.append(namespace)
            else:
                logger.debug('Tried to add a resource to a non-existing namespace: {}'.format(namespace))
    else:
        # no inclusion annotation found, so check whether any namespaces are excluded
        target_namespaces = list(existing_namespaces)
        if 'klacks/exclude-namespaces' in meta.annotations:
            value = meta.annotations['klacks/exclude-namespaces']
            namespaces_to_exclude = [ns for ns in value.replace(' ', '').split(',') if ns]
            if len(namespaces_to_exclude) == 0:
                logger.debug("Empty 'exclude-namespaces' list in annotation")
            for namespace in namespaces_to_exclude:
                if namespace in target_namespaces:
                    target_namespaces.remove(namespace)
                else:
                    logger.debug('Tried to exclude a non-existing namespace: {}'.format(namespace))
    return target_namespaces


@kopf.on.startup()
def startup_kopf(settings: kopf.OperatorSettings, **kwargs):
    logger.debug('on.startup:')
    # Events got lost ... see https://github.com/nolar/kopf/issues/232
    settings.watching.server_timeout = 300
```
Logs
No response
Additional information
It is not possible to delete the watched resources without removing the finalizer first. It seems that our code does not provide the delete handler that the finalizer would require to be released.
It seems that `@kopf.timer` is the cause of the problem: adding this decorator attaches the finalizer for some reason. Not documented and not required, or?
Both timers and daemons add finalizers to all served (i.e. matching) resources. This is needed to ensure proper termination of those asyncio tasks running the timers/daemons (timer is essentially a daemon with regular asyncio.sleeps).
To be more specific, from the framework's point of view, timer/daemon cancellation can take a significant time, i.e. it is not guaranteed to be instant because of the timer's/daemon's code (which is outside of Kopf's control). There can be several update cycles before the tasks are finally stopped. As such, the resource must exist to contain the state of this termination. The framework releases the resources only after the tasks are confirmed to be stopped.
And so, finalizers are added when timers/daemons exist — to prevent the sudden disappearance of the resources which would leave orphan (non-stopped) tasks.
It is documented here:
- https://kopf.readthedocs.io/en/stable/daemons/#deletion-prevention
> it is not possible to delete the watched resources without removing the finalizer before.
It is unclear why this conclusion is made. Deleting the resource (via CLI or API) only marks it for deletion by setting `metadata.deletionTimestamp`. From that moment, Kopf initiates the deletion handlers (if any) and stops/cancels the daemons and timers (specifically, their asyncio tasks). Once the tasks are stopped, it removes the finalizer and lets the resource actually go (unless there are other finalizers of other operators).
If the operator is not running at the moment, then the resources are indeed "blocked". Force-deletion can help here.
There are cases when the finalizers are removed on deletion without the task being fully stopped: e.g. when `cancellation_timeout=0` or any low value (but not `None`!) is set for daemons:
- https://kopf.readthedocs.io/en/stable/daemons/#timeouts
But the finalizers are added anyway. This "instant release" of the resource happens only if the operator is running at the time of deletion.
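For illustration, a minimal daemon configured for such an "instant release" (a sketch, not from the thread; the resource kind is borrowed from the code above, and the snippet only runs inside an operator against a cluster):

```python
import asyncio

import kopf

# cancellation_timeout=0: on deletion, the finalizer is removed immediately
# instead of waiting for the daemon task to confirm its cancellation -- at the
# risk of the task still running briefly after the resource is gone.
@kopf.daemon('services.cloud.sap.com', 'v1alpha1', 'servicebindings',
             cancellation_timeout=0)
async def binding_daemon(stopped, logger, **kwargs):
    while not stopped:
        await asyncio.sleep(60)
```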
If this is not the desired behaviour, you can easily simulate timers & daemons with your own threads or tasks: start them in create/resume handlers, store them in `memo`, and stop/cancel them in the optional deletion handlers or from the inside (with some risk of getting orphan tasks for deleted resources).
This is essentially what Kopf does, but with more logging and safety measures.
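The described simulation can be sketched as a plain-asyncio task registry (a hypothetical helper, not Kopf's actual implementation; the Kopf wiring points are indicated in comments):

```python
import asyncio


class TimerRegistry:
    """One periodic asyncio task per resource uid."""

    def __init__(self):
        self._tasks = {}

    async def _loop(self, interval, fn, *args):
        while True:
            fn(*args)
            await asyncio.sleep(interval)

    def start(self, uid, interval, fn, *args):
        # Would be called from a @kopf.on.create/@kopf.on.resume handler,
        # with the registry instance stored in `memo`.
        if uid not in self._tasks:
            self._tasks[uid] = asyncio.create_task(self._loop(interval, fn, *args))

    async def stop(self, uid):
        # Would be called from an optional @kopf.on.delete handler (or from
        # inside the task). Without a finalizer, a missed deletion event
        # leaves the task orphaned -- the risk mentioned above.
        task = self._tasks.pop(uid, None)
        if task is not None:
            task.cancel()
            try:
                await task
            except asyncio.CancelledError:
                pass
```

Since no finalizer is involved, resources delete instantly, at the cost of Kopf's safety guarantees around task termination.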
@nolar Is it possible to prevent these finalizers from being set (i.e. by setting `settings.persistence.finalizer = None`)? I am aware that it's an "at your own risk" change.
Hey! I stumbled upon this issue while debugging something slightly different. Simplified, we have a Helm chart that has a CRD, one CRD resource and a Kopf operator. There is a timer for the CRD and hence, a finalizer attached to it.
The problem is when `helm uninstall` is run and the operator's deployment gets deleted before the CRD resource. The operator shuts down successfully, but the finalizer is never removed, and the CRD resource with all of its children (that the operator created) are stuck.
Note that this is not necessarily Helm-specific; one could observe similar results with a regular YAML template, `kubectl apply`, and a subsequent `delete`.
I am not yet sure what the best approach is, but I wanted to add a different point of view to this thread. Maybe we could remove the finalizers on operator shutdown?
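A sketch of that idea (hypothetical helper; the finalizer string is an assumption matching recent Kopf defaults, configurable via `settings.persistence.finalizer`): build the merge-patch body that a shutdown hook such as `@kopf.on.cleanup()` could send for each served resource, e.g. via the kubernetes client's `patch_*` calls:

```python
from typing import Optional

# Assumption: the default finalizer string of recent Kopf versions
# (configurable via settings.persistence.finalizer).
KOPF_FINALIZER = 'kopf.zalando.org/KopfFinalizerMarker'


def finalizer_removal_patch(metadata: dict, finalizer: str = KOPF_FINALIZER) -> Optional[dict]:
    """Build a merge-patch body that drops `finalizer`, or None if it is absent."""
    finalizers = metadata.get('finalizers') or []
    if finalizer not in finalizers:
        return None
    return {'metadata': {'finalizers': [f for f in finalizers if f != finalizer]}}
```

The obvious caveat: this only helps when the operator shuts down gracefully; a force-killed pod would still leave the finalizers behind.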