kopf suddenly stops detecting any changes after approximately 20 minutes

Open RSE132 opened this issue 1 year ago • 5 comments

Long story short

kopf suddenly stops detecting any changes after approximately 20 minutes, which is unexpected. I am using the latest version, 1.37.2.

Kopf version

1.37.2

Kubernetes version

1.29

Python version

3.12

Code

import logging

import kopf

# vabInit and vaultOnboarding are helpers from the reporter's own code base (not shown).

@kopf.on.create('perpetual.com', 'v1', 'vab')
def create_fn(spec, name, namespace, **kwargs):
    logging.info(f"New VaultAuthBackend resource found: {name},{namespace}")
    vabInit(spec, name, namespace, **kwargs)

@kopf.on.delete('perpetual.com', 'v1', 'vab')
def delete_fn(spec, name, namespace, **kwargs):
    specConfig = spec.get('config', {})
    clusterName = specConfig.get('cluster', '')
    vaultOwner = spec.get('vaultowner', '')
    logging.info(f"Deleting VaultAuthBackend resource: {name},{namespace}")
    vaultOnboarding.deactivate(clusterName, vaultOwner)

Logs

2024-08-03 01:06:14,866 :: DEBUG :: Handler 'create_fn' is invoked.
2024-08-03 01:06:14,867 :: INFO :: New VaultAuthBackend resource found: landingzone-snadbox-pox-vabrr,landing-zone
2024-08-03 01:06:16,403 :: INFO :: VaultauthBackend unregistered successfully from MSS. cluster=landingzone-snadbox-pox
2024-08-03 01:06:16,518 :: INFO :: Handler 'delete_fn' succeeded.
2024-08-03 01:06:16,518 :: INFO :: Deletion is processed: 1 succeeded; 0 failed.
2024-08-03 01:06:16,518 :: DEBUG :: Removing the finalizer, thus allowing the actual deletion.
2024-08-03 01:06:16,518 :: DEBUG :: Patching with: {'metadata': {'finalizers': []}}
2024-08-03 01:06:16,641 :: INFO :: Service Account already exist sa=:vault-auth
2024-08-03 01:06:16,668 :: DEBUG :: Deleted, really deleted, and we are notified.
2024-08-03 01:06:16,669 :: DEBUG :: Removing the finalizer, thus allowing the actual deletion.
2024-08-03 01:06:16,673 :: INFO :: Secret already exist. secret=:vault-auth-sa-token-secret
2024-08-03 01:06:19,758 :: INFO :: Cluster rolebinding 'role-tokenreview-binding' already exist for the sa=vault-auth
2024-08-03 01:06:21,563 :: INFO :: Successfully enabled the Kubernetes auth method.
2024-08-03 01:06:21,797 :: INFO :: Successfully configured the Kubernetes auth method.
2024-08-03 01:06:24,672 :: INFO :: Kubernetes role=vault-secrets-webhook for auth_backend=landingzone-snadbox-pox created successfully
2024-08-03 01:06:25,074 :: INFO :: Handler 'create_fn' succeeded.
2024-08-03 01:06:25,074 :: INFO :: Creation is processed: 1 succeeded; 0 failed.
2024-08-03 01:06:25,075 :: DEBUG :: Patching with: {'metadata': {'annotations': {'kopf.zalando.org/last-handled-configuration': '{"spec":{"config":{"authmethod":"kubernetes","cluster":"landingzone-snadbox-pox","clusterenv":"nonprod","kubeconfig":"landingzone-snadbox-pox-kubeconfig"},"vaultaddress":"https://vault.maersk-digital.net","vaultowner":"perpetual"},"metadata":{"annotations":{"perpetual.com/reconcile-policy":"detach-on-delete"}}}\n'}}}
2024-08-03 01:06:25,216 :: DEBUG :: Something has changed, but we are not interested (the essence is the same).
2024-08-03 01:06:25,216 :: DEBUG :: Handling cycle is finished, waiting for new changes.

Additional information

Once the log shows "Handling cycle is finished, waiting for new changes.", kopf no longer detects any new changes after about 20 minutes of idle time.

RSE132 • Aug 03 '24 00:08

Hello guys, I am also facing the same issue using kopf 1.37.2 on Python 3.12 and Kubernetes 1.29. It was working perfectly well until we upgraded from Kubernetes 1.28 to 1.29 (1.29.7-gke.1274000 on Google).

dsarazin • Sep 12 '24 15:09

We see the same issue in multiple kopf-based operators. Any idea about the root cause of the problem? My team and I would be happy to contribute.

SteinRobert • Oct 10 '24 11:10

Hello,

We faced similar issues on AKS and worked around them by enforcing a server timeout, using something like:

import kopf

@kopf.on.startup()
def configure(settings: kopf.OperatorSettings, **_):
    settings.watching.server_timeout = 210  # seconds

Give it a try.

ghost • Oct 10 '24 16:10

Same symptoms, AKS, Kubernetes version 1.29

caffeinism • Nov 04 '24 10:11

We have now evaluated the timeout change for a month, and for us it really did the trick! Thank you @francescotimperi! https://kopf.readthedocs.io/en/stable/configuration/#networking-timeouts

Since setting the timeouts in multiple operators, we have gotten rid of that particular problem.

SteinRobert • Nov 04 '24 12:11
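
For anyone applying the workaround from this thread across several operators, here is a minimal sketch of a shared helper. It is not taken from kopf or from the commenters' code. The name `apply_watch_timeouts` is hypothetical. The value 210 comes from the comment above, while the client and connect timeout values are illustrative assumptions. The attributes `settings.watching.server_timeout`, `client_timeout`, and `connect_timeout` are the ones described in the kopf configuration docs linked above. The check that the client timeout exceeds the server timeout is a design choice of this sketch, so that the server gets the chance to close the watch first.

```python
from types import SimpleNamespace


def apply_watch_timeouts(settings, server_timeout=210, client_timeout=240,
                         connect_timeout=60):
    """Set watch-stream timeouts (in seconds) on a kopf.OperatorSettings-like object.

    Only server_timeout=210 comes from the thread; the other defaults are
    illustrative assumptions and should be tuned per cluster.
    """
    if client_timeout <= server_timeout:
        # Sketch-level guard: let the server end the watch before the client does.
        raise ValueError("client_timeout should be longer than server_timeout")
    settings.watching.server_timeout = server_timeout
    settings.watching.client_timeout = client_timeout
    settings.watching.connect_timeout = connect_timeout
    return settings


# In a real operator this would be called from a startup handler:
#
#   import kopf
#
#   @kopf.on.startup()
#   def configure(settings: kopf.OperatorSettings, **_):
#       apply_watch_timeouts(settings)

# Stand-in for kopf.OperatorSettings, so the helper can be tried without a cluster.
demo = SimpleNamespace(watching=SimpleNamespace())
apply_watch_timeouts(demo)
print(demo.watching.server_timeout, demo.watching.client_timeout)
```

With a finite server timeout, the API server ends each watch request after that many seconds and kopf opens a new one, so a connection that was silently dropped during a long idle period should not leave the operator waiting indefinitely.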