Kopf stops receiving namespace events

Open logicfox opened this issue 4 years ago • 7 comments

Expected Behavior

Kopf should actively receive all namespace events.

@kopf.on.event('', 'v1', 'namespaces')
async def handle_event(event, **kwargs):
    logger = kwargs["logger"]
    logger.debug(f"Event: {event}. Cause: {kwargs.get('cause')}.")

Actual Behavior

Kopf receives events for a while and then stops receiving them. The create, update, and delete handlers are no longer triggered, and the events do not show up in the raw event handler either.

Steps to Reproduce the Problem

  1. Set up kopf to listen to Namespace events (as shown above)
  2. Log a message when events occur
  3. Create and delete namespaces on a cron (once an hour or so). Notice that Kopf stops receiving events after a period of time.

Specifications

  • Platform: Azure Kubernetes Service
  • Kubernetes version: 1.13.10
  • Python version: python:3.8.0-slim-buster
  • Python packages installed: kopf requests requests_oauthlib parse

logicfox avatar Nov 14 '19 09:11 logicfox

@logicfox Can you please add Kopf's version too? Either pip freeze | grep kopf or kopf --version will show it.

nolar avatar Nov 14 '19 09:11 nolar

Sure

kopf==0.22

logicfox avatar Nov 14 '19 09:11 logicfox

Maybe a duplicate of #204 #142 (not certain though).

@logicfox Can you please try it with kopf>=0.23rc2? Specifically, kopf==0.23rc1 switches all the I/O internally to asyncio+aiohttp (#227). This already solved some issues with the synchronous sockets freezing in some cases, and maybe solves all the other issues with similar symptoms.

Please, be aware of the massive changes in this RC (see 0.23rc1 & optionally 0.23rc2 release notes) if you have a pre-existing operator, which can be affected — though, in theory, it should be fully backward compatible and safe, but who knows what can break in practice.

nolar avatar Nov 14 '19 11:11 nolar

@nolar Sorry, I couldn't test this earlier. But it looks like the problem is still there in the master branch. watch seems to freeze after a while. I'm going to test this with the raw Kubernetes Python client to see if it's an issue with my cluster.

logicfox avatar Nov 18 '19 21:11 logicfox

We experienced the same issue until we upgraded Kubernetes to 1.15.10 in AKS. In addition I changed the version of Kopf from 0.25 to 0.26.

Regarding the situation before the upgrade: I noticed that events for CRDs were still being received.

corka149 avatar Apr 29 '20 18:04 corka149

Not sure if this is related, but on [email protected] and [email protected] I tried:

@kopf.on.login()
def login_fn(**kwargs):
    # return kopf.login_via_client(**kwargs)
    return kopf.login_via_pykube(**kwargs)

@kopf.on.event('', 'v1', 'namespaces')
# @kopf.on.create('core', 'v1', 'namespaces')

Results in

aiohttp.client_exceptions.ClientResponseError: 403, message='Forbidden', url=URL('{eksUrl}/api/v1/namespaces')

In the same prompt, kubectl get namespaces works.

Upgraded kopf to 0.27rc5 and got it working with:

@kopf.on.login()
def login_fn(**kwargs):
    return kopf.login_via_client(**kwargs)

@kopf.on.create('', 'v1', 'namespaces')

atamgp avatar May 03 '20 08:05 atamgp

By default, timeoutSeconds for the watch session is not set, neither in kopf (https://github.com/zalando-incubator/kopf/blob/master/kopf/structs/configuration.py#L68) nor in the Kubernetes API (https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/). As a result, the session might get stuck forever. Setting watching.server_timeout to some value might help here. It is important to set server_timeout to a value lower than watching.client_timeout (which is the global timeout of the aiohttp session).

@kopf.on.startup()
def configure(settings: kopf.OperatorSettings, **_):
    settings.watching.server_timeout = 300

I think it is not only the watching session that might get stuck, as other calls do not have a default timeout configured either. I've proposed setting timeouts globally per aiohttp session in https://github.com/zalando-incubator/kopf/pull/377, but it looks like it is not possible to override the settings in the way proposed in the patch, so it has to be updated.
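
To illustrate why a timeout helps with a watch that silently goes quiet, here is a minimal stdlib-only sketch. It is not kopf's implementation: fake_watch and watch_with_timeout are hypothetical names, the stream is simulated, and idle_timeout only plays the role that a client-side timeout would play. The idea is that a bounded wait turns a connection that hangs forever into an error the caller can react to by reconnecting.

```python
import asyncio
from typing import AsyncIterator, List, Tuple


async def fake_watch(events: List[str], stall_after: int) -> AsyncIterator[str]:
    """Hypothetical stand-in for a watch stream that goes silent mid-session."""
    for index, event in enumerate(events):
        if index == stall_after:
            # Simulate a connection that stays open but never delivers again.
            await asyncio.Event().wait()
        yield event


async def watch_with_timeout(
    events: List[str],
    stall_after: int,
    idle_timeout: float,
    max_sessions: int = 3,
) -> Tuple[List[str], int]:
    """Re-open the (fake) watch whenever no event arrives within idle_timeout."""
    received: List[str] = []
    reconnects = 0
    position = 0  # how many events were already consumed (a crude resume point)
    for _ in range(max_sessions):
        stream = fake_watch(events[position:], stall_after)
        try:
            while True:
                # Without the bounded wait, a stalled stream would block here forever.
                event = await asyncio.wait_for(stream.__anext__(), idle_timeout)
                received.append(event)
                position += 1
        except StopAsyncIteration:
            break  # the stream ended normally
        except asyncio.TimeoutError:
            reconnects += 1  # treat silence as a dead session and start a new one
        finally:
            await stream.aclose()
    return received, reconnects
```

For example, asyncio.run(watch_with_timeout(["a", "b", "c", "d"], stall_after=2, idle_timeout=0.05)) delivers all four events with one reconnect, whereas the same stream read without a timeout would hang after the second event.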

jumpojoy avatar Jul 29 '20 18:07 jumpojoy