kubernetes_asyncio

ClientPayloadError occasionally thrown when iterating a watch

Open jonathon-love opened this issue 2 years ago • 7 comments

i do get a ClientPayloadError from time to time:

File ... in run
    async for update in stream:
  File "/usr/local/lib/python3.8/dist-packages/kubernetes_asyncio/watch/watch.py", line 131, in __anext__
    return await self.next()
  File "/usr/local/lib/python3.8/dist-packages/kubernetes_asyncio/watch/watch.py", line 152, in next
    line = await self.resp.content.readline()
  File "/usr/local/lib/python3.8/dist-packages/aiohttp/streams.py", line 338, in readline
    await self._wait("readline")
  File "/usr/local/lib/python3.8/dist-packages/aiohttp/streams.py", line 306, in _wait
    await waiter
aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed

cheers

jonathon-love, Aug 28 '21 08:08

Is it a self-hosted cluster or one provided by Google/AWS/Azure? Could you check if kubectl works longer without the issue?

tomplus, Aug 28 '21 21:08

this is on a self-hosted microk8s cluster. are you suggesting i try running the same watch with kubectl? i didn't know kubectl could perform watches. could you point me to some docs if so?

with thanks

jonathon-love, Aug 28 '21 23:08

There is a flag --watch to watch for changes, for example:

`kubectl get pods --watch`

tomplus, Aug 30 '21 06:08

righto, i'll take a look. regardless, should the aiohttp.client_exceptions.ClientPayloadError be wrapped? (i'm not sure, i can see arguments both ways)

jonathon-love, Aug 30 '21 06:08

How long is it working without raising the exception?

I'm also not sure what to do... we could treat it as a "timeout" and silently reconnect, but on the other hand such behavior may hide errors in other cases.

tomplus, Aug 30 '21 21:08

> Could you check if kubectl works longer without the issue?

i haven't tested exhaustively, but it doesn't look like kubectl lasts longer.

> How long is it working without raising the exception?

it works for quite some time. i think this might typically result in about 4 pod restarts in 24 hours.

> I'm also not sure what to do... we can treat it as "timeout" and silently reconnect but on the other hand such behavior may hide some errors in other case.

yeah. ideally we can clearly tell the exceptions we expect apart from those we don't. something like:

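while True:
    try:
        async with watch.stream(self._api.list_namespaced_pod, self._namespace) as stream:
            async for update in stream:
                ...  # handle the update here
    except ExceptionsWeExpect:
        # expected (e.g. the watch timed out): loop around and reconnect
        pass
    except ExceptionsWeDontExpect:
        # anything else is a real error and should propagate
        raise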
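
A minimal, self-contained sketch of that pattern, under stated assumptions: `watch_with_reconnect`, `DroppedConnection`, and the fake stream below are hypothetical names made up for illustration and are not part of kubernetes_asyncio. In real code the `expected` tuple would presumably contain `aiohttp.ClientPayloadError`, and `open_stream` would wrap the `watch.stream(...)` call.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, Tuple, Type


class DroppedConnection(Exception):
    """Hypothetical stand-in for aiohttp's ClientPayloadError."""


async def watch_with_reconnect(
    open_stream: Callable[[], AsyncIterator[dict]],
    handle: Callable[[dict], Awaitable[None]],
    expected: Tuple[Type[BaseException], ...],
    max_reconnects: int = 3,
    delay: float = 0.0,
) -> int:
    """Consume events, reconnecting only on `expected` exceptions.

    Returns the number of reconnects performed. Any other exception
    propagates to the caller unchanged.
    """
    reconnects = 0
    while True:
        try:
            async for event in open_stream():
                await handle(event)
            return reconnects  # the stream ended without an error
        except expected:
            if reconnects >= max_reconnects:
                raise  # a persistent failure should not stay hidden
            reconnects += 1
            await asyncio.sleep(delay)


def make_fake_stream() -> Callable[[], AsyncIterator[dict]]:
    """Build a fake stream factory that fails once, then ends cleanly."""
    attempts = {"n": 0}

    async def stream() -> AsyncIterator[dict]:
        attempts["n"] += 1
        yield {"type": "ADDED", "attempt": attempts["n"]}
        if attempts["n"] == 1:
            raise DroppedConnection("Response payload is not completed")

    return stream


async def demo():
    seen = []

    async def handle(event: dict) -> None:
        seen.append(event)

    reconnects = await watch_with_reconnect(
        make_fake_stream(), handle, (DroppedConnection,)
    )
    return reconnects, seen


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Capping the number of reconnects is one way to address the concern about hiding errors: an occasional dropped connection is absorbed, while a failure that keeps recurring still surfaces. A real loop would likely also want to resume from the last seen resource version instead of starting the watch from scratch.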

with thanks

jonathon-love, Sep 17 '21 04:09

We are seeing this issue in an Azure kubernetes 1.22 cluster. Whenever we watch anything and there is no activity for the 5 minute default kubernetes timeout (even if prior events have been received), we see this error.

I am not familiar (enough) with the underpinnings of how kubernetes does watch calls, but I could run `kubectl get pods --watch` on the same cluster and it worked past the 5 minute timeout.

cpnielsen, Jul 18 '22 12:07