Kopf Crashes with "Connection broken: IncompleteRead(0 bytes read)"

Open chilicat opened this issue 5 years ago • 7 comments

Expected Behavior

Simple handler should not crash

Actual Behavior

Kopf crashes after a couple of minutes

```
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/urllib3/response.py", line 639, in _update_chunk_length
    self.chunk_left = int(line, 16)
ValueError: invalid literal for int() with base 16: b''

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/urllib3/response.py", line 397, in _error_catcher
    yield
  File "/usr/local/lib/python3.7/site-packages/urllib3/response.py", line 704, in read_chunked
    self._update_chunk_length()
  File "/usr/local/lib/python3.7/site-packages/urllib3/response.py", line 643, in _update_chunk_length
    raise httplib.IncompleteRead(line)
http.client.IncompleteRead: IncompleteRead(0 bytes read)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/requests/models.py", line 750, in generate
    for chunk in self.raw.stream(chunk_size, decode_content=True):
  File "/usr/local/lib/python3.7/site-packages/urllib3/response.py", line 527, in stream
    for line in self.read_chunked(amt, decode_content=decode_content):
  File "/usr/local/lib/python3.7/site-packages/urllib3/response.py", line 732, in read_chunked
    self._original_response.close()
  File "/usr/local/lib/python3.7/contextlib.py", line 130, in __exit__
    self.gen.throw(type, value, traceback)
  File "/usr/local/lib/python3.7/site-packages/urllib3/response.py", line 415, in _error_catcher
    raise ProtocolError('Connection broken: %r' % e, e)
urllib3.exceptions.ProtocolError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/kopf", line 10, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/kopf/cli.py", line 30, in wrapper
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/kopf/cli.py", line 61, in run
    peering_name=peering_name,
  File "/usr/local/lib/python3.7/site-packages/kopf/reactor/queueing.py", line 275, in run
    _reraise(loop, list(done1) + list(done2) + list(done3) + list(done4))
  File "/usr/local/lib/python3.7/site-packages/kopf/reactor/queueing.py", line 303, in _reraise
    task.result()  # can raise the regular (non-cancellation) exceptions.
  File "/usr/local/lib/python3.7/site-packages/kopf/reactor/queueing.py", line 81, in watcher
    async for event in watching.infinite_watch(resource=resource, namespace=namespace):
  File "/usr/local/lib/python3.7/site-packages/kopf/clients/watching.py", line 131, in infinite_watch
    async for event in streaming_watch(resource=resource, namespace=namespace):
  File "/usr/local/lib/python3.7/site-packages/kopf/clients/watching.py", line 93, in streaming_watch
    async for event in streaming_aiter(stream, loop=loop):
  File "/usr/local/lib/python3.7/site-packages/kopf/clients/watching.py", line 62, in streaming_aiter
    yield await loop.run_in_executor(executor, streaming_next, src)
  File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.7/site-packages/kopf/clients/watching.py", line 50, in streaming_next
    return next(src)
  File "/usr/local/lib/python3.7/site-packages/kopf/clients/fetching.py", line 87, in <genexpr>
    return iter({'type': event.type, 'object': event.object.obj} for event in src)
  File "/usr/local/lib/python3.7/site-packages/pykube/query.py", line 214, in object_stream
    for line in r.iter_lines():
  File "/usr/local/lib/python3.7/site-packages/requests/models.py", line 794, in iter_lines
    for chunk in self.iter_content(chunk_size=chunk_size, decode_unicode=decode_unicode):
  File "/usr/local/lib/python3.7/site-packages/requests/models.py", line 753, in generate
    raise ChunkedEncodingError(e)
requests.exceptions.ChunkedEncodingError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))
```

Steps to Reproduce the Problem

Start Kopf with the following handler:

```
import kopf
import yaml
import os

@kopf.on.event('', 'v1', 'pods', labels={"type": "mongod"})
def pod_changed(logger, body, **kwargs):
    logger.info(f"Pod: %s", body['metadata']['name'])
    pass
```

Kopf crashes after about 5 minutes.

Specifications

  • Platform: Docker container: python:3.7.3-alpine3.9

  • Kubernetes version: (use kubectl version)

Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.5", GitCommit:"2166946f41b36dea2c4626f90a77706f426cdea2", GitTreeState:"clean", BuildDate:"2019-03-25T15:26:52Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.5", GitCommit:"2166946f41b36dea2c4626f90a77706f426cdea2", GitTreeState:"clean", BuildDate:"2019-03-25T15:19:22Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}

  • Python version: (use python --version)

Python 3.7.3

  • Python packages installed: (use pip freeze --all)

aiohttp==3.5.4
aiojobs==0.2.2
async-timeout==3.0.1
attrs==19.1.0
cachetools==3.1.1
certifi==2019.6.16
chardet==3.0.4
Click==7.0
google-auth==1.6.3
idna==2.8
iso8601==0.1.12
kopf==0.20
kubernetes==10.0.0
multidict==4.5.2
oauthlib==3.0.2
pip==19.1.1
pyasn1==0.4.6
pyasn1-modules==0.2.6
pykube-ng==0.28
python-dateutil==2.8.0
PyYAML==5.1.2
requests==2.22.0
requests-oauthlib==1.2.0
rsa==4.0
setuptools==41.0.1
six==1.12.0
urllib3==1.25.3
websocket-client==0.56.0
wheel==0.33.3
yarl==1.3.0

chilicat • Aug 06 '19 10:08

@chilicat Thanks for reporting.

Could you please run an experiment in your environment? If you put this line at the top of your script, does the delayed error happen exactly at the specified time (in seconds)? If you set it to 600 seconds (10 mins), does it still happen at ~5 mins?

import kopf

kopf.config.WatchersConfig.default_stream_timeout = 60

@kopf.on.event(...)
...

nolar • Aug 06 '19 10:08

The process does not crash anymore (~50 minutes, still running)

chilicat • Aug 06 '19 12:08

@chilicat Thanks. So, let it be a workaround for now, even though kopf.config... is undocumented and internal. Please wrap it in try-except in case this module/class/attribute is renamed or removed in the future.
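
A minimal sketch of such a guard, under stated assumptions: `set_stream_timeout` is a hypothetical helper (not part of Kopf), and the only Kopf name it touches is the `kopf.config.WatchersConfig.default_stream_timeout` attribute from the workaround above. It checks that the attribute exists before assigning, because a plain assignment to a renamed attribute would silently succeed and do nothing.

```python
from types import SimpleNamespace


def set_stream_timeout(kopf_module, seconds):
    """Try to apply the undocumented timeout workaround.

    Returns True if the setting was applied, False if the internal
    module/class/attribute is missing (e.g. renamed or removed).
    """
    try:
        watchers = kopf_module.config.WatchersConfig
        # Read the attribute first: assigning to a missing name would
        # succeed without error and have no effect.
        getattr(watchers, "default_stream_timeout")
    except AttributeError:
        return False
    watchers.default_stream_timeout = seconds
    return True


# In an operator script (assumes kopf is installed):
#
#     import kopf
#     set_stream_timeout(kopf, 60)

# Stand-in object for demonstration only; this is NOT the real kopf module.
fake_kopf = SimpleNamespace(
    config=SimpleNamespace(
        WatchersConfig=SimpleNamespace(default_stream_timeout=None)
    )
)
applied = set_stream_timeout(fake_kopf, 60)
```

If the helper returns False, the operator still starts; it just runs without the workaround, which is worth logging.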

I have seen this issue a few times, with sporadic server-side disconnections when the ?timeout=... query arg is not specified. It goes deep into the K8s API implementation and Python's internals: kopf→pykube→requests→urllib3→http→socket.
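
For illustration only, here is one generic way a client could tolerate such disconnections: re-open the stream when it breaks instead of letting the exception end the process. This is a sketch, not what Kopf does. `resilient_watch` and `open_stream` are hypothetical names, and the built-in `ConnectionError` stands in for the error; with requests, one would presumably pass `retryable=(requests.exceptions.ChunkedEncodingError,)`, the exception from the traceback above. A real watcher would also need to resume from the last seen resourceVersion, which is omitted here.

```python
import time


def resilient_watch(open_stream, retryable=(ConnectionError,),
                    max_restarts=5, backoff=1.0):
    """Yield events from open_stream(), re-opening it on retryable errors.

    open_stream is a zero-argument callable returning a fresh iterable of
    events. After max_restarts failed re-opens, the last error is re-raised.
    """
    restarts = 0
    while True:
        try:
            for event in open_stream():
                yield event
            return  # the stream ended normally
        except retryable:
            restarts += 1
            if restarts > max_restarts:
                raise
            time.sleep(backoff)  # brief pause before reconnecting


# Simulated stream: breaks once mid-way, then works on the second attempt.
attempts = {"count": 0}

def flaky_stream():
    attempts["count"] += 1
    if attempts["count"] == 1:
        yield "event-1"
        raise ConnectionError("Connection broken: IncompleteRead(0 bytes read)")
    yield "event-2"

events = list(resilient_watch(flaky_stream, backoff=0))
```

The restart limit keeps a permanently broken connection from looping forever; the caller still sees the original exception once the limit is exceeded.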

I would prefer not to fix the sync I/O issues in this async app anymore (too many, too hard). I would rather replace all of this with aiohttp as the core of Kopf's I/O (coming soon), and then fix the connection issues there (if they happen).

So, let's keep this issue open until then, so that it is not forgotten and a fix is added.

nolar • Aug 06 '19 13:08

@nolar Sure, no problem. Thanks for the fast feedback.

chilicat • Aug 06 '19 13:08

I am experiencing the same issue with my PoC of an operator for a Vertica cluster DB. Luckily, the workaround works for me as well. Subscribing.

jaceksan • Oct 06 '19 19:10

@nolar While playing with my operator, the error started to occur more and more often, and it is now almost impossible to continue developing it. Unfortunately, the workaround stopped working. Is delivery of the aiohttp-based replacement already planned? Is there any other way to work around the issue?

jaceksan • Oct 08 '19 06:10

kopf==0.23rc1 is now pre-released (see the release notes). It is now fully aiohttp-based and contains no synchronous API calls, which means the whole I/O machinery has changed. As a result, the described issue is either completely gone or will look different.

@chilicat @jaceksan Please give this release candidate a try: is the reported issue gone (with the workaround temporarily removed)? I could not reproduce it in any of my environments.

nolar • Nov 13 '19 17:11