python list_pod_for_all_namespaces urllib3.exceptions.ProtocolError: Connection broken: InvalidChunkLength

What happened (please include outputs or screenshots):

Sometimes the code fails on:

for pod_event in self._watcher.stream(func=self._core_api.list_pod_for_all_namespaces, **watch_kwargs):

Getting this exception:

urllib3.exceptions.ProtocolError: ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read))

What you expected to happen: The loop should not fail with an exception. It should either receive a valid chunk or, if there are no more pods, finish.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment: This happens intermittently for our clients running k8s versions 1.20+, 1.21+, 1.25+ ...

Kubernetes version (kubectl version):
WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short. Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.4", GitCommit:"872a965c6c6526caa949f0c6ac028ef7aff3fb78", GitTreeState:"clean", BuildDate:"2022-11-09T13:36:36Z", GoVersion:"go1.19.3", Compiler:"gc", Platform:"darwin/arm64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.12-gke.1000", GitCommit:"b4e24aa2edb70ab31433ba75bd4052409d858719", GitTreeState:"clean", BuildDate:"2023-03-30T09:32:49Z", GoVersion:"go1.19.7 X:boringcrypto", Compiler:"gc", Platform:"linux/amd64"}
OS (e.g., MacOS 10.13.6): MacOS 13.3.1 (a) (22E772610a)
Python version (python --version): 3.9.6
Python client version (pip list | grep kubernetes) 23.6.0

Jun 28 '23 06:06 adi-epshtain

Hey @adi-epshtain, please check your network connection and upgrade the urllib3 module. Have you done that?

Jul 01 '23 02:07 ai-naymul

Getting this exact same exception with the latest version of urllib3 and this code:

pod_stream = self.watcher.stream(kube_client.list_namespaced_pod, namespace, label_selector=pod_labels, resource_version=pod_resource_version)

for event in pod_stream:
  ...

In our case this happens in AKS every 4min 10s when there is no activity on the connection. AKS sends a RST packet to the client (as seen from this tcpdump) after that time:

15:18:15.152289 IP xxx.xxx.xxx.xxx.https > yyy.yyy.yyy.yyy.60860: Flags [R.], seq 61930, ack 4250, win 0, length 0

It seems urllib3 / the k8s Python client does not handle this case. More details in this issue, which mentions that TCP keepalive should be used. I managed to work around this by using a 4min timeout on the watcher, though it would be nice if the library handled this out of the box.
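
For anyone who wants to try the TCP keepalive route instead of the shorter timeout, here is a minimal sketch. Everything in it is an assumption rather than something confirmed in this thread: the helper names `keepalive_socket_options` and `enable_keepalive_for_urllib3` are made up for illustration, the idle/interval/probe values are arbitrary, and it relies on the k8s client's connections picking up urllib3's `HTTPConnection.default_socket_options` list. Whether keepalive probes actually stop AKS from resetting an idle connection has not been verified here.

```python
import socket
from typing import List, Tuple

SocketOption = Tuple[int, int, int]


def keepalive_socket_options(
    idle_seconds: int = 60,
    interval_seconds: int = 30,
    probes: int = 3,
) -> List[SocketOption]:
    """Build (level, option, value) triples that turn on TCP keepalive.

    Hypothetical helper; the defaults are illustrative, not recommendations.
    """
    options: List[SocketOption] = [(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)]
    # The "idle time before the first probe" constant is platform-specific.
    if hasattr(socket, "TCP_KEEPIDLE"):  # e.g. Linux
        options.append((socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_seconds))
    elif hasattr(socket, "TCP_KEEPALIVE"):  # e.g. macOS on newer Pythons
        options.append((socket.IPPROTO_TCP, socket.TCP_KEEPALIVE, idle_seconds))
    if hasattr(socket, "TCP_KEEPINTVL"):
        options.append((socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_seconds))
    if hasattr(socket, "TCP_KEEPCNT"):
        options.append((socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes))
    return options


def enable_keepalive_for_urllib3(**kwargs: int) -> None:
    """Add keepalive options to urllib3's default socket options.

    Assumption: connections created afterwards use this list. It is
    extended in place (not reassigned) so that code holding a reference
    to the same list object also sees the new options.
    """
    from urllib3.connection import HTTPConnection  # third-party import

    defaults = HTTPConnection.default_socket_options
    for option in keepalive_socket_options(**kwargs):
        if option not in defaults:
            defaults.append(option)
```

Call `enable_keepalive_for_urllib3(idle_seconds=60)` once, before creating the Kubernetes API client, and keep the idle time well below the 4min 10s observed above. Checking with tcpdump that keepalive probes are really sent is advisable before relying on this.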

Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.8"
Server Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.10"

kubernetes==26.1.0
urllib3==2.1.0

Dec 07 '23 13:12 tomi

Same here with kubernetes==28.1.0 and urllib3==2.1.0, in OpenShift. We don't see these often; today is the first time I have. Still annoying: despite setting timeout_seconds=3600 in my watch/stream, after that "InvalidChunkLength" log it looks like my script stopped looping/re-opening watches, and I'm no longer receiving events. The script didn't exit and no exception was caught; it just looks dead.

Dec 11 '23 19:12 faust64

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle stale
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

Mar 10 '24 20:03 k8s-triage-robot

/lifecycle rotten

Apr 09 '24 20:04 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Reopen this issue with /reopen
Mark this issue as fresh with /remove-lifecycle rotten
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

May 09 '24 21:05 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

May 09 '24 21:05 k8s-ci-robot