InvalidChunkLength(got length b'', 0 bytes read) error in the Airflow scheduler when using the Kubernetes executor
Apache Airflow Provider(s)
cncf-kubernetes
Versions of Apache Airflow Providers
apache-airflow-providers-cncf-kubernetes==8.0.1
kubernetes==29.0.0
kubernetes_asyncio==29.0.0

Kubernetes version: Client Version v1.28.0, Kustomize Version v5.0.4-0.20230601165947-6ce0bf390ce3, Server Version v1.28.9
Apache Airflow version
2.8.3
Operating System
Linux
Deployment
Official Apache Airflow Helm Chart
Deployment details
No response
What happened
I have deployed Airflow in my cluster using the official Helm chart. Everything seems to work fine, but when the scheduler is idle it repeatedly logs this error:
ERROR - Unknown error in KubernetesJobWatcher. Failing
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/urllib3/response.py", line 761, in _update_chunk_length
self.chunk_left = int(line, 16)
ValueError: invalid literal for int() with base 16: b''
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/urllib3/response.py", line 444, in _error_catcher
yield
File "/usr/local/lib/python3.9/site-packages/urllib3/response.py", line 828, in read_chunked
self._update_chunk_length()
File "/usr/local/lib/python3.9/site-packages/urllib3/response.py", line 765, in _update_chunk_length
raise InvalidChunkLength(self, line)
urllib3.exceptions.InvalidChunkLength: InvalidChunkLength(got length b'', 0 bytes read)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/airflow/providers/cncf/kubernetes/executors/kubernetes_executor_utils.py", line 112, in run
self.resource_version = self._run(
File "/usr/local/lib/python3.9/site-packages/airflow/providers/cncf/kubernetes/executors/kubernetes_executor_utils.py", line 168, in _run
for event in self._pod_events(kube_client=kube_client, query_kwargs=kwargs):
File "/usr/local/lib/python3.9/site-packages/kubernetes/watch/watch.py", line 178, in stream
for line in iter_resp_lines(resp):
File "/usr/local/lib/python3.9/site-packages/kubernetes/watch/watch.py", line 56, in iter_resp_lines
for segment in resp.stream(amt=None, decode_content=False):
File "/usr/local/lib/python3.9/site-packages/urllib3/response.py", line 624, in stream
for line in self.read_chunked(amt, decode_content=decode_content):
File "/usr/local/lib/python3.9/site-packages/urllib3/response.py", line 857, in read_chunked
self._original_response.close()
File "/usr/local/lib/python3.9/contextlib.py", line 137, in __exit__
self.gen.throw(typ, value, traceback)
File "/usr/local/lib/python3.9/site-packages/urllib3/response.py", line 461, in _error_catcher
raise ProtocolError("Connection broken: %r" % e, e)
urllib3.exceptions.ProtocolError: ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read))
Process KubernetesJobWatcher-7:
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/urllib3/response.py", line 761, in _update_chunk_length
self.chunk_left = int(line, 16)
ValueError: invalid literal for int() with base 16: b''
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/urllib3/response.py", line 444, in _error_catcher
yield
File "/usr/local/lib/python3.9/site-packages/urllib3/response.py", line 828, in read_chunked
self._update_chunk_length()
File "/usr/local/lib/python3.9/site-packages/urllib3/response.py", line 765, in _update_chunk_length
raise InvalidChunkLength(self, line)
urllib3.exceptions.InvalidChunkLength: InvalidChunkLength(got length b'', 0 bytes read)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/usr/local/lib/python3.9/site-packages/airflow/providers/cncf/kubernetes/executors/kubernetes_executor_utils.py", line 112, in run
self.resource_version = self._run(
File "/usr/local/lib/python3.9/site-packages/airflow/providers/cncf/kubernetes/executors/kubernetes_executor_utils.py", line 168, in _run
for event in self._pod_events(kube_client=kube_client, query_kwargs=kwargs):
File "/usr/local/lib/python3.9/site-packages/kubernetes/watch/watch.py", line 178, in stream
for line in iter_resp_lines(resp):
File "/usr/local/lib/python3.9/site-packages/kubernetes/watch/watch.py", line 56, in iter_resp_lines
for segment in resp.stream(amt=None, decode_content=False):
File "/usr/local/lib/python3.9/site-packages/urllib3/response.py", line 624, in stream
for line in self.read_chunked(amt, decode_content=decode_content):
File "/usr/local/lib/python3.9/site-packages/urllib3/response.py", line 857, in read_chunked
self._original_response.close()
File "/usr/local/lib/python3.9/contextlib.py", line 137, in __exit__
self.gen.throw(typ, value, traceback)
File "/usr/local/lib/python3.9/site-packages/urllib3/response.py", line 461, in _error_catcher
raise ProtocolError("Connection broken: %r" % e, e)
urllib3.exceptions.ProtocolError: ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read))
The scheduler otherwise works normally, but this error is generated continuously. The same chart does not produce any errors in another cluster. What might be the issue?
P.S.: I have read https://github.com/apache/airflow/issues/33066 but could not draw any conclusion from it.
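For context, here is a minimal sketch of the kind of pod watch the KubernetesJobWatcher keeps open (this is not the provider's exact code; the namespace and label selector are placeholders). When the connection sits idle and is dropped mid-chunk by the API server or an intermediate proxy, urllib3 raises the InvalidChunkLength / ProtocolError shown above:

# Minimal sketch of a pod watch similar to what the job watcher drives.
# Namespace and label selector below are placeholders, not Airflow's actual values.
from kubernetes import client, config, watch

config.load_incluster_config()  # use config.load_kube_config() when running outside the cluster
v1 = client.CoreV1Api()

w = watch.Watch()
for event in w.stream(
    v1.list_namespaced_pod,
    namespace="airflow",              # placeholder
    label_selector="airflow-worker",  # placeholder
    resource_version="0",
):
    print(event["type"], event["object"].metadata.name)
# If the chunked HTTP stream is cut while idle, urllib3 raises
# ProtocolError("Connection broken: InvalidChunkLength(...)") out of this loop.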
What you think should happen instead
The scheduler should not emit this error. The KubernetesJobWatcher appears to be the component at the core of this issue.
How to reproduce
Use the official Helm chart 1.13.1 with these values:
airflowVersion: 2.8.3

# Ingress configuration
ingress:
  enabled: false
  # Enable web ingress resource
  web:
    enabled: True
    # Annotations for the web Ingress

config:
  core:
    parallelism: 32
    max_active_tasks_per_dag: 16
    max_active_runs_per_dag: 16
    dagbag_import_timeout: 100
    dag_file_processor_timeout: 50
    min_serialized_dag_update_interval: 60
    min_serialized_dag_fetch_interval: 30

pgbouncer:
  # Enable PgBouncer
  enabled: true

scheduler:
  replicas: 1
  resources:
    limits:
      cpu: 1500m
      memory: 1500Mi
    requests:
      cpu: 1000m
      memory: 1200Mi
  livenessProbe:
    initialDelaySeconds: 120
    timeoutSeconds: 30
    failureThreshold: 5
    periodSeconds: 300
    command: ~

webserver:
  replicas: 1
  resources:
    limits:
      cpu: 1000m
      memory: 1500Mi
    requests:
      cpu: 500m
      memory: 1200Mi
  livenessProbe:
    initialDelaySeconds: 15
    timeoutSeconds: 30
    failureThreshold: 20
    periodSeconds: 5
  readinessProbe:
    initialDelaySeconds: 15
    timeoutSeconds: 30
    failureThreshold: 20
    periodSeconds: 5
  webserverConfig: |
    from flask_appbuilder.security.manager import AUTH_DB
    # use embedded DB for auth
    AUTH_TYPE = AUTH_DB

dags:
  persistence:
    enabled: true
    size: 10Gi
    accessMode: ReadWriteMany

logs:
  persistence:
    enabled: true
    size: 10Gi

postgresql:
  enabled: True
Anything else
This occurs continuously whenever the Airflow scheduler is idle (i.e., it is not running any DAGs).
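For what it's worth, the usual client-side mitigation is to treat the dropped stream as the end of a watch and resume from the last seen resourceVersion. A rough hand-rolled sketch of that pattern (placeholder namespace; this is not the provider's actual implementation):

import urllib3
from kubernetes import client, config, watch

config.load_incluster_config()
v1 = client.CoreV1Api()
resource_version = "0"

while True:
    w = watch.Watch()
    try:
        for event in w.stream(
            v1.list_namespaced_pod,
            namespace="airflow",            # placeholder
            resource_version=resource_version,
            timeout_seconds=300,            # ask the server to end the watch cleanly
        ):
            resource_version = event["object"].metadata.resource_version
    except urllib3.exceptions.ProtocolError:
        # Connection dropped while idle; re-establish the watch and continue.
        continue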
Are you willing to submit PR?
- [X] Yes I am willing to submit a PR!
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.
Hello, does anyone know how to handle this error? I have the same problem.
The issue is with the cncf-kubernetes provider (apache-airflow-providers-cncf-kubernetes). Update it to version 8.4 and the issue will be resolved.
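If you are not sure which versions are actually baked into the scheduler image, a quick check from a Python shell inside the pod (these are the real distribution names on PyPI):

# Quick check of the installed versions inside the scheduler container.
from importlib.metadata import version

print(version("apache-airflow-providers-cncf-kubernetes"))  # expect 8.4.x or newer
print(version("kubernetes"))
print(version("urllib3"))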