Improve KubernetesExecutor Observability

Open dengpenn opened this issue 1 year ago • 3 comments

Description

During our adoption of Airflow, we have seen that the scheduler can create hundreds of pods during the main scheduling loop. I propose adding two kinds of metrics: the response codes returned to the k8s client, and the latency of creating, patching, and deleting pods.

Use case/motivation

The Airflow executor creates one pod for each individual task. At peak times we saw 800+ tasks scheduled, and the latency of the underlying K8s API increased. Because the executor creates task pods inside its own loop, slow pod creation might delay the executor's heartbeat and, in turn, the scheduler's heartbeat. It would be good to have metrics that track the response code and latency of the k8s API calls that create, patch, and delete pods.
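To make the proposal concrete, here is a minimal sketch of the kind of instrumentation meant, not an existing Airflow implementation. `InMemoryStats`, `instrumented_call`, and the metric names are all hypothetical; a real change would presumably go through Airflow's own stats client instead. The only external assumption is that the Kubernetes Python client's `ApiException` exposes the HTTP code as `.status`.

```python
import time
from collections import defaultdict


class InMemoryStats:
    """Hypothetical stand-in for a StatsD-style client (incr/timing)."""

    def __init__(self):
        self.counters = defaultdict(int)
        self.timings = defaultdict(list)

    def incr(self, name, count=1):
        self.counters[name] += count

    def timing(self, name, millis):
        self.timings[name].append(millis)


def instrumented_call(stats, operation, func, *args, **kwargs):
    """Run one Kubernetes API call, recording its latency and outcome.

    `operation` is a label such as "create", "patch", or "delete".
    The metric names below are illustrative, not existing Airflow metrics.
    """
    start = time.perf_counter()
    status = "success"  # used when the call returns without raising
    try:
        return func(*args, **kwargs)
    except Exception as exc:
        # kubernetes.client's ApiException carries the HTTP code in `.status`;
        # exceptions without that attribute are counted as "unknown".
        status = getattr(exc, "status", "unknown")
        raise
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        stats.timing(f"kubernetes_executor.pod_{operation}.duration", elapsed_ms)
        stats.incr(f"kubernetes_executor.pod_{operation}.response.{status}")
```

In the executor this would wrap the client calls, for example `instrumented_call(stats, "create", kube_client.create_namespaced_pod, namespace=ns, body=pod)`, so that a spike in 429 responses or in create latency shows up next to the heartbeat metrics.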

Related issues

N/A

Are you willing to submit a PR?

  • [X] Yes I am willing to submit a PR!

Code of Conduct

dengpenn, Apr 24 '24 04:04

Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.

boring-cyborg[bot], Apr 24 '24 04:04

What is your worker_pods_creation_batch_size? This setting limits the number of pods created during a given scheduler loop.
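As a rough illustration of why the batch size matters, the sketch below estimates how many loops it takes to launch a backlog of queued tasks, assuming each loop creates at most one batch of pods and no new tasks arrive in the meantime. The helper name `loops_to_drain` is hypothetical.

```python
import math


def loops_to_drain(queued_tasks: int, batch_size: int) -> int:
    """Estimate the scheduler loops needed to create pods for all queued tasks.

    Assumes at most `batch_size` pods are created per loop and that the
    queue does not grow while it is being drained.
    """
    if queued_tasks < 0:
        raise ValueError("queued_tasks must be non-negative")
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return math.ceil(queued_tasks / batch_size)
```

With the 800 tasks mentioned above, a batch size of 1 would need 800 loops, while a batch size of 16 would need 50. Larger batches also mean more API calls per loop, which is where latency metrics would help.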

RNHTTR, May 07 '24 21:05

We already have two metrics for this. Can you check whether they cover your case?

kubernetes_executor.clear_not_launched_queued_tasks.duration
kubernetes_executor.adopt_task_instances.duration

https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/logging-monitoring/metrics.html

dirrao, May 11 '24 04:05

This issue has been automatically marked as stale because it has been open for 14 days with no response from the author. It will be closed in next 7 days if no further activity occurs from the issue author.

github-actions[bot], May 26 '24 00:05

This issue has been closed because it has not received response from the issue author.

github-actions[bot], Jun 02 '24 00:06