KubernetesPodOperator with multiple containers hangs if container other than base container is still running

Open jonathan-ostrander opened this issue 1 year ago • 3 comments

Apache Airflow version

main (development)

If "Other Airflow 2 version" selected, which one?

No response

What happened?

A KubernetesPodOperator with the following full_pod_spec:

apiVersion: v1
kind: Pod
metadata:
  name: multi-container-pod
spec:
  restartPolicy: Never
  containers:
  - name: base
    image: busybox
    command: ["sh", "-c", "echo base will exit after 30 seconds; sleep 30"]
  - name: sidecar
    image: busybox
    command: ["sh", "-c", "echo sidecar running indefinitely; while true; do sleep 3600; done"]

will not mark the task as successful after 30 seconds, because the sidecar continues to run after the base container has succeeded. This happens because the pod_manager gets stuck waiting for pod completion. This if statement returns False when Istio is not enabled on the pod.

What you think should happen instead?

The pod should be considered complete when the base container succeeds regardless of whether or not any other containers on the pod are still running.
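
The difference between the two completion checks can be sketched in plain Python. This is only an illustration and not the provider's actual code: the function names (container_exit_code, pod_phase_is_terminal, base_container_succeeded) are hypothetical, and the pod is modelled as a plain dict that mirrors the status.phase and status.containerStatuses fields of the Kubernetes Pod JSON.

```python
from typing import Any, Dict, Optional


def container_exit_code(pod: Dict[str, Any], container_name: str) -> Optional[int]:
    """Return the exit code of a terminated container, or None if it has not terminated."""
    statuses = pod.get("status", {}).get("containerStatuses") or []
    for status in statuses:
        if status.get("name") != container_name:
            continue
        terminated = (status.get("state") or {}).get("terminated")
        if terminated is not None:
            return terminated.get("exitCode")
    return None


def pod_phase_is_terminal(pod: Dict[str, Any]) -> bool:
    """Pod-level check: true only once every container in the pod has stopped."""
    return pod.get("status", {}).get("phase") in ("Succeeded", "Failed")


def base_container_succeeded(pod: Dict[str, Any], base_name: str = "base") -> bool:
    """Container-level check (hypothetical): true once the base container exits with 0."""
    return container_exit_code(pod, base_name) == 0


# Hypothetical status of the pod above after 30 seconds:
# base has exited, sidecar is still running, so the phase stays "Running".
example_pod = {
    "status": {
        "phase": "Running",
        "containerStatuses": [
            {"name": "base", "state": {"terminated": {"exitCode": 0, "reason": "Completed"}}},
            {"name": "sidecar", "state": {"running": {"startedAt": "2024-05-17T16:05:00Z"}}},
        ],
    }
}

print(pod_phase_is_terminal(example_pod))     # pod-level check
print(base_container_succeeded(example_pod))  # container-level check
```

With this status the pod-level check stays false for as long as the sidecar runs, while the container-level check is already true, which is the behaviour requested here.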

How to reproduce

Create a KubernetesPodOperator with the full_pod_spec provided.

Operating System

MacOS 14.4.1

Versions of Apache Airflow Providers

apache-airflow-providers-cncf-kubernetes==8.0.1

Deployment

Other

Deployment details

Kubernetes on Google Kubernetes Engine. Kubernetes executors and worker pods all run on the same cluster.

Anything else?

No response

Are you willing to submit PR?

  • [X] Yes I am willing to submit a PR!

Code of Conduct

May 17 '24 16:05 jonathan-ostrander

Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.

May 17 '24 16:05 boring-cyborg[bot]

Feel free to fix this behaviour

May 17 '24 17:05 Taragolis

related: #39625

May 19 '24 05:05 dirrao

FWIW, if someone stumbles on the same issue, one alternative is to deploy the Sidecar shutdown controller, which detects when the "main" container of a multi-container pod has completed and then execs into the remaining sidecar containers to kill their main PID.
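
The detection step described above might look roughly like the following. This is a sketch under stated assumptions and not the controller's actual implementation: sidecars_to_stop is a hypothetical helper, the pod is a plain dict shaped like the Kubernetes Pod status JSON, and the exec step that kills each sidecar's main PID is deliberately left out.

```python
from typing import Any, Dict, List


def sidecars_to_stop(pod: Dict[str, Any], main_container: str = "base") -> List[str]:
    """Return names of still-running containers once the main container has terminated.

    Hypothetical helper: returns an empty list while the main container is
    still running or has not reported a status yet.
    """
    statuses = pod.get("status", {}).get("containerStatuses") or []

    main_done = any(
        s.get("name") == main_container
        and (s.get("state") or {}).get("terminated") is not None
        for s in statuses
    )
    if not main_done:
        return []

    return [
        s["name"]
        for s in statuses
        if s.get("name") != main_container
        and (s.get("state") or {}).get("running") is not None
    ]


# Hypothetical pod status: the main container has exited, the sidecar has not.
finished_pod = {
    "status": {
        "containerStatuses": [
            {"name": "base", "state": {"terminated": {"exitCode": 0}}},
            {"name": "sidecar", "state": {"running": {}}},
        ]
    }
}

print(sidecars_to_stop(finished_pod))
```

A controller built on this idea would then act on each returned container name; how it does so (exec, signal, or something else) is outside this sketch.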

Mar 14 '25 07:03 brouberol