airflow icon indicating copy to clipboard operation
airflow copied to clipboard

Wait for xcom sidecar container to start before sidecar exec

Open ryantam626 opened this issue 3 years ago • 12 comments

According to k8s's docs on pod phase [1], "Running" state means "The Pod has been bound to a node, and all of the containers have been created. At least one container is still running, or is in the process of starting or restarting.".

This means there is no guarantee that the xcom sidecar container will have been running by the time the extract_xcom is used by KPO's execute even if we wait for pod to be a "Running" state - this further means if the base container has completed before the xcom sidecar container is running, the entire operator fails because you cannot exec against a non-running container.

Fix is to wait for the xcom sidecar container to be in the running state before we exec against it, which this PR implements.

[1] - https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-phase [2] - https://github.com/schattian/airflow/blob/86171ff43f8977021708cfa241da64f96c4c15e2/airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py#L398

ryantam626 avatar Jul 14 '22 11:07 ryantam626

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst) Here are some useful points:

  • Pay attention to the quality of your code (flake8, mypy and type annotations). Our pre-commits will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style. Apache Airflow is a community-driven project and together we are making it better 🚀. In case of doubts contact the developers at: Mailing List: [email protected] Slack: https://s.apache.org/airflow-slack

boring-cyborg[bot] avatar Jul 14 '22 11:07 boring-cyborg[bot]

Not exactly sure how I can properly introduce a failing test and then make it pass here, would like some guidance on this.

ryantam626 avatar Jul 14 '22 11:07 ryantam626

Use mocking. see similar tests in test_pod_manager.py

potiuk avatar Jul 14 '22 11:07 potiuk

The test I was thinking was more along the lines of actually having a k8s cluster that can reproduce the behaviour of xcom container not started when base container has completed, and check that xcom extraction was successful.

Anyway, I have added the mock-based unit test that just asserts the newly added await_xcom_sidecar_container_start has been called once and exactly once.

ryantam626 avatar Jul 14 '22 13:07 ryantam626

LGTM. @jedcunningham @dstandish WDYT?

potiuk avatar Jul 16 '22 07:07 potiuk

The test I was thinking was more along the lines of actually having a k8s cluster that can reproduce the behaviour of xcom container not started when base container has completed, and check that xcom extraction was successful.

Running them on "integration" level is expensive and we alredy have some of those tests that take far too much time of our CI :)

potiuk avatar Jul 18 '22 16:07 potiuk

Anyone else ?

potiuk avatar Jul 18 '22 16:07 potiuk

i'll have a look

dstandish avatar Jul 18 '22 16:07 dstandish

@ryantam626 are you still working on this PR?

eladkal avatar Aug 09 '22 15:08 eladkal

@eladkal Oh sorry, I wasn't aware the WDYTs were directed at me.

I will work on the changes @dstandish suggested now.

ryantam626 avatar Aug 09 '22 16:08 ryantam626

Static checks failed :(

eladkal avatar Aug 09 '22 17:08 eladkal

@eladkal all green now, sorry for the wait.

And thanks to whoever approved the start of CI for me :pray:

ryantam626 avatar Aug 10 '22 17:08 ryantam626

This looks cool - @dstandish you might take a look when you are back but this one looks rather straightforward. Merging.

potiuk avatar Aug 21 '22 22:08 potiuk

Awesome work, congrats on your first merged pull request!

boring-cyborg[bot] avatar Aug 21 '22 22:08 boring-cyborg[bot]