
KubernetesPodOperator does not have any logs

slaupster opened this issue on May 24 '21 · 7 comments

Apache Airflow version: 2.0.2

Kubernetes version (if you are using kubernetes) (use kubectl version):

...
Server Version: version.Info{Major:"1", Minor:"18+", GitVersion:"v1.18.9-eks-d1db3c", GitCommit:"d1db3c46e55f95d6a7d3e5578689371318f95ff9", GitTreeState:"clean", BuildDate:"2020-10-20T22:18:07Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Cloud provider or hardware configuration: AWS
  • OS (e.g. from /etc/os-release): linux
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:

What happened:

With DEBUG logging on, when using a simple hello-world KubernetesPodOperator, no logs are produced in the executor, either when there is a problem with the setup or when everything is working.

What you expected to happen: the executor has logs showing the error or the successful invocation.

There are no logs at all. If I force the task to run interactively by changing https://github.com/apache/airflow/blob/476d0f6e3d2059f56532cda36cdc51aa86bafb37/airflow/cli/commands/task_command.py#L236 (commenting out the with clause and just invoking _run_task_by_selected_method, as is done for the interactive path), I get logs in the executor pod. Whatever _capture_task_logs is trying to do, it seems to hinder rather than help.
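For reference, a rough sketch of the hack described above, assuming the block around task_command.py line 236 looks roughly like the 2.0.2 source (the exact code may differ by version):

    # airflow/cli/commands/task_command.py (sketch of the 2.0.2 layout, not verbatim)
    if args.interactive:
        _run_task_by_selected_method(args, dag, ti)
    else:
        # Original behaviour: redirect stdout/stderr into the task log handler.
        # with _capture_task_logs(ti):
        #     _run_task_by_selected_method(args, dag, ti)
        # Hack: run the task directly, as the interactive path does, so operator
        # output reaches the executor pod's stdout/stderr.
        _run_task_by_selected_method(args, dag, ti)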

How to reproduce it: With Helm chart version 8.1.1 and Airflow 2.0.2 built from source, and the following logging config:

[logging]
    colored_console_log = False
    logging_level = DEBUG
    remote_logging = False
    donot_modify_handlers = True

run a DAG that uses the KubernetesPodOperator (from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import KubernetesPodOperator). Note that there is no logging in the executor. Then modify task_command.py to call _run_task_by_selected_method without the _capture_task_logs wrapper and see that the operator emits a lot of useful logging, including the pod spec (which is pretty much essential to diagnose any issues). A sketch of such a DAG is shown below.
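A minimal hello-world DAG of the kind referred to above; this is a sketch, and the DAG id, image, and parameter values are illustrative rather than taken from the original report:

    from datetime import datetime

    from airflow import DAG
    from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import KubernetesPodOperator

    with DAG(
        dag_id="kpo_logging_repro",        # illustrative name
        start_date=datetime(2021, 5, 1),
        schedule_interval=None,
        catchup=False,
    ) as dag:
        hello = KubernetesPodOperator(
            task_id="hello_world",
            name="hello-world",
            namespace="default",
            image="busybox",
            cmds=["sh", "-c"],
            arguments=["echo hello world"],
            get_logs=True,                  # pod logs should be streamed into the task log
            is_delete_operator_pod=True,
        )

With the unmodified task_command.py the executor shows no operator output for a task like this; with the hack above, the operator's logging (including the rendered pod spec) appears on the executor pod's stdout/stderr.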

Anything else we need to know:

slaupster · May 24 '21

Thanks for opening your first issue here! Be sure to follow the issue template!

boring-cyborg[bot] · May 24 '21

Hi @slaupster, do you experience this problem with 2.0.1 as well? Based on my experience, 2.0.2 and up does not play well with Kubernetes.

andormarkus · May 24 '21

Hi @slaupster, do you experience this problem with 2.0.1 as well? Based on my experience, 2.0.2 and up does not play well with Kubernetes.

I've only tried 2.0.0 for a PoC and I don't remember having this problem then, but I'm not sure I would have needed to see the logs because it just worked. I think the change that appears to cause the problem was in 2.0.0, so unless it's some other interaction since then, I don't see how it's ever worked in 2.x.

slaupster · May 24 '21

Do you see a connection with #16001?

andormarkus · May 24 '21

Do you see a connection with #16001?

Quite possibly. I had an issue with the operator params that resulted in a bad pod spec, and I had to make the hack I did to see any logs at all. I then fixed the problem but have left the hack in, so I get logs in the stdout/stderr of the KubernetesExecutor. Unlike #16001, I could never see the logs anywhere when I had the problem, possibly because the error cuts that short? Where are you supposed to see logs from the operators if not in the executor logs? Maybe I was not looking in the right place.

slaupster · May 24 '21

We are affected by this issue as well; slaupster's hack to remove _capture_task_logs() worked around the problem. It makes sense for us to use stdout/stderr for all task logs instead of writing them to a bucket: we collect all our infrastructure logs this way.

vbarbaresi · Aug 19 '21

Hi @slaupster, I would like to investigate whether I can help in your error case, but I am not sure I fully understood the root cause. We are using the K8sPodOperator extensively and have never missed logs so far.

Can you please:

  • Check/do a regression to see whether the same problem persists with the "current" version of Airflow and the Kubernetes/cncf provider package?
  • If yes, can you tell us the exact
    • Airflow version
    • Provider package version
    • Kubernetes version
    • Do you use any specific logging back-end configuration, or is logging "per standard" using file handlers on shared storage?
  • Can you paste a copy (or the relevant part) of the DAG code that is able to reproduce the error?
  • Full logs of the task (if there are any lines at all)?

jscheffl · Sep 19 '22

This issue has been automatically marked as stale because it has been open for 30 days with no response from the author. It will be closed in the next 7 days if no further activity occurs from the issue author.

github-actions[bot] · Oct 25 '22

This issue has been closed because it has not received a response from the issue author.

github-actions[bot] · Nov 02 '22