KubernetesPodOperator duplicating logs when interrupted

Open Nikita-Sobolev opened this issue 10 months ago • 4 comments

Apache Airflow version

Other Airflow 2 version (please specify below)

If "Other Airflow 2 version" selected, which one?

2.8.1

What happened?

The KubernetesPodOperator duplicates a task's logs when the log read is interrupted while the base container is still running. It happens randomly on different DAGs and on different runs of the same DAG. I assume it is somehow connected to https://github.com/apache/airflow/issues/35019

What you think should happen instead?

Logs should not be duplicated.

How to reproduce

KubernetesPodOperator on cloud AKS cluster

Operating System

Ubuntu 22.04

Versions of Apache Airflow Providers

apache-airflow==2.8.1
apache-airflow-providers-amazon==8.16.0
apache-airflow-providers-celery==3.5.1
apache-airflow-providers-cncf-kubernetes==7.13.0
apache-airflow-providers-common-io==1.2.0
apache-airflow-providers-common-sql==1.10.0
apache-airflow-providers-docker==3.9.1
apache-airflow-providers-elasticsearch==5.3.1
apache-airflow-providers-ftp==3.7.0
apache-airflow-providers-google==10.13.1
apache-airflow-providers-grpc==3.4.1
apache-airflow-providers-hashicorp==3.6.1
apache-airflow-providers-http==4.8.0
apache-airflow-providers-imap==3.5.0
apache-airflow-providers-microsoft-azure==8.5.1
apache-airflow-providers-mysql==5.5.1
apache-airflow-providers-odbc==4.4.0
apache-airflow-providers-openlineage==1.4.0
apache-airflow-providers-postgres==5.10.0
apache-airflow-providers-redis==3.6.0
apache-airflow-providers-sendgrid==3.4.0
apache-airflow-providers-sftp==4.8.1
apache-airflow-providers-slack==8.5.1
apache-airflow-providers-snowflake==5.2.1
apache-airflow-providers-sqlite==3.7.0
apache-airflow-providers-ssh==3.10.0
google-cloud-orchestration-airflow==1.10.0

Deployment

Official Apache Airflow Helm Chart

Deployment details

No response

Anything else?

Untitled

Are you willing to submit PR?

  • [ ] Yes I am willing to submit a PR!

Code of Conduct

Nikita-Sobolev avatar Apr 24 '24 14:04 Nikita-Sobolev

Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.

boring-cyborg[bot] avatar Apr 24 '24 14:04 boring-cyborg[bot]

Could you try the latest version, 8.1.1, of apache-airflow-providers-cncf-kubernetes?

raphaelauv avatar Apr 24 '24 15:04 raphaelauv

Related https://github.com/apache/airflow/issues/33498

tirkarthi avatar Apr 24 '24 20:04 tirkarthi

@raphaelauv

With version 8.1.1 the problem is still present. It now seems to always log "Pod docker-java-w2ade41b log read interrupted but container base still running".

Airflow's version:

airflow@airflow-test-worker-6cb8744f69-sw7xg:/opt/airflow$ airflow version
2.9.0

airflow@airflow-test-worker-6cb8744f69-sw7xg:/opt/airflow$ pip list | grep kub
apache-airflow-providers-cncf-kubernetes 8.1.1
kubernetes                               29.0.0
kubernetes_asyncio                       29.0.0

gbonazzoli avatar Apr 27 '24 09:04 gbonazzoli

Some work around this was done in https://github.com/apache/airflow/issues/33498. cc @fdemiane, maybe you will have time to take a look?

eladkal avatar May 26 '24 08:05 eladkal

If we look at the logs, the duplicated lines all fall within one second. If we look at the code here, we see that read_pod_logs takes since_seconds, which is expressed in whole seconds, and passes it to _client.read_namespaced_pod_log (docs here), which does not support a finer-grained time representation.
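To make the one-second window concrete, here is a minimal simulation of the effect, not the actual KPO code. The function fetch_since_seconds is a hypothetical stand-in for a log API that only accepts whole seconds, and the timestamps and messages are made up for illustration.

```python
import math
from datetime import datetime, timedelta, timezone


def fetch_since_seconds(lines, now, since_seconds):
    """Hypothetical stand-in for a log API that only accepts whole seconds."""
    cutoff = now - timedelta(seconds=since_seconds)
    return [(ts, msg) for ts, msg in lines if ts >= cutoff]


# Illustrative log lines, all emitted within the same second.
base = datetime(2024, 4, 24, 14, 0, 0, tzinfo=timezone.utc)
lines = [
    (base + timedelta(milliseconds=100), "step 1"),
    (base + timedelta(milliseconds=400), "step 2"),
    (base + timedelta(milliseconds=900), "step 3"),
]

# The first read is interrupted after "step 2" has been consumed.
seen = lines[:2]
last_ts = seen[-1][0]

# On resume, the gap since the last line has to be rounded up to whole
# seconds, otherwise lines could be lost instead of duplicated.
now = base + timedelta(milliseconds=1200)
since_seconds = math.ceil((now - last_ts).total_seconds())
resumed = fetch_since_seconds(lines, now, since_seconds)

# Anything at or before last_ts was already printed once.
duplicates = [msg for ts, msg in resumed if ts <= last_ts]
print(duplicates)
```

Because the 0.8 second gap is rounded up to 1 second, the resumed read reaches back past the last line already consumed and returns it again.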

Looking at the Kubernetes API reference, it also doesn't seem to support passing a finer-grained time representation. kubectl seems to support passing a since_time, a timestamp that supports milliseconds, as seen here.

Doing a little search, I found this issue here in the distant past.

The optimal fix for this issue is to provide a way to pass a since_time in the Kubernetes client (out of scope for Airflow), then make the necessary code changes in the KPO. A quick win would be to add a warning message that logs within one second might get duplicated (maybe here?).
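As a rough illustration of what a finer-grained since_time would buy, the sketch below emulates it on the client side by dropping lines that are not newer than the last timestamp already seen. This is an assumption-laden example rather than the KPO implementation: parse_line and drop_already_seen are hypothetical helpers, and the "TIMESTAMP message" line format is simplified for the example.

```python
from datetime import datetime


def parse_line(line):
    """Split 'TIMESTAMP message' into (datetime, message); format is assumed."""
    stamp, _, message = line.partition(" ")
    return datetime.fromisoformat(stamp), message


def drop_already_seen(lines, last_seen):
    """Keep only lines strictly newer than last_seen (hypothetical helper)."""
    fresh = []
    for line in lines:
        ts, _message = parse_line(line)
        if ts > last_seen:
            fresh.append(line)
    return fresh


# Timestamp of the last line printed before the interruption (illustrative).
last_seen = datetime.fromisoformat("2024-04-24T14:00:00.400+00:00")

# Lines returned by a resumed read that reached back a whole second.
resumed = [
    "2024-04-24T14:00:00.400+00:00 step 2",
    "2024-04-24T14:00:00.900+00:00 step 3",
]

fresh = drop_already_seen(resumed, last_seen)
print(fresh)
```

One caveat with this approach: two distinct lines that share exactly the same timestamp as the last one seen would both be dropped, so it trades rare duplicates for a small risk of rare omissions.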

fdemiane avatar May 26 '24 20:05 fdemiane

I opened a pull request, but I am not really sure if this is the correct way to go, as this is a rare occurrence, and logs might get polluted (space consumed is minimal, but still). What do you think? (CC: @eladkal)

fdemiane avatar May 26 '24 21:05 fdemiane