airflow
KubernetesPodOperator duplicating logs when interrupted
Apache Airflow version
Other Airflow 2 version (please specify below)
If "Other Airflow 2 version" selected, which one?
2.8.1
What happened?
The KubernetesPodOperator duplicates a task's logs when the log read is interrupted while the base container is still running ("log read interrupted but container base still running"). It happens randomly on different DAGs and on different runs of the same DAG. I assume it is somehow connected to https://github.com/apache/airflow/issues/35019
What you think should happen instead?
Logs should not be duplicated.
How to reproduce
Run a KubernetesPodOperator task on a cloud AKS cluster.
Operating System
Ubuntu 22.04
Versions of Apache Airflow Providers
apache-airflow==2.8.1
apache-airflow-providers-amazon==8.16.0
apache-airflow-providers-celery==3.5.1
apache-airflow-providers-cncf-kubernetes==7.13.0
apache-airflow-providers-common-io==1.2.0
apache-airflow-providers-common-sql==1.10.0
apache-airflow-providers-docker==3.9.1
apache-airflow-providers-elasticsearch==5.3.1
apache-airflow-providers-ftp==3.7.0
apache-airflow-providers-google==10.13.1
apache-airflow-providers-grpc==3.4.1
apache-airflow-providers-hashicorp==3.6.1
apache-airflow-providers-http==4.8.0
apache-airflow-providers-imap==3.5.0
apache-airflow-providers-microsoft-azure==8.5.1
apache-airflow-providers-mysql==5.5.1
apache-airflow-providers-odbc==4.4.0
apache-airflow-providers-openlineage==1.4.0
apache-airflow-providers-postgres==5.10.0
apache-airflow-providers-redis==3.6.0
apache-airflow-providers-sendgrid==3.4.0
apache-airflow-providers-sftp==4.8.1
apache-airflow-providers-slack==8.5.1
apache-airflow-providers-snowflake==5.2.1
apache-airflow-providers-sqlite==3.7.0
apache-airflow-providers-ssh==3.10.0
google-cloud-orchestration-airflow==1.10.0
Deployment
Official Apache Airflow Helm Chart
Deployment details
No response
Anything else?
Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.
Could you try the latest version, 8.1.1, of apache-airflow-providers-cncf-kubernetes?
Related https://github.com/apache/airflow/issues/33498
@raphaelauv
With version 8.1.1 the problem is still present. It now seems to always log "Pod docker-java-w2ade41b log read interrupted but container base still running".
Airflow's version:
airflow@airflow-test-worker-6cb8744f69-sw7xg:/opt/airflow$ airflow version
2.9.0
airflow@airflow-test-worker-6cb8744f69-sw7xg:/opt/airflow$ pip list | grep kub
apache-airflow-providers-cncf-kubernetes 8.1.1
kubernetes 29.0.0
kubernetes_asyncio 29.0.0
Some work around this was done in https://github.com/apache/airflow/issues/33498. cc @fdemiane, maybe you will have time to take a look?
If we actually look at the logs, the lines that have been duplicated all fall within one second. If we look at the code here, we see that read_pod_logs takes since_seconds, which is in whole seconds, and passes it to _client.read_namespaced_pod_log (docs here), which does not support a finer-grained time representation.
Also, looking at the Kubernetes API reference, it doesn't seem to support passing a finer-grained time representation. kubectl seems to support passing a since_time, a timestamp that supports milliseconds, as seen here.
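To make the granularity problem concrete, here is a minimal sketch in plain Python, with no Airflow or Kubernetes imports. It assumes the resumed read rounds the elapsed time up to a whole number of seconds before asking for logs; `since_seconds_for`, `fetch_since`, and the sample log lines are hypothetical stand-ins for illustration, not the provider's actual code.

```python
import math
from datetime import datetime, timedelta, timezone


def since_seconds_for(last_seen: datetime, now: datetime) -> int:
    """Whole-second lookback window that covers everything after last_seen.

    Assumption: the elapsed time is rounded up so no line is missed.
    """
    return math.ceil((now - last_seen).total_seconds())


def fetch_since(lines, now, since_seconds):
    """Hypothetical stand-in for a log read limited by since_seconds."""
    cutoff = now - timedelta(seconds=since_seconds)
    return [(ts, msg) for ts, msg in lines if ts >= cutoff]


t0 = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
pod_log = [
    (t0 + timedelta(milliseconds=100), "step 1"),
    (t0 + timedelta(milliseconds=600), "step 2"),
    (t0 + timedelta(milliseconds=1400), "step 3"),
]

# The first read is interrupted after "step 2" has already been shown.
already_shown = pod_log[:2]
last_seen = already_shown[-1][0]

# The read resumes 1.5 s after t0; 0.9 s elapsed rounds up to a 1 s window.
now = t0 + timedelta(milliseconds=1500)
window = since_seconds_for(last_seen, now)
resumed = fetch_since(pod_log, now, window)

# Anything fetched again that was already shown is a duplicate.
duplicates = [line for line in resumed if line in already_shown]
print(window, [msg for _, msg in duplicates])
```

In this made-up timeline the one-second window reaches back 0.1 s before the last line that was already shown, so that line is fetched a second time. That matches the observation that the duplicated lines sit within one second of each other.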
Doing a little search, I found this issue here in the distant past.
The optimal fix for this issue is to provide a way to pass a since_time in the Kubernetes client (out of scope for Airflow), then make the necessary code changes in the KPO. A quick win would be to add a warning message that logs within one second might get duplicated (maybe here?).
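As a rough illustration of what a finer-grained since_time would buy, the sketch below filters re-fetched lines by their exact timestamp instead of a whole-second window. It is plain Python; `drop_already_seen` and the sample data are hypothetical and are not part of the provider or of the Kubernetes client.

```python
from datetime import datetime, timedelta, timezone


def drop_already_seen(lines, last_seen):
    """Keep only lines strictly newer than last_seen (hypothetical helper)."""
    return [
        (ts, msg)
        for ts, msg in lines
        if last_seen is None or ts > last_seen
    ]


t0 = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)

# What a resumed read with a one-second window might return.
resumed = [
    (t0 + timedelta(milliseconds=600), "step 2"),   # shown before the interruption
    (t0 + timedelta(milliseconds=1400), "step 3"),  # new since the interruption
]
last_seen = t0 + timedelta(milliseconds=600)

fresh = drop_already_seen(resumed, last_seen)
print([msg for _, msg in fresh])
```

One caveat with a strict timestamp comparison: two distinct lines that carry exactly the same timestamp as the last one shown would both be dropped, so this trades rare duplication for rare loss.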
I opened a pull request, but I am not really sure if this is the correct way to go, as this is a rare occurrence, and logs might get polluted (space consumed is minimal, but still). What do you think? (CC: @eladkal)