Support Kubernetes executor-side task attempt logging for failed tasks when task pods don't reach the running state

Open dirrao opened this issue 1 year ago • 7 comments

Right now, when a task fails because of a pod launch failure or because the pod is stuck in the pending phase, the task logs in the UI are empty. This makes debugging very inconvenient for Airflow users, who might not have access to the scheduler logs. We can push these failure reasons from the Kubernetes executor to the task logs so that Airflow users can see the task failure reasons in the UI.

closes: #37435

dirrao avatar Sep 23 '24 12:09 dirrao

@jedcunningham / @hussein-awala Can you review this MR when you are free?

dirrao avatar Oct 04 '24 10:10 dirrao

@romsharon98 Can you review it when you are free?

dirrao avatar Oct 10 '24 15:10 dirrao

@jedcunningham / @hussein-awala Can you review this MR when you are free?

dirrao avatar Oct 15 '24 11:10 dirrao

@dstandish With the changes introduced in PR #43183, is it still feasible to generate logs from locations other than the workers? If so, could you please provide a few references to explain this?

dirrao avatar Oct 19 '24 04:10 dirrao

@dstandish With the changes introduced in PR #43183, is it still feasible to generate logs from locations other than the workers? If so, could you please provide a few references to explain this?

Yes, you add a log record. This was done recently in an AWS executor.

dstandish avatar Oct 19 '24 06:10 dstandish

@dstandish With the changes introduced in PR #43183, is it still feasible to generate logs from locations other than the workers? If so, could you please provide a few references to explain this?

Ok following up with more specifics this morning.

Look at BaseExecutor.log_task_event

Executors do not have access to a session, so when an executor needs to send this kind of message, it writes the log records to a queue; the scheduler consumes this queue and writes the records to the DB.
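(Editor's illustration.) The queue hand-off described above can be sketched roughly as follows. This is a simplified, self-contained stand-in, not Airflow's actual code: `TaskKey`, `LogRecord`, `SketchExecutor`, `SketchScheduler`, the `db_rows` list, and the event text are all hypothetical names chosen for illustration; only the method name `log_task_event` comes from the thread.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskKey:
    """Hypothetical stand-in for a task instance key."""
    dag_id: str
    task_id: str
    run_id: str
    try_number: int


@dataclass(frozen=True)
class LogRecord:
    """Hypothetical stand-in for one row destined for the log table."""
    event: str
    extra: str
    ti_key: TaskKey


class SketchExecutor:
    """Executor side: there is no DB session here, so records are only buffered."""

    def __init__(self):
        self.task_event_logs = deque()

    def log_task_event(self, *, event, extra, ti_key):
        # Append to the in-memory queue; nothing is written to the DB here.
        self.task_event_logs.append(LogRecord(event=event, extra=extra, ti_key=ti_key))


class SketchScheduler:
    """Scheduler side: owns the (simulated) session and drains the executor's queue."""

    def __init__(self, executor):
        self.executor = executor
        self.db_rows = []  # assumption: a plain list stands in for the log table

    def process_task_event_logs(self):
        # Drain every pending record and "persist" it; return how many were written.
        written = 0
        while self.executor.task_event_logs:
            self.db_rows.append(self.executor.task_event_logs.popleft())
            written += 1
        return written


executor = SketchExecutor()
scheduler = SketchScheduler(executor)
key = TaskKey("example_dag", "example_task", "manual__1", 1)

# Hypothetical event: the executor records why the pod never reached running.
executor.log_task_event(
    event="pod launch failed",
    extra="hypothetical reason: pod stuck in pending phase",
    ti_key=key,
)
written = scheduler.process_task_event_logs()
print(written, scheduler.db_rows[0].event)
```

The point of the split is that the executor only ever appends to memory, so a failure report costs nothing at the call site, while the scheduler, which does hold a session, decides when to flush.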

cc @potiuk

dstandish avatar Oct 19 '24 13:10 dstandish

Look at BaseExecutor.log_task_event

Nice.

potiuk avatar Oct 20 '24 16:10 potiuk

@dirrao what is the status of this PR?

eladkal avatar Nov 05 '24 12:11 eladkal

@dirrao what is the status of this PR?

I’ve implemented the changes on the Kubernetes executor to align with the new base executor feature. However, I’m uncertain about how to adapt these updates in the Kubernetes watcher code.

dirrao avatar Nov 06 '24 06:11 dirrao

@dirrao why do you need to push anything to the watcher here? Writing the logs to the task logs should be sufficient, right? Or am I missing something?

amoghrajesh avatar Dec 16 '24 11:12 amoghrajesh

TaskContextLogger has been removed from Airflow; this PR needs to be updated to use the log table. The feature was added in this PR https://github.com/apache/airflow/pull/40867 and you can see an example of its usage there.

dstandish avatar Dec 16 '24 22:12 dstandish

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

github-actions[bot] avatar Feb 10 '25 00:02 github-actions[bot]