airflow
Some liveness checks don't actually check process
Description
The current liveness probes use the 'airflow jobs' command, which queries the backend DB directly instead of querying an endpoint or checking the status of the process itself.
For example, the triggerer liveness probe:
exec [sh -c CONNECTION_CHECK_MAX_COUNT=0 AIRFLOW__LOGGING__LOGGING_LEVEL=ERROR exec /entrypoint \
airflow jobs check --job-type TriggererJob --hostname $(hostname)] delay=10s timeout=20s period=60s #success=1 #failure=5
This command only checks the backend DB to see whether there are any jobs. Additionally, the exit code is always 0 regardless of how many jobs there are. Ideally, the liveness check would query some endpoint on the triggerer to see whether it is still running.
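To make the "query an endpoint" idea concrete, here is a rough sketch of what a process-aware check could look like. This is not how Airflow implements anything today: `Heartbeat`, `start_health_server`, the `/health` path, and the 30-second threshold are all hypothetical names and values chosen for illustration. The idea is that the component's main loop records a heartbeat in memory, and a small HTTP server inside the same process reports whether that heartbeat is recent.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer


class Heartbeat:
    """Tracks, in memory, the last time the main loop reported progress."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._last_beat = time.monotonic()

    def beat(self) -> None:
        # Called by the main loop on every iteration.
        with self._lock:
            self._last_beat = time.monotonic()

    def is_alive(self, max_age_seconds: float) -> bool:
        # Alive means the last heartbeat is no older than max_age_seconds.
        with self._lock:
            return time.monotonic() - self._last_beat <= max_age_seconds


def make_health_handler(heartbeat: Heartbeat, max_age_seconds: float):
    class HealthHandler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            if self.path != "/health":  # hypothetical path
                self.send_error(404)
                return
            alive = heartbeat.is_alive(max_age_seconds)
            # 200 when the loop is still beating, 503 when it has stalled.
            self.send_response(200 if alive else 503)
            self.send_header("Content-Type", "text/plain")
            self.end_headers()
            self.wfile.write(b"ok" if alive else b"stale")

        def log_message(self, format, *args) -> None:
            pass  # keep probe requests out of the process logs

    return HealthHandler


def start_health_server(
    heartbeat: Heartbeat,
    host: str = "127.0.0.1",  # assumption: a pod would need "0.0.0.0" for kubelet probes
    port: int = 0,  # 0 lets the OS pick a free port
    max_age_seconds: float = 30.0,  # hypothetical staleness threshold
) -> HTTPServer:
    handler = make_health_handler(heartbeat, max_age_seconds)
    server = HTTPServer((host, port), handler)
    # Serve from a daemon thread so the server dies with the process.
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

With something like this in place, the main loop would call `heartbeat.beat()` on each iteration, and the probe could be an HTTP GET against `/health` instead of an `exec` of the CLI. Because the heartbeat lives in the process's memory, a hung loop shows up as a 503 and a dead process shows up as a refused connection, neither of which depends on what is stored in the DB.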
Use case/motivation
I would like a liveness check that reflects the state of the running process rather than the state stored in the DB.
Related issues
No response
Are you willing to submit a PR?
- [ ] Yes I am willing to submit a PR!
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
Thanks for opening your first issue here! Be sure to follow the issue template!
Marked it as a good first issue. This is a good idea, and hopefully someone will improve it. BTW, @nsAstro, if you have ideas on how to improve it, you are most welcome to make a PR. This is an easy way to become one of the ~2200 contributors. Otherwise it will just have to wait for someone to pick it up.
@potiuk, I would like to take this task. Can you please assign it to me?
@potiuk, Can I take this task?
@uranusjr, Thanks! I have started working on it.
@uranusjr @TruptiM18 if you aren't working on this, I would like to try my hand at it. I see that the liveness checks can be improved for the scheduler and the triggerer here. I would like to hear what kind of liveness probe we would prefer. Calling an endpoint instead? Do we have a ping for the triggerer?
Please feel free.
Thanks. Any hints or clues on what we are looking for in the new liveness probes? @uranusjr