
Some liveness checks don't actually check process

Open nsAstro opened this issue 3 years ago • 1 comments

Description

The current liveness probes use the 'airflow jobs check' command, which queries the backend DB directly instead of querying an endpoint or checking the status of the process itself.

e.g. Triggerer liveness probe

exec [sh -c CONNECTION_CHECK_MAX_COUNT=0 AIRFLOW__LOGGING__LOGGING_LEVEL=ERROR exec /entrypoint \
airflow jobs check --job-type TriggererJob --hostname $(hostname)] delay=10s timeout=20s period=60s #success=1 #failure=5

This command only checks the backend DB to see if there are any jobs. Additionally, the exit code is always 0 regardless of how many jobs there are. Ideally, the liveness check is done by querying some endpoint on the triggerer to see if it's still running.
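As a rough illustration of the endpoint-based alternative, here is a minimal Python sketch of a probe that a liveness check could `exec`. It assumes the component exposes some HTTP health endpoint; that endpoint, its URL, and the name `probe` are hypothetical and not existing Airflow API.

```python
import http.client
import urllib.error
import urllib.request


def probe(url: str, timeout: float = 5.0) -> int:
    """Return 0 if `url` answers with an HTTP 2xx status within `timeout`, else 1.

    `url` is assumed to point at a health endpoint served by the process
    being checked (hypothetical; not an existing triggerer endpoint).
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 0 if 200 <= resp.status < 300 else 1
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Connection refused, timeout, non-2xx status, or malformed URL:
        # all are treated as "not alive".
        return 1
```

A wrapper script could pass the return value to `sys.exit()`, so that the probe fails with a non-zero exit code whenever the process stops answering, independent of what is stored in the DB.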

Use case/motivation

Would like a liveness check that is more aware of the process rather than the stored state

Related issues

No response

Are you willing to submit a PR?

  • [ ] Yes I am willing to submit a PR!

Code of Conduct

nsAstro avatar Sep 21 '22 20:09 nsAstro

Thanks for opening your first issue here! Be sure to follow the issue template!

boring-cyborg[bot] avatar Sep 21 '22 20:09 boring-cyborg[bot]

Marked it as a good first issue. This is a good idea, and hopefully someone might improve it. BTW, @nsAstro: if you have ideas on how to improve it, you are most welcome to make a PR. This is an easy way to become one of the ~2200 contributors. Otherwise it will just have to wait for someone to pick it up.

potiuk avatar Sep 22 '22 14:09 potiuk

@potiuk, I would like to take this task. Can you please assign it to me?

TruptiM18 avatar Sep 29 '22 16:09 TruptiM18

@potiuk, Can I take this task?

TruptiM18 avatar Oct 06 '22 06:10 TruptiM18

@uranusjr, Thanks! I have started working on it.

TruptiM18 avatar Oct 10 '22 07:10 TruptiM18

@uranusjr @TruptiM18 if you aren't working on this, I would like to try my hand at it. I see that the liveness checks can be improved for the scheduler and the triggerer here. I would like to hear what kind of liveness probe we would prefer. Calling an endpoint instead? Do we have a ping for the triggerer?

amoghrajesh avatar May 26 '23 05:05 amoghrajesh

Please feel free.

uranusjr avatar May 26 '23 07:05 uranusjr

Thanks. Any hints or clues on what we are looking to have as new liveness probes? @uranusjr

amoghrajesh avatar May 26 '23 08:05 amoghrajesh