awx-operator
Add liveness/readiness probes to web/task - fixes #414
SUMMARY
Added liveness & readiness probes to the awx-web container.
fixes #414
ISSUE TYPE
- New or Enhanced Feature
ADDITIONAL INFORMATION
Hi @shanemcd, how can we move this PR forward?
Could we also consider a liveness probe for the awx-task container?
A command like this:
awx-manage run_dispatcher --running | grep '\[\]'
returns 0 if awx-task is propagating messages properly, and 1 if there is a communication problem between task and Postgres (for example, after a connection interruption).
We are still discussing this on Matrix with @TheRealHaoLiu, also to find a way to roll back the connection with Postgres.
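As a rough illustration of the check proposed above, the exit-code logic can be wrapped in a small shell function. This is only a sketch: the function name `dispatcher_output_ok` is hypothetical, and it uses a fixed-string match (`grep -F '[]'`) in place of the escaped pattern from the comment, which should match the same literal text.

```shell
#!/bin/sh
# Hypothetical helper (not part of the PR): reads dispatcher output on stdin
# and exits 0 only when it contains the literal "[]" marker, mirroring the
# grep in the proposed liveness command.
dispatcher_output_ok() {
    grep -q -F '[]'
}

# Assumed usage inside the awx-task container, as an exec-style probe:
#   awx-manage run_dispatcher --running | dispatcher_output_ok
```

An exec-style probe treats exit code 0 as success and any non-zero code as failure, so the pipeline's exit status is all the probe needs.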
@tanganellilore
added for task, what do you think about the defaults?
I'm not sure about the period: the command takes a few seconds (2 or 3), so for the task I think we can use something like 10 or 15 seconds. When the task container is not working, everything behind the UI stops working, and you see all tasks pending (or failing). A 10-second period with 3 consecutive failures means the container is restarted about 35 to 40 seconds after a disconnection from the DB, which seems fine to me. In any case, users can customize these options on the operator side.
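The 35 to 40 second figure above can be reproduced with a back-of-the-envelope estimate: the number of consecutive failures times the period, plus the time the last probe takes to fail. The helper below is a hypothetical sketch of that arithmetic, not something from the operator, and it ignores kubelet scheduling jitter.

```shell
#!/bin/sh
# Rough estimate (in seconds) of how long a failing container keeps running
# before it is restarted: failure_threshold probes spaced `period` apart,
# plus the duration of the final failing probe. All names are illustrative.
restart_delay_estimate() {
    period=$1
    failure_threshold=$2
    probe_duration=$3
    echo $(( period * failure_threshold + probe_duration ))
}

restart_delay_estimate 10 3 5    # prints 35
restart_delay_estimate 10 3 10   # prints 40
```

With a 10-second period and a threshold of 3, the estimate stays in the 35 to 40 second range as long as a single probe takes between 5 and 10 seconds to fail.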
To avoid Molecule destroying the environment run:
molecule test --destroy=never
@erz4 from the community meeting
- we would like the readiness and liveness probe parameters to be nested under a top-level parameter and hidden
- give the ability to disable the readiness and liveness probes
I will help troubleshoot the CI failure.
@TheRealHaoLiu so every probe should have two parameters in the CRD:
- enable/disable - enabled by default
- parameters for the probe - with the defaults we already set
@erz4 Re: nesting the variables, currently it shows like this:
task_liveness_failure_threshold: 3
task_liveness_initial_delay: 3
task_liveness_period: 3
task_liveness_success_threshold: 1
task_liveness_timeout: 10
task_privileged: false
task_readiness_failure_threshold: 3
task_readiness_initial_delay: 3
task_readiness_period: 3
task_readiness_success_threshold: 1
task_readiness_timeout: 10
web_liveness_failure_threshold: 3
web_liveness_initial_delay: 3
web_liveness_period: 3
web_liveness_success_threshold: 1
web_liveness_timeout: 10
web_readiness_failure_threshold: 3
web_readiness_initial_delay: 3
web_readiness_period: 3
web_readiness_success_threshold: 1
web_readiness_timeout: 5
We are hoping to nest these variables to declutter the AWX CR a bit.
task.liveness.failure_threshold: 3
task.liveness.initial_delay: 3
task.liveness.period: 3
task.liveness.success_threshold: 1
task.liveness.timeout: 10
task.readiness.failure_threshold: 3
task.readiness.initial_delay: 3
task.readiness.period: 3
task.readiness.success_threshold: 1
task.readiness.timeout: 10
web.liveness.failure_threshold: 3
web.liveness.initial_delay: 3
web.liveness.period: 3
web.liveness.success_threshold: 1
web.liveness.timeout: 10
web.readiness.failure_threshold: 3
web.readiness.initial_delay: 3
web.readiness.period: 3
web.readiness.success_threshold: 1
web.readiness.timeout: 5
When testing this out, it fails on the "Apply deployment resources" task, presumably because the probe timed out. The timeout may be too low. The timeout is 10 seconds and the database migrations take much longer than that to run. Probably 60-70 seconds if I had to guess.
Hi @erz4, we are prioritizing getting this in next.
Due to the recent change to the AWX deployment (the web/task split), the PR needs some heavy rebasing and updating.
Would you be able to get to this?
There is an open PR actively being worked on here to implement this:
- https://github.com/ansible/awx-operator/pull/1674
This feature has been merged as part of https://github.com/ansible/awx-operator/pull/1674